dmarx / Multi-Modal-Comparators

Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP

model architectures and pretrained models to support #2

Open · dmarx opened 2 years ago

dmarx commented 2 years ago

Status legend:

- Installable
- Installable with extra effort
- Not installable
- Not released


References for more variants:

https://paperswithcode.com/paper/learning-transferable-visual-models-from

Potentially in scope, lower priority

Older stuff

VQA is arguably a generalization of vision-language co-training; whether it's in scope is TBD.

MAGMA could be another useful approach for promoting multilingual support:

https://github.com/Aleph-Alpha/magma

dmarx commented 2 years ago

https://github.com/navervision/KELIP

apolinario commented 2 years ago

LiT: they released some models last week https://github.com/google-research/vision_transformer#lit-models

Audio: besides AudioCLIP there's also wav2clip, which takes a different approach: https://github.com/descriptinc/lyrebird-wav2clip
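
For reference, the wav2clip README suggests usage roughly like the following (treat the exact get_model / embed_audio signatures as an assumption to verify against the repo):

import numpy as np
import wav2clip  # pip install wav2clip

model = wav2clip.get_model()
audio = np.random.randn(16000).astype(np.float32)  # placeholder: ~1s of mono audio
embedding = wav2clip.embed_audio(audio, model)  # audio embedded into CLIP space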

dmarx commented 2 years ago

https://github.com/allenai/reclip

dmarx commented 2 years ago

https://github.com/facebookresearch/Detic

dmarx commented 2 years ago

https://github.com/sallymmx/ActionCLIP

dmarx commented 2 years ago

https://github.com/ChenRocks/UNITER

dmarx commented 2 years ago

https://github.com/raoyongming/DenseCLIP

dmarx commented 2 years ago

https://github.com/ttlmh/Bridge-Prompt

apolinario commented 2 years ago

https://github.com/sonoisa/clip-japanese

dmarx commented 2 years ago

SLIP demo: install using napm and load weights using the old strategy from DD: https://github.com/alembics/disco-diffusion/commit/c509aa1b9c00b2323a1fd95c5b0fc667bb12be4c

!wget https://dl.fbaipublicfiles.com/slip/slip_base_100ep.pt
!pip install napm

import napm

# pseudo-install the SLIP repo (it isn't pip-installable) and add it to sys.path
url = 'https://github.com/facebookresearch/SLIP'
napm.pseudoinstall_git_repo(url, add_install_dir_to_path=True)

import torch
from SLIP.models import SLIP_VITB16

# The published checkpoint was saved from a DataParallel-wrapped model,
# so every key in the state dict carries a 'module.' prefix.
sd = torch.load('slip_base_100ep.pt', map_location=torch.device('cpu'))
real_sd = {}
for k, v in sd['state_dict'].items():
    new_key = '.'.join(k.split('.')[1:])  # strip the 'module.' prefix
    real_sd[new_key] = v
del sd

SLIPB16model = SLIP_VITB16(ssl_mlp_dim=4096, ssl_emb_dim=256)
SLIPB16model.load_state_dict(real_sd)
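
A quick sanity check, assuming the loaded model exposes a CLIP-style encode_image method (the models in the SLIP repo do, but worth verifying; real inputs need the repo's 224x224 preprocessing):

SLIPB16model.eval()
with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image batch
    image_features = SLIPB16model.encode_image(dummy)
print(image_features.shape)
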
dmarx commented 2 years ago

CLOOB demo using napm

!pip install git+https://github.com/openai/CLIP
!pip install napm

import napm

# pseudo-install crowsonkb's cloob-training repo under the package name 'cloob'
url = "https://github.com/crowsonkb/cloob-training"
napm.pseudoinstall_git_repo(url, package_name='cloob')

from cloob.cloob_training import model_pt, pretrained

config = pretrained.get_config('cloob_laion_400m_vit_b_16_16_epochs')
model = model_pt.get_pt_model(config)
checkpoint = pretrained.download_checkpoint(config)
model.load_state_dict(model_pt.get_pt_params(config, checkpoint))
#model.eval().requires_grad_(False).to('cuda')

dmarx commented 2 years ago

CoCa https://arxiv.org/abs/2205.01917

dmarx commented 2 years ago

OTTER https://github.com/facebookresearch/OTTER

dmarx commented 2 years ago

ruDOLPH

dmarx commented 2 years ago

https://socraticmodels.github.io/

dmarx commented 2 years ago

https://github.com/yxuansu/MAGIC

apolinario commented 2 years ago

https://github.com/rinnakk/japanese-clip (not the same project as the clip-japanese repo linked above)

apolinario commented 2 years ago

This is a very large and seemingly very good Chinese CLIP that @Dango233 has shown me: https://wukong-dataset.github.io/wukong-dataset/benchmark.html

One problem though: its pre-trained weights are in MindSpore (Huawei's deep-learning framework), so someone would need to convert them...
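
If it came to that, a rough sketch of the conversion (assuming MindSpore is installed alongside PyTorch; rename_key is a hypothetical placeholder, since the MindSpore-to-PyTorch parameter-name mapping has to be worked out per model):

import mindspore as ms
import torch

def rename_key(name):
    # hypothetical: map Wukong's MindSpore parameter names to the
    # names the eventual PyTorch module expects
    return name

ms_params = ms.load_checkpoint('wukong.ckpt')  # dict of name -> Parameter
torch_sd = {rename_key(k): torch.from_numpy(v.asnumpy()) for k, v in ms_params.items()}
torch.save(torch_sd, 'wukong_converted.pt')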

Dango233 commented 2 years ago

https://github.com/mindspore-ai/models/tree/master/research/mm/wukong

dmarx commented 2 years ago

maybe just the fine-tuned model?

https://github.com/j-min/clip-caption-reward

apolinario commented 2 years ago

A new (better, it seems) Multilingual CLIP https://github.com/FreddeFrallan/Multilingual-CLIP

rom1504 commented 2 years ago

@apolinario indeed, and now it's packaged properly on PyPI as multilingual-clip

it's also available for easy testing at https://rom1504.github.io/clip-retrieval/?useMclip=true&query=%E9%BB%84%E8%89%B2%E3%81%84%E7%8C%AB&back=https%3A%2F%2Fknn5.laion.ai&index=laion5B
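
Usage per the Multilingual-CLIP README is roughly the following (model name and forward signature copied from that README; verify against the current release):

!pip install multilingual-clip

import transformers
from multilingual_clip import pt_multilingual_clip

texts = ['En katt sitter på en matta', '黄色い猫']  # any supported language
model_name = 'M-CLIP/XLM-Roberta-Large-Vit-L-14'
model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
embeddings = model.forward(texts, tokenizer)  # text embeddings in the paired CLIP space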

dmarx commented 2 years ago

@rom1504 @apolinario the m-clip release gave me a thought: maybe we could host mmc on PyPI with essentially none of the other perceptors installed at all. Simple instructions for "finalizing" the mmc install could live in the README (along with one-liners for specific perceptors as needed), and we could add a warning on import too. Maybe we could ship an update script or a CLI command.

My thinking here is that if we ship the core tooling as a bare library, anyone could attach the mocking utilities upstream to quickly make new perceptors drop-in-able if they aren't already, which conversely would make them trivial to add to mmc (since they'd already be hooked into a conformant API one way or another).

Actually, it might be cleaner and simpler to isolate a simple mocking wrapper and package that for PyPI? Something like the sketch below.
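
As a purely hypothetical illustration (every name here is made up, not mmc's actual API), the wrapper could be as small as:

import torch

class PerceptorWrapper:
    """Adapt an arbitrary CLIP-like model to one conformant interface."""
    def __init__(self, model, image_fn=None, text_fn=None):
        self.model = model
        # fall back to CLIP-style method names when no adapters are given
        self.image_fn = image_fn or model.encode_image
        self.text_fn = text_fn or model.encode_text

    def encode_image(self, images: torch.Tensor) -> torch.Tensor:
        return self.image_fn(images)

    def encode_text(self, tokens) -> torch.Tensor:
        return self.text_fn(tokens)

Anything wrapped this way would look identical to mmc downstream, regardless of its native API.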

I'm mostly just thinking out loud now. Thoughts?

apolinario commented 2 years ago

I like the idea and the spirit, and I feel that eventually, if MMC gets way too many perceptors, making some optional makes a lot of sense. Starting with all of them optional, I'm not sure. Regardless, I think your idea holds; I'm just not sure whether we ship empty or with some basics (e.g. OpenAI CLIP + OpenCLIP) and let users install further from there.

apolinario commented 2 years ago

(New perceptor: https://github.com/microsoft/UniCL)

apolinario commented 2 years ago

https://github.com/goel-shashank/CyCLIP

dmarx commented 2 years ago

https://github.com/facebookresearch/omnivore

dmarx commented 2 years ago

https://github.com/microsoft/GLIP

dmarx commented 2 years ago

https://github.com/microsoft/RegionCLIP

rom1504 commented 2 years ago

FaRL can be used as a face CLIP: https://github.com/FacePerceiver/FaRL#use-farl-as-faceclip


dmarx commented 2 years ago

https://github.com/Lednik7/CLIP-ONNX/tree/main/clip_onnx

dmarx commented 2 years ago

https://github.com/OFA-Sys/OFA

apolinario commented 2 years ago

Turkish CLIP https://github.com/yusufani/TrCLIP

dmarx commented 1 year ago

https://github.com/salesforce/LAVIS

dmarx commented 1 year ago

EVA-CLIP - https://github.com/baaivision/EVA/blob/master/clip/README.md

basically already API-compliant