dmarx opened this issue 2 years ago
LiT: they released some models last week https://github.com/google-research/vision_transformer#lit-models
Audio: besides AudioCLIP there's also wav2clip with a different approach: https://github.com/descriptinc/lyrebird-wav2clip
SLIP demo: install using napm and load weights using the old strategy from DD: https://github.com/alembics/disco-diffusion/commit/c509aa1b9c00b2323a1fd95c5b0fc667bb12be4c
!wget https://dl.fbaipublicfiles.com/slip/slip_base_100ep.pt
!pip install napm

import napm
url = 'https://github.com/facebookresearch/SLIP'
napm.pseudoinstall_git_repo(url, add_install_dir_to_path=True)

import torch
import SLIP
from SLIP.models import SLIP_VITB16, SLIP, SLIP_VITL16

sd = torch.load('slip_base_100ep.pt', map_location=torch.device('cpu'))
real_sd = {}
for k, v in sd['state_dict'].items():
    new_key = '.'.join(k.split('.')[1:])  # strips "module." prefix. sure, why not.
    real_sd[new_key] = v
del sd

SLIPB16model = SLIP_VITB16(ssl_mlp_dim=4096, ssl_emb_dim=256)
SLIPB16model.load_state_dict(real_sd)
CLOOB demo using napm
!pip install git+https://github.com/openai/CLIP

import napm
url = "https://github.com/crowsonkb/cloob-training"
napm.pseudoinstall_git_repo(url, package_name='cloob')

import cloob
from cloob.cloob_training import model_pt, pretrained

config = pretrained.get_config('cloob_laion_400m_vit_b_16_16_epochs')
model = model_pt.get_pt_model(config)
checkpoint = pretrained.download_checkpoint(config)
model.load_state_dict(model_pt.get_pt_params(config, checkpoint))
#model.eval().requires_grad_(False).to('cuda')
https://github.com/rinnakk/japanese-clip (not the same as this)
This is a very large and seemingly very good Chinese CLIP that @Dango233 has shown me: https://wukong-dataset.github.io/wukong-dataset/benchmark.html
One problem, though: its pretrained weights are in MindSpore (Huawei's PyTorch equivalent), so someone would need to convert them...
Maybe just the fine-tuned model?
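A rough sketch of what that conversion could look like, under some loud assumptions: the checkpoint key names below (`wukong.visual.proj.weight`) are made up for illustration, and the real thing would use MindSpore's `load_checkpoint`, whose parameters expose `.asnumpy()`; `torch.from_numpy` on each value would then finish the job. Here the MindSpore side is faked with plain numpy arrays so the key-renaming logic stands alone:

```python
import numpy as np

# Assumed: ms_params is what mindspore.load_checkpoint(...) would return,
# i.e. a name -> Parameter mapping; faked here with plain numpy arrays.
ms_params = {
    "wukong.visual.proj.weight": np.zeros((4, 4), dtype=np.float32),  # made-up key
}

def to_numpy_state_dict(params, strip_prefix="wukong."):
    """Strip a model-name prefix from keys and pull out numpy arrays.

    Real MindSpore Parameters would need .asnumpy(); calling
    torch.from_numpy() on each value afterwards yields a state dict
    that a PyTorch module could load.
    """
    sd = {}
    for name, value in params.items():
        key = name[len(strip_prefix):] if name.startswith(strip_prefix) else name
        arr = value.asnumpy() if hasattr(value, "asnumpy") else np.asarray(value)
        sd[key] = arr
    return sd

state_dict = to_numpy_state_dict(ms_params)
```

Whether the Wukong checkpoint's layer names line up one-to-one with any PyTorch CLIP implementation is a separate (and probably harder) question.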
A new (better, it seems) Multilingual CLIP https://github.com/FreddeFrallan/Multilingual-CLIP
@apolinario indeed, and now it's packaged properly on PyPI as multilingual-clip
it's also available for easy testing at https://rom1504.github.io/clip-retrieval/?useMclip=true&query=%E9%BB%84%E8%89%B2%E3%81%84%E7%8C%AB&back=https%3A%2F%2Fknn5.laion.ai&index=laion5B
@rom1504 @apolinario the m-clip release gave me a thought: maybe we could host mmc on pypi with essentially none of the other perceptors installed at all. Simple instructions for "finalizing" the mmc install could live in the README (as well as one-liners for specific perceptors PRN), and we could add a warning on import too. maybe we could ship an update script or a CLI command.
My thinking here is if we ship the core tooling as a bare library, then anyone could attach the mocking utilities upstream to quickly make new perceptors drop-in-able if they aren't already, which conversely would make them trivial to add to mmc (since they'd already be hooked into a conformant API one way or another).
Actually, it might be cleaner and simpler to isolate a simple mocking wrapper and package that for pypi?
I'm mostly just thinking out loud now. Thoughts?
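The "warning on import" part of the bare-install idea could be sketched roughly like this (the module names in `OPTIONAL_PERCEPTORS` are illustrative, not mmc's actual dependency list):

```python
import importlib.util
import warnings

# Illustrative perceptor backend module names; not mmc's real layout.
OPTIONAL_PERCEPTORS = ("clip", "cloob", "SLIP")

def available_perceptors():
    """Return the optional perceptor packages that are actually importable."""
    return [name for name in OPTIONAL_PERCEPTORS
            if importlib.util.find_spec(name) is not None]

# At package import time, nudge users toward the README's install one-liners.
if not available_perceptors():
    warnings.warn(
        "mmc was installed without any perceptor backends; "
        "see the README for per-perceptor install instructions."
    )
```

Packaging-wise, the same split could be expressed as pip extras (something like `pip install mmc[cloob]` via `extras_require`), so the one-liners in the README stay trivial.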
I like the idea and spirit, and I feel that eventually, if MMC gets way too many perceptors, making some optional makes a lot of sense. Starting with all of them optional, I'm not sure. Regardless, I think your idea holds; I'm just not sure whether we ship empty or with some basics (e.g. OpenAI CLIP + OpenCLIP) and let users install further from there.
(New perceptor: https://github.com/microsoft/UniCL)
https://github.com/FacePerceiver/FaRL#use-farl-as-faceclip face clip
https://github.com/microsoft/RegionCLIP
Turkish CLIP https://github.com/yusufani/TrCLIP
EVA-CLIP - https://github.com/baaivision/EVA/blob/master/clip/README.md
- basically already API compliant
- installable
- installable with extra effort
- not installable
- not released
- [x] CLIP
- [x] CLOOB
- [x] SLIP
- [ ] CLIP-JAX
- [ ] AudioCLIP
- [x] CLIPfa (farsi) - https://github.com/sajjjadayobi/CLIPfa
- [ ] CLIP pretrained on FOOD101 by PASSL? - https://github.com/PaddlePaddle/PASSL/blob/main/docs/Train_CLIP_model.md
- [x] SBERT Multilingual CLIP - https://www.sbert.net/docs/pretrained_models.html#image-text-models
References for more variants:
https://paperswithcode.com/paper/learning-transferable-visual-models-from
Potentially in scope, lower priority
Older stuff
VQA is sort of a generalization of vision language co-training... TBD.
MAGMA could be another useful approach to promote multi-lingual support
https://github.com/Aleph-Alpha/magma