google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Apache License 2.0
2.25k stars, 147 forks

Is there a PyTorch version of CLIPPO? #57

Closed yukistavailable closed 11 months ago

yukistavailable commented 11 months ago

Thank you for releasing code for these inspiring works! Especially, I'm interested in CLIPPO.

Are there any plans to release a PyTorch version?

mitscha commented 11 months ago

Thanks for your interest! We have not planned a PyTorch release for CLIPPO.

If you want to use the pretrained checkpoints with PyTorch, converting them shouldn't be too hard: they are stored as npz files and we follow the standard ViT design. If you want to train in PyTorch, you could port the preprocessing function to your favorite CLIP library and adapt the code to do two forward passes through the vision encoder (one for the natural image and one for the text image).
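For readers attempting the checkpoint conversion, here is a rough sketch of the layout change that is usually needed. It is not part of the big_vision codebase. It assumes the npz file holds a flat mapping from parameter names to arrays, and that Flax-style kernels need reordering for PyTorch: dense kernels go from `(in, out)` to `(out, in)`, and convolution kernels go from `(H, W, in, out)` to `(out, in, H, W)`. The key names in the demo (`embedding/kernel`, `head/kernel`) and the helpers `flax_to_torch_layout` and `convert_npz` are hypothetical; check the actual key names in the released checkpoint before relying on them.

```python
import io

import numpy as np


def flax_to_torch_layout(name, array):
    """Reorder one array from Flax's kernel layout to PyTorch's weight layout."""
    if name.endswith("kernel"):
        if array.ndim == 2:
            # Dense kernel (in, out) -> torch.nn.Linear weight (out, in)
            return array.T
        if array.ndim == 4:
            # Conv kernel (H, W, in, out) -> torch.nn.Conv2d weight (out, in, H, W)
            return array.transpose(3, 2, 0, 1)
    # Biases, LayerNorm scales, and position embeddings are left unchanged.
    # 3-D attention kernels are also left unchanged here and need a
    # model-specific reshape (see the note below).
    return array


def convert_npz(path_or_file):
    """Load an npz checkpoint and return a dict of arrays in PyTorch layout."""
    with np.load(path_or_file) as ckpt:
        return {key: flax_to_torch_layout(key, ckpt[key]) for key in ckpt.files}


# Stand-in checkpoint with made-up key names and tiny shapes (assumption:
# the real checkpoint uses different names and much larger arrays).
buf = io.BytesIO()
np.savez(
    buf,
    **{
        "embedding/kernel": np.zeros((16, 16, 3, 8), np.float32),
        "embedding/bias": np.zeros((8,), np.float32),
        "head/kernel": np.zeros((8, 4), np.float32),
    },
)
buf.seek(0)

converted = convert_npz(buf)
for name, arr in converted.items():
    print(name, arr.shape)
```

Each converted array can then be wrapped with `torch.from_numpy` and copied into the matching parameter of a PyTorch ViT. Attention projection kernels in Flax typically carry a separate heads dimension, so they would need an additional reshape that depends on the target PyTorch implementation; this sketch deliberately leaves them untouched.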

yukistavailable commented 11 months ago

Thank you for the detailed explanation!

OK, I will do the conversion myself. 😄