Muennighoff / vilio

🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
https://arxiv.org/abs/2012.07788
MIT License

Using image and text together for classification #9

Open · karndeepsingh opened this issue 2 years ago

karndeepsingh commented 2 years ago

Hi, I want to train a multimodal model that uses both images and text for multi-label classification.

Could you help me understand which recent multimodal models are available that take an image and text as input and can be fine-tuned on my classification task?

Looking forward to your reply.

Thanks!

Muennighoff commented 2 years ago

Hi, you can find a list of the multimodal models implemented in this codebase here.
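Since the question is specifically about *multi-label* output, here is a minimal, framework-free sketch of the part that changes compared with single-label classification. It assumes you already have fixed-size feature vectors for the image and the text (for example from pretrained encoders). The function names (`fuse`, `linear`, `bce_loss`, `predict`) are hypothetical and are not part of Vilio's API, and concatenation is just one simple fusion choice, not necessarily how the models in this repo combine modalities. The key idea is that each label gets its own independent sigmoid and binary cross-entropy term instead of a softmax over all labels.

```python
import math
from typing import List, Sequence


def fuse(image_feat: Sequence[float], text_feat: Sequence[float]) -> List[float]:
    """Hypothetical late fusion: simply concatenate image and text features."""
    return list(image_feat) + list(text_feat)


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """One logit per label: logits[k] = sum_i W[k][i] * x[i] + b[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over labels, computed directly from logits.

    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1-y)*log(1 - sigmoid(z))].
    Each label is treated as an independent yes/no decision.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def predict(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Multi-label prediction: any number of labels can be switched on."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


if __name__ == "__main__":
    x = fuse([0.5, -1.0], [2.0])  # toy 2-d image feature + 1-d text feature
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # 3 labels
    logits = linear(x, W, [0.0, 0.0, 0.0])
    print(predict(logits))  # labels 0 and 2 are on, label 1 is off
    print(bce_loss(logits, [1, 0, 1]))
```

In a real PyTorch training loop you would typically put a linear layer with one output per label on top of the fused (or joint) representation and train it with `torch.nn.BCEWithLogitsLoss`, which computes the same stable loss as `bce_loss` above. The per-label threshold of 0.5 is only a default; tuning it on a validation set is common for multi-label tasks.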