TIGER-AI-Lab / VLM2Vec

This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"
https://tiger-ai-lab.github.io/VLM2Vec/
Apache License 2.0

Code does not support the training of Llama-3.2-Vision #10

Open haon-chen opened 1 day ago

haon-chen commented 1 day ago

Your current code does not support the training of Llama-3.2-Vision-11B, which is a state-of-the-art open-source VLM. Could you modify your data collator and model to support training this VLM?
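For reference, supporting a new VLM in the collator mostly means routing each example's text and image through that model's processor. Below is a minimal sketch of what such a collator could look like, assuming a Hugging Face-style processor interface; the class name `VisionCollator` and the example fields (`"text"`, `"image"`) are hypothetical and would need to match VLM2Vec's actual data format.

```python
from dataclasses import dataclass


@dataclass
class VisionCollator:
    """Hypothetical collator sketch: batches text+image examples through a
    Hugging Face-style processor (e.g. the one for Llama-3.2-Vision)."""

    processor: object  # assumed to follow the AutoProcessor calling convention
    max_length: int = 512

    def __call__(self, examples):
        texts = [ex["text"] for ex in examples]
        images = [ex.get("image") for ex in examples]
        # Text-only batch: skip the images argument entirely so
        # text-only retrieval tasks still work with the same collator.
        if all(img is None for img in images):
            return self.processor(
                text=texts,
                padding=True,
                truncation=True,
                max_length=self.max_length,
            )
        # Multimodal batch: hand images and text to the processor together,
        # which is how Llama-3.2-Vision-style processors expect their input.
        return self.processor(
            text=texts,
            images=images,
            padding=True,
            truncation=True,
            max_length=self.max_length,
        )
```

This is only a sketch of the batching logic; the real change would also need the model-side forward pass and any special image tokens the chosen VLM requires.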

XMHZZ2018 commented 1 day ago

Thank you for your interest in our work! We are currently working on supporting more VLMs and will be adding Llama-3.2-Vision-11B soon.