TIGER-AI-Lab / VLM2Vec

This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"
https://tiger-ai-lab.github.io/VLM2Vec/
Apache License 2.0

The results are better than those in the paper #2

Open B-201 opened 6 hours ago

B-201 commented 6 hours ago

I tested some of the datasets in the eval set and found that the results are higher than those reported in the paper. Has the LoRA model been updated since the version presented in the paper?

wenhuchen commented 4 hours ago

I think some fluctuation is reasonable. We do have some updated checkpoints coming soon, trained with a larger batch size.