The results are better than those in the paper

TIGER-AI-Lab / VLM2Vec

This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"

https://tiger-ai-lab.github.io/VLM2Vec/

Apache License 2.0


The results are better than those in the paper #2

Closed: B-201 closed this issue 2 weeks ago

B-201 commented 1 month ago

I tested some of the datasets in the eval set and found that the results are higher than those reported in the paper. Has the LoRA model been updated since the version presented in the paper?

wenhuchen commented 1 month ago

I think some fluctuation is reasonable. We do have some updated checkpoints coming soon, trained with a large batch size.

khazic commented 1 month ago

I'm curious: why are the metrics for full fine-tuning not as good as those for LoRA?

wenhuchen commented 1 month ago

We don't know either. One hypothesis is that full fine-tuning overfits to the training set more easily.