TIGER-AI-Lab / VLM2Vec

This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"
https://tiger-ai-lab.github.io/VLM2Vec/
Apache License 2.0

Use Qwen2-VL as backbone #1

Closed. VoVAllen closed this issue 1 month ago.

VoVAllen commented 1 month ago

Qwen2-VL has shown much better performance on multiple tasks. Will VLM2Vec try it as a backbone?

XMHZZ2018 commented 1 month ago

@VoVAllen Thanks for pointing that out! Using Qwen2-VL as the backbone is one of the most important updates planned for VLM2Vec. Other potential ways to improve the model include mining more hard negatives during training, incorporating more diverse training data, and integrating pure-text tasks. We may release an improved version of VLM2Vec in the future.
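
For concreteness, here is a minimal sketch of what swapping in Qwen2-VL as the embedding backbone might look like with Hugging Face `transformers`, pooling the hidden state of the final token into a single vector, similar in spirit to VLM2Vec's last-token representation. The checkpoint name and the `embed_text` helper are illustrative, not the repo's actual code; image inputs would additionally require Qwen2-VL's chat template with vision placeholder tokens, which is omitted here for brevity.

```python
import torch
import torch.nn.functional as F
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"  # assumed checkpoint; other Qwen2-VL sizes should work too

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

@torch.no_grad()
def embed_text(text: str) -> torch.Tensor:
    """Encode one text into an L2-normalized vector via last-token pooling."""
    inputs = processor(text=[text], return_tensors="pt").to(model.device)
    out = model(**inputs, output_hidden_states=True)
    last_hidden = out.hidden_states[-1]  # (1, seq_len, hidden_dim)
    emb = last_hidden[:, -1, :]          # hidden state of the final token (no padding for a single text)
    return F.normalize(emb, dim=-1).squeeze(0)

# Example: rank two candidates against a query by cosine similarity.
q = embed_text("A photo of a dog playing in the snow")
c1 = embed_text("A dog runs through a snowy field")
c2 = embed_text("A bowl of fruit on a table")
print((q @ c1).item(), (q @ c2).item())
```

The hard-negative mining mentioned above could plug into a standard InfoNCE-style contrastive loss. The sketch below assumes query, positive, and mined-negative embeddings have already been computed; the function name and temperature value are illustrative, not the repo's training code.

```python
import torch
import torch.nn.functional as F

def info_nce_with_hard_negatives(q, pos, negs, temperature=0.05):
    """q: (B, D) queries; pos: (B, D) positives; negs: (B, K, D) mined hard negatives."""
    q, pos, negs = F.normalize(q, dim=-1), F.normalize(pos, dim=-1), F.normalize(negs, dim=-1)
    pos_sim = (q * pos).sum(-1, keepdim=True)      # (B, 1) query-positive similarity
    neg_sim = torch.einsum("bd,bkd->bk", q, negs)  # (B, K) query-negative similarities
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive is index 0
    return F.cross_entropy(logits, labels)
```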