Pretraining and SFT of Vary-Base

Ucas-HaoranWei / Vary

[ECCV2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.


Pretraining and SFT of Vary-Base #78

Open SizeWu opened 4 months ago

SizeWu commented 4 months ago

Hi! Thanks for the excellent work!

I noticed in the paper that Vary-Base follows the pretraining and SFT paradigm. From the paper, I can only tell that images from LAION-COCO are used for pretraining and that LLaVA-CC665k, DocVQA, and ChartQA are used for SFT. What about the rendered formulas, tables, and charts? Are they used in pretraining or in SFT?