Hi! Thanks for the excellent work!
I noticed in the paper that Vary-Base follows the pretraining and SFT paradigm. From the paper, I only know that images from LAION-COCO are used for pretraining, and that LLaVA-CC665k, DocVQA, and ChartQA are used for SFT. What about the rendered formulas, tables, and charts? Are they used in pretraining or in SFT?