TinyLLaVA / TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models
https://arxiv.org/abs/2402.14289
Apache License 2.0

Add an introduction to the pretraining and SFT data #1

lucasjinreal closed this issue 9 months ago

lucasjinreal commented 9 months ago

The open-source community would benefit from it.

baichuanzhou commented 9 months ago

We expect to release all details in the following days. Meanwhile, please refer to our paper for more information. https://arxiv.org/abs/2402.14289

baichuanzhou commented 9 months ago

We have updated our README with instructions on data preparation.

lucasjinreal commented 9 months ago

@baichuanzhou Thanks! Do you know of any good data for improving OCR ability? Currently I find the OCR ability very weak, especially for Chinese.

baichuanzhou commented 8 months ago

@lucasjinreal Hmm, I think increasing both the amount of training data and the input resolution is important for improving OCR ability. As for training data, maybe look at ChartQA, DVQA, etc. Either way, further exploration in this area is needed.

lucasjinreal commented 8 months ago

@baichuanzhou How do I enlarge the ViT input size? If the input size changes, the pretrained weights may no longer work properly.
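A common approach, though not one given in this thread, rests on the following observation. The ViT's patch-embedding layer works at any input size. The learned position-embedding table, however, has exactly one entry per patch of the original grid, so that table is the part that has to be resized. The sketch below does a bilinear resize of that table in pure Python. It assumes a square patch grid stored in row-major order and an optional prefix (CLS) token that is left unchanged. The helper name `interpolate_pos_embed` is hypothetical, not a TinyLLaVA API. Real code would normally do this on tensors, for example with PyTorch's `torch.nn.functional.interpolate`.

```python
import math
from typing import List, Tuple

Vector = List[float]


def interpolate_pos_embed(
    pos_embed: List[Vector], new_grid: int, num_prefix_tokens: int = 0
) -> List[Vector]:
    """Bilinearly resize a ViT's learned 2D position-embedding table (hypothetical helper).

    pos_embed holds num_prefix_tokens vectors (e.g. a CLS token), followed by
    old_grid * old_grid patch vectors in row-major order. Prefix vectors are
    copied unchanged; only the patch grid is resized to new_grid x new_grid.
    """
    prefix = pos_embed[:num_prefix_tokens]
    patches = pos_embed[num_prefix_tokens:]
    old_grid = math.isqrt(len(patches))
    if old_grid * old_grid != len(patches) or old_grid == 0:
        raise ValueError("patch position embeddings must form a non-empty square grid")
    dim = len(patches[0])

    def source_coord(i: int) -> Tuple[int, int, float]:
        # Map new-grid index i to a fractional old-grid coordinate using
        # half-pixel centers (similar to align_corners=False), clamped to the grid.
        src = (i + 0.5) * old_grid / new_grid - 0.5
        src = min(max(src, 0.0), old_grid - 1.0)
        lo = math.floor(src)
        hi = min(lo + 1, old_grid - 1)
        return lo, hi, src - lo

    resized: List[Vector] = []
    for r in range(new_grid):
        r0, r1, fr = source_coord(r)
        for c in range(new_grid):
            c0, c1, fc = source_coord(c)
            # The four neighbouring embeddings on the old grid.
            v00 = patches[r0 * old_grid + c0]
            v01 = patches[r0 * old_grid + c1]
            v10 = patches[r1 * old_grid + c0]
            v11 = patches[r1 * old_grid + c1]
            # Blend along columns first, then along rows.
            resized.append([
                (1 - fr) * ((1 - fc) * v00[d] + fc * v01[d])
                + fr * ((1 - fc) * v10[d] + fc * v11[d])
                for d in range(dim)
            ])
    return [list(v) for v in prefix] + resized


# Example: a CLIP-style 336-px ViT with 14-px patches has a 24x24 grid plus one
# CLS token. Moving to 448 px needs a 32x32 grid, i.e. 1024 patch tokens instead of 576.
old_table = [[0.0] * 8 for _ in range(1 + 24 * 24)]
new_table = interpolate_pos_embed(old_table, 448 // 14, num_prefix_tokens=1)
print(len(new_table))  # 1 + 32 * 32 = 1025
```

Interpolation only gives the enlarged encoder a reasonable starting point. The encoder is usually fine-tuned at the new resolution afterwards. The number of visual tokens passed to the language model also grows quadratically with the image side length, which raises training and inference cost.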