Closed. luohao123 closed this issue 9 months ago.
@LinB203 Thanks. It seems the pretraining data lacks Chinese data. If I want to strengthen the Chinese capability and OCR ability, which dataset should I use, and should I add it in the pretraining stage or the finetuning stage?
Add a Chinese dataset in the finetuning stage, or swap in a stronger LLM that supports Chinese.
@LinB203 Why not add it in both the pretraining and finetuning stages?
In the pretraining stage, the model only trains the MLP adapter, which adapts the vision tokens to the LLM. The LLM itself is not trained. So we recommend training on Chinese datasets in the fine-tuning stage.
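To make the stage split above concrete, here is a minimal sketch of picking which parameters are trainable in each stage. The prefixes `mm_projector.`, `llm.`, and `vision_tower.` are hypothetical names chosen for illustration, not taken from the MoE-LLaVA code, and the sketch assumes the adapter stays trainable during fine-tuning while the vision encoder stays frozen in both stages.

```python
# Hypothetical parameter-name prefixes, for illustration only.
# Real checkpoints may name these modules differently.
ADAPTER_PREFIX = "mm_projector."
LLM_PREFIX = "llm."
VISION_PREFIX = "vision_tower."


def trainable_prefixes(stage):
    """Return the name prefixes that should receive gradients in `stage`."""
    if stage == "pretrain":
        # Only the MLP adapter is trained; the LLM is frozen.
        return (ADAPTER_PREFIX,)
    if stage == "finetune":
        # Assumption: adapter and LLM are both trained here.
        return (ADAPTER_PREFIX, LLM_PREFIX)
    raise ValueError(f"unknown stage: {stage!r}")


def select_trainable(param_names, stage):
    """Filter parameter names down to the ones trained in `stage`."""
    prefixes = trainable_prefixes(stage)
    # str.startswith accepts a tuple of candidate prefixes.
    return [name for name in param_names if name.startswith(prefixes)]


if __name__ == "__main__":
    names = [
        VISION_PREFIX + "patch_embed.weight",
        ADAPTER_PREFIX + "0.weight",
        LLM_PREFIX + "layers.0.attn.weight",
    ]
    for stage in ("pretrain", "finetune"):
        print(stage, select_trainable(names, stage))
```

With a PyTorch model, the same selection could be applied by iterating over `model.named_parameters()` and setting `requires_grad` to `True` only for the selected names. Under this split, Chinese text seen only during pretraining can update the adapter but not the LLM's own weights, which is the reason given above for putting Chinese data in the fine-tuning stage.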
It looks like LLaVA-1.6 and Yi-VL train both the vision encoder and the adapter in stage 1 and stage 2, then all parameters in stage 3.
We noticed that. LLaVA-1.6 shows promising results and we will follow them. A stronger MoE-LLaVA is on the way.
@LinB203 In that case, is there anything you are considering to enhance Chinese OCR ability? Which specific dataset could be used?
I think we can just follow the datasets proposed by LLaVA-1.6 and replace the LLM with Qwen-7B, which supports Chinese better. We are doing this; it is WIP.
Oh, dude, don't do things that lack any creativity. I would suggest you add more Chinese data. LLaVA-1.6 doesn't support Chinese well either.
Meanwhile, why do you have to use CLIP? There are more powerful vision encoders.
We noticed that. LLaVA-1.6 shows promising results and we will follow them. A stronger MoE-LLaVA is on the way.
Looking forward to this!
@LinB203 In that case, is there anything you are considering to enhance Chinese OCR ability? Which specific dataset could be used?
I think we can just follow the datasets proposed by LLaVA-1.6 and replace the LLM with Qwen-7B, which supports Chinese better. We are doing this; it is WIP.
This data is not for Chinese.
I have tried LLaVA v1.6; its OCR capability for Chinese characters is terrible! I would suggest training a new model based on Qwen-VL, which is much better.
https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/docs/TRAIN.md