PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0

training dataset? #26

Closed luohao123 closed 9 months ago

LinB203 commented 9 months ago

https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/docs/TRAIN.md

luohao123 commented 9 months ago

@LinB203 Thanks. It seems the pretraining data lacks Chinese data. If I want to strengthen the Chinese capability and OCR ability, which dataset should I use, and should I add it in the pretraining stage or the fine-tuning stage?

LinB203 commented 9 months ago

@LinB203 Thanks. It seems the pretraining data lacks Chinese data. If I want to strengthen the Chinese capability and OCR ability, which dataset should I use, and should I add it in the pretraining stage or the fine-tuning stage?

Add a Chinese dataset in the fine-tuning stage, or replace the LLM with a stronger one that supports Chinese.
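
A minimal sketch of what adding Chinese data in the fine-tuning stage could look like, assuming the LLaVA-style conversation JSON format; the file names and the sample itself are only placeholders, so check docs/TRAIN.md for the actual annotation files:

```python
import json

# Placeholder file names; see docs/TRAIN.md for the real fine-tuning files.
FINETUNE_JSON = "finetune_mixture.json"   # existing fine-tuning mixture
OUTPUT_JSON = "finetune_mixture_zh.json"  # mixture with Chinese data added

# One Chinese OCR sample in the LLaVA conversation format.
chinese_sample = {
    "id": "zh-ocr-000001",
    "image": "zh_ocr/000001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\n请识别图片中的文字。"},
        {"from": "gpt", "value": "图片中的文字是:欢迎光临。"},
    ],
}

with open(FINETUNE_JSON) as f:
    mixture = json.load(f)
mixture.append(chinese_sample)  # in practice, extend with a full Chinese set

with open(OUTPUT_JSON, "w") as f:
    json.dump(mixture, f, ensure_ascii=False, indent=2)
```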

luohao123 commented 9 months ago

@LinB203 Why not add it in both the pretraining and fine-tuning stages?

LinB203 commented 9 months ago

@LinB203 Why not add it in both the pretraining and fine-tuning stages?

In the pretraining stage, only the MLP adapter is trained; it adapts the vision tokens to the LLM. The LLM itself is not trained, so we recommend training on Chinese datasets in the fine-tuning stage.
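
To make that concrete, here is a minimal sketch of the stage-1 setup, assuming hypothetical module names (vision_encoder, llm, mm_projector) rather than MoE-LLaVA's actual attributes; only the projector receives gradients:

```python
import torch
import torch.nn as nn

class Stage1Model(nn.Module):
    """Pretraining stage: train only the MLP adapter; freeze the rest."""

    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.llm = llm
        # The MLP adapter that maps vision tokens into the LLM embedding space.
        self.mm_projector = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        # Freeze everything except the projector.
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False

    def encode_image(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # The frozen encoder needs no gradients; only the projector trains.
        with torch.no_grad():
            vision_tokens = self.vision_encoder(pixel_values)
        return self.mm_projector(vision_tokens)

# The optimizer then only updates the adapter:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```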

luohao123 commented 9 months ago

It looks like LLaVA-1.6 and Yi-VL train both the vision encoder and the adapter in stage 1 and stage 2, then all parameters in stage 3.
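
In pseudocode, that schedule would look roughly like the following; the module names are hypothetical, and the per-stage assignments just restate the description above:

```python
import torch.nn as nn

# Trainable top-level modules per stage, as described above
# (hypothetical module names, not any project's actual attributes).
TRAINABLE_BY_STAGE = {
    1: {"vision_encoder", "mm_projector"},         # vision encoder + adapter
    2: {"vision_encoder", "mm_projector"},         # same modules, next stage
    3: {"vision_encoder", "mm_projector", "llm"},  # all parameters
}

def set_stage(model: nn.Module, stage: int) -> None:
    """Freeze or unfreeze top-level submodules according to the stage."""
    trainable = TRAINABLE_BY_STAGE[stage]
    for name, module in model.named_children():
        for p in module.parameters():
            p.requires_grad = name in trainable
```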

LinB203 commented 9 months ago

We noticed that. LLaVA-1.6 shows promising results and we will follow it. A stronger MoE-LLaVA is on the way.

luohao123 commented 9 months ago

@LinB203 In that case, what do you plan to do to enhance Chinese OCR ability? Which specific dataset could be used?

LinB203 commented 9 months ago

@LinB203 In that case, what do you plan to do to enhance Chinese OCR ability? Which specific dataset could be used?

I think we can just follow the datasets proposed by LLaVA-1.6 and replace the LLM with Qwen-7B, which supports Chinese better. We are doing this; it is a work in progress.
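
For the LLM swap, loading Qwen-7B as the language tower via transformers would look roughly like this; how its embeddings are wired to the projector is model-specific, so treat this as an outline rather than our actual loading code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen-7B ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
llm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

# The MLP adapter must project vision tokens to this hidden size.
llm_dim = llm.config.hidden_size
```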

luohao123 commented 9 months ago

Oh, dude, don't do things that have no novelty; I would suggest you add more Chinese data. LLaVA-1.6 doesn't support Chinese well either.

Meanwhile, why do you have to use CLIP? There are more powerful vision encoders available.

lazyhope commented 9 months ago

We noticed that. LLaVA-1.6 shows promising results and we will follow it. A stronger MoE-LLaVA is on the way.

Looking forward to this!

whalefa1I commented 8 months ago

@LinB203 In that case, what do you plan to do to enhance Chinese OCR ability? Which specific dataset could be used?

I think we can just follow the datasets proposed by LLaVA-1.6 and replace the LLM with Qwen-7B, which supports Chinese better. We are doing this; it is a work in progress.

FYI https://github.com/nttmdlab-nlp/InstructDoc

lucasjinreal commented 8 months ago

This data is not for Chinese.

thiner commented 7 months ago

I have tried LLaVA v1.6; its OCR capability for Chinese characters is terrible! I would suggest training a new model based on Qwen-VL, which is much better.