X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Apache License 2.0

DocOwl1.5-Omni Training #70

Closed · AlbertShenC closed this 1 month ago

AlbertShenC commented 1 month ago

Thanks for your excellent work.

I'm curious about how DocOwl1.5-Omni is trained. Is it obtained through continued training based on DocOwl1.5-stage1? What are the training data and hyper-parameters? I cannot find a related description in the DocOwl1.5 paper or docs.

HAWLYQ commented 1 month ago

Hi @AlbertShenC, DocOwl 1.5-Omni is trained with the same setting as DocOwl 1.5-Chat, but adds 0.2M samples drawn from DocStruct4M to the 2nd-stage data to maintain its parsing and grounding abilities. (DocOwl 1.5-Omni is trained for around 10k steps to keep the same number of training epochs as DocOwl 1.5-Chat.)
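
If it helps to picture the data mixture described above, here is a minimal sketch (not the authors' actual pipeline) of assembling such a stage-2 set: the DocOwl 1.5-Chat instruction data plus 0.2M samples drawn from DocStruct4M. All file names and paths below are hypothetical.

```python
# Sketch of building a DocOwl 1.5-Omni-style stage-2 mixture.
# Paths and file layout are assumptions, not the released scripts.
import json
import random

def load_jsonl(path):
    """Read one JSON object per line."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

random.seed(0)

chat_data = load_jsonl("docowl15_chat_stage2.jsonl")   # hypothetical path
docstruct = load_jsonl("docstruct4m_all.jsonl")        # hypothetical path

# Sample 0.2M examples from DocStruct4M to retain parsing/grounding abilities.
docstruct_subset = random.sample(docstruct, 200_000)

mixture = chat_data + docstruct_subset
random.shuffle(mixture)

with open("docowl15_omni_stage2.jsonl", "w", encoding="utf-8") as f:
    for sample in mixture:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")

print(f"stage-2 mixture size: {len(mixture)}")
```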