X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Apache License 2.0

finetuning mPLUG-DocOwl for document data extraction #71

Open himasai9712 opened 1 month ago

himasai9712 commented 1 month ago

Hey @HAWLYQ / @LukeForeverYoung, I want to finetune the DocOwl model for data extraction from images or PDFs. Where can I find the finetuning code to train the model on a custom dataset? For the best performance, what is the minimum number of images required? I'd like to use LoRA finetuning, so could you point me to that script? Also, which AWS instance type would you suggest for finetuning?

HAWLYQ commented 1 month ago

Hi, @himasai9712, we have released the finetuning script at https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/DocOwl1.5. It doesn't support LoRA finetuning yet; we will test a LoRA version soon.

The minimum number of images depends on your task and the performance you need; more data generally brings better performance. You can refer to the training distribution of the different tasks in our downstream dataset DocDownstream-1.0. For example, the DocVQA task has around 10k images.

HAWLYQ commented 1 month ago

Hi, @himasai9712, we have also uploaded finetuning scripts with LoRA in DocOwl1.5.
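For readers weighing LoRA against full finetuning, here is a minimal sketch of the generic LoRA technique. It is not the DocOwl1.5 script, and the class name `LoRALinear` and the `r`/`alpha` defaults are hypothetical choices for illustration. The idea is that a pretrained weight W stays frozen and only a low-rank update (alpha / r) * B @ A is trained. B starts at zero, so the adapted layer initially behaves exactly like the original.

```python
import numpy as np


class LoRALinear:
    """Hypothetical illustration: frozen weight W plus trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pretrained weight, shape (out_features, in_features).
        self.weight = weight
        out_features, in_features = weight.shape
        self.r = r
        self.scaling = alpha / r
        # A gets small random values and B starts at zero, so the adapter is a no-op at init.
        self.A = rng.normal(0.0, 0.01, size=(r, in_features))
        self.B = np.zeros((out_features, r))

    def forward(self, x):
        # x has shape (batch, in_features); base path plus scaled low-rank path.
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scaling * update

    def merged_weight(self):
        # After training, the update can be folded into W for inference.
        return self.weight + self.scaling * (self.B @ self.A)

    def num_trainable(self):
        # Only A and B are trained; W stays frozen.
        return self.A.size + self.B.size


layer = LoRALinear(np.zeros((1024, 1024)), r=8)
print(layer.num_trainable(), layer.weight.size)
```

For a 1024x1024 layer with r=8, only 8*1024 + 1024*8 = 16,384 parameters are trained instead of 1,048,576 (about 1.6%). This reduction in trainable parameters is what usually makes LoRA cheaper in GPU memory than full finetuning.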