clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

How to fine-tune a model on a downstream task that is the same as the pre-training task? #148

Open SleepEarlyLiveLong opened 1 year ago

SleepEarlyLiveLong commented 1 year ago

I want to fine-tune a model based on "naver-clova-ix/donut-base" on a downstream task that differs from the three tasks in the paper (Document Classification, Document Information Extraction, and Document Visual Question Answering) but is the same as the pre-training task; in other words, I want to teach the model to "read" better. In that task, I would feed an image to donut-base, expect the model to output all the text in the image, compute the loss against the prepared ground-truth text, and backpropagate that loss. The question is: how should I modify the released code or configuration files? Thank you!

willpat1213 commented 1 year ago

I have the same need. Have you solved it yet?

ChrisDelClea commented 1 year ago

@moonbings @SamSamhuns @eltociear, do any of you have a solution for this? I think it would be helpful to include a train_pretrain.yaml in the config folder!
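Until such a file ships with the repo, here is a sketch of what a train_pretrain.yaml could look like. The keys are meant to mirror the released fine-tuning configs (e.g. config/train_cord.yaml); the dataset path and all hyperparameter values below are illustrative guesses, not tested settings:

```yaml
resume_from_checkpoint_path: null
result_path: "./result"
pretrained_model_name_or_path: "naver-clova-ix/donut-base"
# Reading-task data whose ground truth is {"text_sequence": "..."} (illustrative path)
dataset_name_or_paths: ["naver-clova-ix/synthdog-en"]
sort_json_key: True
train_batch_sizes: [8]
val_batch_sizes: [1]
input_size: [2560, 1920]  # donut-base pre-training resolution per the paper
max_length: 768
align_long_axis: False
num_nodes: 1
seed: 2022
lr: 3e-5
warmup_steps: 300
max_epochs: 30
max_steps: -1
num_workers: 8
val_check_interval: 1.0
check_val_every_n_epoch: 1
gradient_clip_val: 1.0
verbose: True
```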

AmT42 commented 1 year ago

Isn't this the same as fine-tuning on information extraction, with a ground-truth label that contains all the tokens of the document you want the model to learn to read?

bugface commented 10 months ago

It is just the pseudo text reading task, as described in the docs:

For the (Pseudo) Text Reading Task, the gt_parse looks like {"text_sequence": "word1 word2 word3 ..."}
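Concretely, that means each training sample's ground truth is just the full transcription under a single "text_sequence" key. A minimal sketch of preparing a metadata.jsonl in that shape (the file layout follows the {"file_name": ..., "ground_truth": ...} dataset format used by the repo; the paths, sample texts, and helper names here are illustrative):

```python
import json
from pathlib import Path


def make_reading_gt(text: str) -> str:
    """Serialize the pseudo-reading ground truth: all the text in the
    image as a single sequence under the "text_sequence" key."""
    return json.dumps({"gt_parse": {"text_sequence": text}}, ensure_ascii=False)


def write_metadata(samples, out_dir: str) -> Path:
    """Write one JSON line per (image file name, transcription) pair."""
    out = Path(out_dir) / "metadata.jsonl"
    with out.open("w", encoding="utf-8") as f:
        for file_name, text in samples:
            record = {"file_name": file_name, "ground_truth": make_reading_gt(text)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out


# Example: two images with their full transcriptions as training targets.
metadata_path = write_metadata(
    [("0001.png", "Invoice No. 42 Total 17.50"),
     ("0002.png", "Hello world")],
    ".",
)
```

Fine-tuning on a dataset prepared like this is then the same pipeline as the information-extraction configs, just with "text_sequence" as the only field.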