clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

Train DONUT for DocVQA from scratch #135

Open · emigomez opened 1 year ago

emigomez commented 1 year ago

I would like to repeat the process you followed to generate the https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa model. I think I have to start from the https://huggingface.co/naver-clova-ix/donut-base model, is that right?

Once I have my own DocVQA dataset labelled for training, how can I run training starting from the base model to obtain a DocVQA model? Could you please explain how to do that?

Wyzix33 commented 1 year ago

From what I understand from the README, you can fine-tune using the donut-base model, or you can start from scratch by setting pretrained_model_name_or_path to null and using only your dataset. This will result in a model that you can then use as the pretrained model for other fine-tunes...

```yaml
pretrained_model_name_or_path: null # loading a pre-trained model (from model hub or path)
dataset_name_or_paths: ["dataset/my_dataset"] # loading datasets (from model hub or path)
```

Correct me if I'm wrong ...
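
If I read the repo's model code right, those two config settings roughly boil down to the difference below. This is only a sketch (I'm assuming the donut package exports DonutConfig next to DonutModel, and the config values are placeholders); train.py does the real wiring from the YAML:

```python
from donut import DonutConfig, DonutModel

# pretrained_model_name_or_path: "naver-clova-ix/donut-base" -> fine-tuning:
# training starts from the released donut-base weights.
model = DonutModel.from_pretrained("naver-clova-ix/donut-base")

# pretrained_model_name_or_path: null -> from scratch:
# the model is built from a config alone, so its weights are randomly initialised.
scratch_model = DonutModel(config=DonutConfig(input_size=[1280, 960], max_length=128))
```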

emigomez commented 1 year ago

My idea is to replicate the process used to obtain donut-base-finetuned-docvqa from https://huggingface.co/naver-clova-ix/donut-base.

I would prefer to obtain it using the fine-tuning Colab notebook, but it is giving me RAM issues (using the same huge dataset that was originally used). The errors appear when I try to create the dataset with the Hugging Face datasets library. My idea is to put the dataset on the Hugging Face Hub and then run the fine-tuning as it is done in the notebook.

Is it possible to run fine-tuning in Colab without first uploading the dataset to the Hugging Face Hub?
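
For what it's worth, what I have in mind is loading the images straight from local disk / Drive with the datasets library instead of pushing everything to the Hub first. A rough sketch (the path, the imagefolder layout, and a metadata.jsonl with the question/answer ground truth are assumptions on my side):

```python
from datasets import load_dataset

# Build the dataset from a local folder of page images plus a metadata.jsonl
# holding the question/answer annotations. Images are decoded lazily on access,
# so the whole dataset does not need to fit in RAM.
dataset = load_dataset(
    "imagefolder",
    data_dir="/content/drive/MyDrive/docvqa",  # placeholder path
)

print(dataset["train"][0]["image"])  # a PIL image, decoded only when accessed
```

If even building the local Arrow cache is too heavy, passing streaming=True to load_dataset should avoid it, but I have not tried that with the notebook.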