emigomez opened this issue 1 year ago
From what I understand from the README, you can either fine-tune starting from the donut-base model, or train from scratch by setting `pretrained_model_name_or_path` to null and using only your own dataset. The result is a model that you can then use as the pretrained model for further fine-tuning...
```yaml
pretrained_model_name_or_path: null            # loading a pre-trained model (from model hub or path)
dataset_name_or_paths: ["dataset/my_dataset"]  # loading datasets (from model hub or path)
```
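If I understand the config correctly, the difference between the two modes could be sketched roughly like this. This is only my illustration, not the repo's actual train.py logic, and it assumes the DonutModel/DonutConfig classes exported by the donut-python package:

```python
# Rough illustration only - not the repo's actual training code.
# Assumes DonutModel / DonutConfig from the donut-python package.
from donut import DonutConfig, DonutModel

pretrained_model_name_or_path = None  # value taken from the YAML config above

if pretrained_model_name_or_path:
    # Fine-tuning: load existing weights from the model hub or a local path.
    model = DonutModel.from_pretrained(pretrained_model_name_or_path)
else:
    # Training from scratch: build the architecture from a config,
    # so the weights start randomly initialized.
    model = DonutModel(config=DonutConfig())
```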
Correct me if I'm wrong ...
My idea is to replicate the process used to obtain donut-base-finetuned-docvqa from https://huggingface.co/naver-clova-ix/donut-base.
I would prefer to obtain it using the fine-tuning Colab notebook, but it gives me RAM issues when I use the same huge dataset that was originally used. The errors appear when I try to create the dataset with the Hugging Face datasets library. My idea was to push the dataset to the Hugging Face Hub and then run the fine-tuning as it is done in the notebook.
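In case it helps narrow down where the memory goes: this is roughly what I am trying now (paths and field names are placeholders for my data). My hope was that `datasets.Dataset.from_generator` would stream examples into an Arrow cache on disk instead of keeping every image in RAM:

```python
# Sketch of what I am attempting; paths and annotation fields are placeholders.
import json
from pathlib import Path

from datasets import Dataset, Features, Image, Value

def gen_examples(root="dataset/my_docvqa/train"):
    # One record per page image, with the DocVQA-style annotation stored
    # as a JSON string in "ground_truth" (the layout the notebook expects).
    for ann_file in Path(root).glob("*.json"):
        ann = json.load(open(ann_file))
        yield {
            "image": str(Path(root) / ann["image_filename"]),
            "ground_truth": json.dumps({"gt_parses": ann["gt_parses"]}),
        }

features = Features({"image": Image(), "ground_truth": Value("string")})
# from_generator writes examples to the datasets cache incrementally,
# so the whole image set should not need to fit in RAM at once.
train_ds = Dataset.from_generator(gen_examples, features=features)
```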
Is it possible to run fine-tuning in Colab without first uploading the dataset to Hugging Face?
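For example, would something like this be a reasonable way to keep everything local in Colab? The directory layout (an `imagefolder` with a metadata.jsonl per split) is just an assumption on my side:

```python
# Assumed layout: dataset/my_docvqa/{train,validation}/ containing the images
# plus a metadata.jsonl with "file_name" and "ground_truth" columns.
from datasets import load_dataset, load_from_disk

dataset = load_dataset("imagefolder", data_dir="dataset/my_docvqa")

# Optionally cache the processed version so later Colab sessions
# can reload it without touching the Hub.
dataset.save_to_disk("/content/drive/MyDrive/my_docvqa_arrow")
# ...and later:
# dataset = load_from_disk("/content/drive/MyDrive/my_docvqa_arrow")
```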
I would like to repeat the process you followed to generate the https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa model. I think I have to start from the https://huggingface.co/naver-clova-ix/donut-base model, is that right?
Once I have my own DocVQA dataset labelled for training, how can I run training so that it starts from the base model and produces a DocVQA model? Could you please explain how to do that?
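To make the question concrete, this is the kind of step I imagine the training boils down to, using the transformers DonutProcessor / VisionEncoderDecoderModel classes. The `<s_docvqa><s_question>...</s_question><s_answer>...</s_answer>` target format is taken from the docvqa model card; the file paths and the rest of the setup are my own guesses:

```python
# My understanding of a single fine-tuning step, starting from donut-base.
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")

# Register the DocVQA task/field tokens and resize the decoder embeddings.
new_tokens = ["<s_docvqa>", "<s_question>", "</s_question>", "<s_answer>", "</s_answer>"]
processor.tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})
model.decoder.resize_token_embeddings(len(processor.tokenizer))

# Placeholder example; in practice this comes from my labelled dataset.
image = Image.open("dataset/my_docvqa/train/sample.png").convert("RGB")
target = "<s_docvqa><s_question>what is the date?</s_question><s_answer>03/04/1999</s_answer></s>"

pixel_values = processor(image, return_tensors="pt").pixel_values
labels = processor.tokenizer(target, add_special_tokens=False, return_tensors="pt").input_ids
labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss

outputs = model(pixel_values=pixel_values, labels=labels)
outputs.loss.backward()  # in practice this would sit inside a Trainer / Lightning loop
```

Is this roughly the right direction, or should I instead use the original repo's train.py with a DocVQA config and `pretrained_model_name_or_path` pointing at donut-base?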