NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

DONUT-docvqa FT using donut-base-finetuned-docvqa #237

Open · emigomez opened this issue 1 year ago

emigomez commented 1 year ago

Thank you very much for your work!!

I'm working through https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/DocVQA/Fine_tune_Donut_on_DocVQA.ipynb, and I'm wondering whether it's possible to fine-tune the DocVQA model on my own dataset while starting from a checkpoint already fine-tuned on this task, instead of using donut-base as the base model:

from transformers import DonutProcessor, VisionEncoderDecoderConfig, VisionEncoderDecoderModel

config = VisionEncoderDecoderConfig.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa", config=config)

Is it correct to do the fine-tuning this way?

NielsRogge commented 1 year ago

Hi,

It's definitely possible to start from the already fine-tuned model. You may need to add some special tokens to the model's vocabulary, but other than that it looks ok.
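For reference, a minimal sketch of adding special tokens, assuming your own dataset introduces new task/field tags (the token names below are hypothetical; adapt them to whatever appears in your ground-truth target sequences):

from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")

# Hypothetical tokens; use whatever tags your dataset's target sequences contain
new_tokens = ["<s_my_field>", "</s_my_field>"]

# add_tokens returns how many of the tokens were actually new to the vocabulary
num_added = processor.tokenizer.add_tokens(new_tokens)

# Grow the decoder's embedding matrix so it covers the enlarged vocabulary
if num_added > 0:
    model.decoder.resize_token_embeddings(len(processor.tokenizer))

Without the resize, the decoder would hit an index error the first time a newly added token id is looked up in its embedding table.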