clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.79k stars, 471 forks

Custom dataset for DONUT #319

Open Dealibros opened 1 week ago

Dealibros commented 1 week ago

Hello!

I'm just starting on my journey of model training. I am in the process of creating a custom dataset using Ubiai for fine-tuning a DONUT model. My goal is to extract data from forms, and I've found the necessary format structure in the documentation. However, I'm uncertain about how to handle checkboxes, which are present in the forms I plan to use. Could you advise on how to include checkboxes in the dataset? Will the DONUT model be able to accurately interpret them?

Thank you in advance for your help.

Greetings, Andrea

paloha commented 1 day ago

It should handle checkboxes without a problem; it all depends on your targets. For example, suppose the form in your image has two questions. If the first is an open question, your target will be the full text of the answer. If the second is a multiple-choice question with checkboxes, you need to encode the checkbox states in your targets somehow. I have not tried this explicitly, but I believe you can just train it to predict "[▢,▢,▣,▢,▣]", or "[0, 0, 1, 0, 1]", or "[2, 4]" (the zero-based indices of the checked boxes).
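
To make the three encodings concrete, here is a minimal Python sketch that turns a list of checkbox states into each of the target strings above and wraps the result in a `metadata.jsonl`-style line. The helper names (`encode_checkboxes`, `make_metadata_line`) and the field names (`full_name`, `options`) are hypothetical, and the `file_name` / `ground_truth` / `gt_parse` layout is an assumption based on the dataset format described in the Donut README, so check it against the documentation before relying on it.

```python
import json


def encode_checkboxes(states, style="binary"):
    """Encode checkbox states (booleans, in reading order) as a target string."""
    if style == "symbols":
        # Checked boxes become "▣", unchecked boxes become "▢".
        return "[" + ",".join("▣" if s else "▢" for s in states) + "]"
    if style == "binary":
        # One 0/1 flag per checkbox.
        return "[" + ", ".join("1" if s else "0" for s in states) + "]"
    if style == "indices":
        # Zero-based positions of the checked boxes only.
        return "[" + ", ".join(str(i) for i, s in enumerate(states) if s) + "]"
    raise ValueError(f"unknown style: {style!r}")


def make_metadata_line(file_name, open_answer, checkbox_states, style="binary"):
    """Build one JSON line for a form image (field names are illustrative)."""
    gt_parse = {
        "full_name": open_answer,  # open question: full text of the answer
        "options": encode_checkboxes(checkbox_states, style),
    }
    # Assumed layout: "ground_truth" holds a JSON string with a "gt_parse" key.
    ground_truth = json.dumps({"gt_parse": gt_parse}, ensure_ascii=False)
    return json.dumps(
        {"file_name": file_name, "ground_truth": ground_truth},
        ensure_ascii=False,
    )


if __name__ == "__main__":
    states = [False, False, True, False, True]
    for style in ("symbols", "binary", "indices"):
        print(style, "->", encode_checkboxes(states, style))
    print(make_metadata_line("form_001.png", "Jane Doe", states, "indices"))
```

Whichever encoding you pick, use the same one for every sample so the model sees a consistent target. The symbol variant may need "▢" and "▣" to be covered by the tokenizer's vocabulary, which is worth verifying before training.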