clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

Trying to run DOCVQA dataset #278

Open srgautam9 opened 6 months ago

srgautam9 commented 6 months ago

I was trying to run the DocVQA dataset that is presented in the original repository. I added gt_parses to train_v1.0.json in the given format. At first I got an error from pyarrow, which I solved. Now I am getting this error:

raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

Please help by releasing the DocVQA dataset format, especially the metadata.jsonl file.
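
For reference, here is a minimal sketch of how DocVQA-style records could be turned into metadata.jsonl lines. It assumes the layout described in the repository README: one JSON object per line, with a file_name key and a ground_truth key whose value is a JSON-encoded string containing gt_parses (one question/answer pair per answer candidate). The input field names (image, question, answers), the helper name docvqa_to_metadata_lines, and the sample record are assumptions made for illustration, not an official converter.

```python
import json


def docvqa_to_metadata_lines(records):
    """Build metadata.jsonl lines from DocVQA-style records (hypothetical helper).

    Each record is assumed to carry "image", "question" and "answers" keys.
    """
    lines = []
    for rec in records:
        # One question/answer pair per answer candidate.
        gt_parses = [
            {"question": rec["question"], "answer": answer}
            for answer in rec["answers"]
        ]
        entry = {
            # Assumed to be a path relative to the folder holding metadata.jsonl.
            "file_name": rec["image"],
            # ground_truth is itself a JSON string, not a nested object.
            "ground_truth": json.dumps({"gt_parses": gt_parses}),
        }
        lines.append(json.dumps(entry))
    return lines


# Made-up sample record, for illustration only.
sample_records = [
    {
        "image": "documents/example_1.png",
        "question": "What is the date on the letter?",
        "answers": ["1/8/93", "January 8, 1993"],
    }
]

metadata_lines = docvqa_to_metadata_lines(sample_records)
for line in metadata_lines:
    print(line)
```

Writing each returned line to metadata.jsonl next to the images of a split gives one row per question. If ground_truth is stored as a nested object instead of a string, or rows have inconsistent keys, dataset generation may fail, so that is worth checking first.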