clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.52k stars 443 forks

Dataset Loader didn't work properly on Kaggle #263

Closed wdprsto closed 8 months ago

wdprsto commented 8 months ago

Good afternoon,

This morning I was trying to run Donut on Kaggle. The structure of my dataset is similar to the one described in the documentation. However, when I tried to train the model, an error occurred saying that the "ground truth" didn't exist. Checking a sample shows that load_dataset recognizes the folder name as the label and ignores the metadata.jsonl file inside the folder.

I can read the jsonl file directly with a command, though.

I set up Donut with this code:
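For anyone checking the same thing, here is a minimal sketch of a sanity check on one split folder. It assumes the layout from the Donut README, where each line of metadata.jsonl is a JSON object with a file_name and a ground_truth key. check_metadata is a hypothetical helper name, not part of Donut or datasets.

```python
import json
from pathlib import Path


def check_metadata(split_dir):
    """Return a list of problems found in <split_dir>/metadata.jsonl.

    Hypothetical helper: assumes one JSON object per line with the
    keys "file_name" and "ground_truth".
    """
    split_dir = Path(split_dir)
    meta_path = split_dir / "metadata.jsonl"
    if not meta_path.is_file():
        return [f"missing {meta_path}"]

    problems = []
    with meta_path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            for key in ("file_name", "ground_truth"):
                if key not in record:
                    problems.append(f"line {line_no}: missing key {key!r}")
            # The image named in file_name should sit next to metadata.jsonl.
            image = record.get("file_name")
            if isinstance(image, str) and not (split_dir / image).is_file():
                problems.append(f"line {line_no}: image not found: {image}")
    return problems
```

An empty list means the file itself looks fine, which points the problem at the loader rather than at the data.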

!git clone https://github.com/clovaai/donut.git

!cd donut && pip install .

Thank you for your help

wdprsto commented 8 months ago

Solved by installing datasets version 2.4.
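To confirm which version the notebook actually picked up, something like the sketch below may help. It assumes "2.4" means the 2.4.x release series of the datasets package; matches_series and installed_datasets_version are hypothetical helper names.

```python
from importlib import metadata


def matches_series(installed, series="2.4"):
    """True if `installed` (e.g. "2.4.0") is in the given major.minor series."""
    return installed.split(".")[:2] == series.split(".")


def installed_datasets_version():
    """Return the installed version of `datasets`, or None if it is absent."""
    try:
        return metadata.version("datasets")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_datasets_version()
    if found is None:
        print("datasets is not installed")
    elif matches_series(found):
        print(f"datasets {found} is in the 2.4 series")
    else:
        print(f"datasets {found} is installed, not 2.4.x")
```

On Kaggle the kernel usually needs a restart after changing the installed version, otherwise the previously imported one stays in memory.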