Open plamb-viso opened 9 months ago
The ipynb states:
"Prepare dataset: The first thing we'll do is add the class names as added tokens to the vocabulary of the decoder of Donut, and the corresponding tokenizer."
And then shows:
additional_tokens = ["", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""]
Why did this step add empty strings and not, e.g., these class names:
id2label = { 0: "letter", 1: "form", 2: "email", 3: "handwritten", 4: "advertisement", 5: "scientific_report", 6: "scientific_publication", 7: "specification", 8: "file_folder", 9: "news_article", 10: "budget", 11: "invoice", 12: "presentation", 13: "questionnaire", 14: "resume", 15: "memo" }
That's because you're reading the notebook on GitHub. If you open the notebook in Colab, you will see the classes.
:)
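For anyone else hitting this, here is a minimal sketch of what is probably going on. It assumes the notebook's tokens are the class names wrapped in angle brackets (the exact format isn't shown in this thread, so `<name/>` is a guess) and that the GitHub viewer drops anything that looks like an HTML tag. `strip_tags` is a hypothetical helper that only imitates that rendering behaviour.

```python
import re

id2label = {
    0: "letter", 1: "form", 2: "email", 3: "handwritten",
    4: "advertisement", 5: "scientific_report", 6: "scientific_publication",
    7: "specification", 8: "file_folder", 9: "news_article", 10: "budget",
    11: "invoice", 12: "presentation", 13: "questionnaire", 14: "resume",
    15: "memo",
}

# Assumption: each class name becomes a tag-like special token, e.g. "<letter/>".
additional_tokens = [f"<{name}/>" for name in id2label.values()]


def strip_tags(text: str) -> str:
    """Hypothetical stand-in for a viewer that removes HTML-like tags."""
    return re.sub(r"<[^>]*>", "", text)


# What a tag-stripping viewer would display for each token.
rendered = [strip_tags(token) for token in additional_tokens]

print(additional_tokens[:3])
print(rendered[:3])
```

If that assumption holds, every token renders as an empty string in the viewer, while the notebook itself still contains the real class tokens.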