Theerath opened 11 months ago
Hi,
Donut has been pre-trained on 4 languages: English, Chinese, Korean and Japanese. The tokenizer, however, supports 100 languages, since Donut uses XLM-RoBERTa's tokenizer. If you want to use a different tokenizer, you will have to train a new model from scratch.
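A toy illustration of why the tokenizer can't simply be swapped out: a model's embedding table is indexed by the token IDs of the tokenizer it was trained with, so a tokenizer with a different vocabulary maps the same text to different rows. The vocabularies and embedding values below are made up for illustration. They are not the real XLM-RoBERTa or mBART vocabularies, and real tokenizers split text into subword units rather than whole words.

```python
# Two hypothetical tokenizers with different vocabularies (toy stand-ins only,
# NOT the real XLM-RoBERTa or mBART vocabularies).
VOCAB_A = {"<unk>": 0, "hello": 1, "world": 2, "مرحبا": 3}
VOCAB_B = {"<unk>": 0, "world": 1, "hello": 2}


def encode(text, vocab):
    """Whitespace-split `text` and map each word to its ID, falling back to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]


# Toy embedding table "learned" together with VOCAB_A: row i is the vector for ID i.
EMBEDDINGS_A = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def embed(ids, table):
    """Look up one embedding row per token ID."""
    return [table[i] for i in ids]


text = "hello world"
ids_a = encode(text, VOCAB_A)  # [1, 2]
ids_b = encode(text, VOCAB_B)  # [2, 1]

# Tokenizer A's IDs hit the rows the model was trained on...
print(embed(ids_a, EMBEDDINGS_A))  # [[1.0, 0.0], [0.0, 1.0]]
# ...but tokenizer B's IDs fetch the wrong rows from the same table.
print(embed(ids_b, EMBEDDINGS_A))  # [[0.0, 1.0], [1.0, 0.0]]

# A word missing from a vocabulary collapses to <unk> (ID 0).
print(encode("مرحبا", VOCAB_B))  # [0]
```

In this toy, the only way to make tokenizer B's IDs meaningful is to learn a new embedding table (and, in a real model, the layers built on top of it), which is the intuition behind retraining when the tokenizer changes.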
Hi, will Donut be able to extract English and Arabic text at the same time?
Is it possible to use a different multilingual tokenizer in the Donut processor, for example the mBART tokenizer instead of XLMRobertaTokenizerFast? @NielsRogge