NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License
9.17k stars 1.42k forks

TrOCR in other languages #279

Open bely66 opened 1 year ago

bely66 commented 1 year ago

Hi Niels, thanks for the great work. I'm extremely grateful for the opportunity to learn about transformers from your tutorials.

I have a few questions about training TrOCR from scratch. Say I want to pre-train it from scratch on another language.

Is it possible to replace the language model with one for the language I want to pre-train on (XLM-RoBERTa, for example)?

If so, how much data would be needed to pre-train it effectively? (I see that Microsoft used more than 600M sentence images for pre-training.)

Is Microsoft willing to release a multilingual version of the model?

Regards,