Training new language - Githubissues

Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

6k stars, 511 forks

Training new language #41

Open shinab930 opened 2 months ago

shinab930 commented 2 months ago

Hi, to train a new language like Arabic, do we have to train from stage 1, or from stage 2/stage 3? Also, how much data is needed for good accuracy? I'd appreciate any insights.

Ucas-HaoranWei commented 2 months ago

You only need to fine-tune the model. I think >30w data is OK~

minhduc01168 commented 2 months ago

@Ucas-HaoranWei Hello, do you have a tutorial on how to build a dataset and how to train a model for a new language? I would be very grateful if you could help me.

mariamsaed19 commented 1 month ago

> You only need to fine-tune the model. I think >30w data is OK~

Hello, thanks for your great work. I have a question: what does "w" mean in this context?

mohamadmansourX commented 1 month ago

> > You only need to fine-tune the model. I think >30w data is OK~
>
> hello, thanks for your great work. I have an inquiry. What is the meaning of "w" in this context?

I assume it means more than roughly 300,000 data points for fine-tuning.

w --> wan (万) --> ten thousand, so 30w = 30 × 10,000 = 300,000; see https://chinese.yabla.com/chinese-english-pinyin-dictionary.php?define=%E4%B8%87

MohamedLahmeri01 commented 1 month ago

@minhduc01168 Hello, did you find out how to train on a new language?