PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
Apache License 2.0
38.99k stars 7.32k forks

Burmese/release/2.7.1 #12014

Closed 1chimaruGin closed 2 weeks ago

1chimaruGin commented 2 weeks ago

Added

The corpus is from https://github.com/ye-kyaw-thu/myPOS, with non-Burmese characters removed.

If you need more sentences for the corpus, please contact me. The full corpus has 1.3M sentences.
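
For readers wondering what removing non-Burmese characters might look like, here is a minimal sketch. It is not the script used for this PR. It assumes "Burmese characters" means the Unicode Myanmar block (U+1000 to U+109F) plus the Myanmar Extended-A and Extended-B blocks, and the function names `clean_line` and `clean_corpus` are hypothetical.

```python
import re

# Assumption: keep only code points in the Myanmar block (U+1000-U+109F),
# Myanmar Extended-A (U+AA60-U+AA7F), Myanmar Extended-B (U+A9E0-U+A9FF),
# and whitespace. Everything else is treated as "non-Burmese".
_NON_BURMESE = re.compile(r"[^\u1000-\u109F\uAA60-\uAA7F\uA9E0-\uA9FF\s]")
_WHITESPACE = re.compile(r"\s+")


def clean_line(line: str) -> str:
    """Drop characters outside the Myanmar blocks and collapse whitespace."""
    kept = _NON_BURMESE.sub("", line)
    return _WHITESPACE.sub(" ", kept).strip()


def clean_corpus(lines):
    """Yield cleaned lines, skipping any that end up empty."""
    for line in lines:
        cleaned = clean_line(line)
        if cleaned:
            yield cleaned
```

A whitelist of code-point ranges is used here because it is simple to audit. Whether Burmese digits, punctuation, or the extended blocks should be kept depends on the recognition dictionary the model is trained with.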

paddle-bot[bot] commented 2 weeks ago

Thanks for your contribution!

CLAassistant commented 2 weeks ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


1chimaruGin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

1chimaruGin commented 2 weeks ago

I signed the CLA, but it still says it is not signed yet.

[Image: Screenshot 2024-04-27 at 15 31 22]

jzhang533 commented 2 weeks ago

> Signed CLA but, it still says not signed yet.

Your commit author email is not connected to the email you registered on GitHub; please check the comment made by CLAassistant.

Please base your contribution on the main branch, which is the branch we use for development.

We also need to discuss whether it is appropriate to include corpus data in the repository, since it will increase the repository size and we also need to respect the original license of myPOS. One alternative I can think of is to host the corpus in a separate repository and include a link to that repository in PaddleOCR.

1chimaruGin commented 2 weeks ago

Got it, @jzhang533. Thank you!

jzhang533 commented 2 weeks ago

> Got it, @jzhang533. Thank you!
>
>   • I will close this PR and open a new one.
>   • For the text corpus:
>     • Will you create a new repository, or should I create one?
>     • I could also use many public data sources or my own dataset.

You can create one, and provide a link in PaddleOCR.