1chimaruGin closed this 2 weeks ago.
Thanks for your contribution!
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
I signed the CLA, but it still says I haven't signed it yet.
The author email of your commits is not linked to the email registered with your GitHub account. Please check the comment from CLAassistant.
Please base your contribution on the main branch, which is the branch we use for development.
We also need to discuss whether it is appropriate to include corpus data in the repository, since it will increase the repository size and we also need to respect the original license of myPOS. One alternative I can think of is to host the corpus in a separate repository and include a link to that repository in PaddleOCR.
Got it @jzhang533, thank you!
- I will close this PR and open a new one.
- For the text corpus:
  - Will you create a new repository, or should I create one?
  - I can also use other public data sources or my own dataset.
You can create one and provide a link to it in PaddleOCR.
Added:
- `ppocr/utils/corpus/bm_corpus.txt`: 221010 sentences
- `ppocr/utils/dict/bm_dict.txt`: 160 characters
  - Lines 65-74: numbers 0 to 9
  - Lines 75-76: `၊` and `။`, which are punctuation marks similar to `,` and `.`

The corpus is from https://github.com/ye-kyaw-thu/myPOS, with non-Burmese characters removed. If you need more sentences for the corpus, please contact me; the full corpus has 1.3M sentences.
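The PR does not say how the non-Burmese characters were removed or how the dictionary was built. Below is a minimal sketch of one possible approach. It assumes "Burmese characters" means the Unicode Myanmar block (U+1000 to U+109F) plus spaces, and that the dictionary follows the one-character-per-line layout of PaddleOCR dictionary files such as `ppocr/utils/ppocr_keys_v1.txt`. The helper names `clean_line`, `clean_corpus` and `build_char_dict` are hypothetical and not part of PaddleOCR.

```python
"""Hypothetical sketch: clean a Burmese corpus and derive a character dictionary.

Assumption: "Burmese characters" = the Myanmar Unicode block U+1000-U+109F.
This is not necessarily how bm_corpus.txt / bm_dict.txt were produced.
"""
import re
from collections.abc import Iterable

# Any character outside the Myanmar block that is not a plain space gets dropped.
NON_MYANMAR = re.compile(r"[^\u1000-\u109F ]")
# Removing characters can leave several spaces in a row; collapse them to one.
MULTI_SPACE = re.compile(r" {2,}")


def clean_line(line: str) -> str:
    """Remove non-Myanmar characters, collapse space runs, trim the ends."""
    kept = NON_MYANMAR.sub("", line)
    return MULTI_SPACE.sub(" ", kept).strip()


def clean_corpus(lines: Iterable[str]) -> list[str]:
    """Clean every line and drop lines that end up empty."""
    cleaned = (clean_line(line) for line in lines)
    return [line for line in cleaned if line]


def build_char_dict(sentences: Iterable[str]) -> list[str]:
    """Collect the distinct non-space characters, sorted by code point.

    Writing the result with one character per line gives a
    PaddleOCR-style dictionary file.
    """
    chars = {ch for sentence in sentences for ch in sentence if not ch.isspace()}
    return sorted(chars)


if __name__ == "__main__":
    raw = ["မြန်မာ abc 123 ။", "!!!", "၁၂၃ ၊ ဘာသာ"]
    corpus = clean_corpus(raw)  # "!!!" is dropped, Latin letters and ASCII digits removed
    char_dict = build_char_dict(corpus)
    print(len(corpus), len(char_dict))
    print("\n".join(char_dict))
```

Sorting by code point is only one possible ordering. U+1000 to U+103F spans 64 code points. A code-point-sorted dictionary that contains that whole range would therefore put the Myanmar digits (U+1040 to U+1049) on lines 65-74, and `၊` (U+104A) and `။` (U+104B) on lines 75-76. This agrees with the line ranges described for `bm_dict.txt`, but the PR does not confirm how that file was ordered.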