monzug closed this issue 3 years ago
Related to issue https://github.com/alpheios-project/tokenizer/issues/33 and PR https://github.com/alpheios-project/tokenizer/pull/34
@irina060981 I got a 500 error when using a file saved with Japanese as its language.
Yes, that's right: we have two unsupported languages, Japanese and Korean. It is described in https://github.com/alpheios-project/tokenizer/issues/33 and I also sent this information by email on 2 February.
Sorry, I didn't see Japanese in the email; that's why I added it here. I will wait until it is completed to finish testing all languages.
Confirmed that it's working for Russian, Ukrainian, Thai and Vietnamese, but as noted in alpheios-project/tokenizer#33, I am still getting a 500 error for Japanese and Korean.
I also tested the languages beginning with the letters D and E.
Telugu and Sanskrit also give a 500 error. See attachment.
In the drop-down we have the "Ukainian" language, which I have never heard of. @irina060981, could it be a misspelling of Ukrainian? If yes, you could use this issue to fix the misspelled language name. Thanks. I will add the two languages with the 500 error to alpheios-project/tokenizer#33.
@monzug, I have a suggestion: maybe it is worth creating a new issue for each failing language, with text samples? That would be useful for the developer, who could investigate and test with a ready text sample.
I used this text for the Telugu language: Pratipattisvatvamula visyamuna mānavulellarunu janmataḥ svataṁtrulunu samānulunu naguduru.
Vāru vivēdanāṁtaḥkaraṇa saṁpannulaguṭacaē parasparamu bhrātṛbhāvamutō vartiṁpavalayunu
In English: All human beings are born free and equal in dignity and rights.
They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
And the same text for Sanskrit, from https://omniglot.com/writing/sanskrit.htm
Sarvē mānavāḥ svatantrāḥ samutpannāḥ vartantē api ca, gauravadr̥śā adhikāradr̥śā ca samānāḥ ēva vartantē.
Ētē sarvē cētanā-tarka-śaktibhyāṁ susampannāḥ santi. Api ca, sarvē´pi bandhutva-bhāvanayā parasparaṁ vyavaharantu.
Created two new issues to report the problem with the Telugu and Sanskrit languages and the spelling error.
As we had an issue with Chinese (and, as Irina said, also with Russian, Ukrainian, Thai and Vietnamese), let's test all languages in the Target and Origin texts. So far, I have done all languages starting with A, B and C.