attapol / tltk

Thai Language Toolkit
GNU Lesser General Public License v3.0

Romanization of Thai names #2

Open dlauc opened 2 years ago

dlauc commented 2 years ago

Thank you for the great library. I've been trying to romanize Thai names with the th2roman function and am not getting the results I expect. (If I've got it right, there is an ISO standard, 11940-2, that is equivalent to the Royal Thai General System of Transcription.) When I call th2roman('อรรถพล') on this repo author's name, the result is 'at phon', whereas I think it should be Attaphol, as in the manual transliterations on Wikidata. I've also tried the related PyThaiNLP library, but it gives even stranger results ('nntphon').

awirote commented 1 year ago

The pronunciation is selected based on word boundaries. 'อรรถพล' is a name and is not in the dictionary, so the program analyzes it as two words, อรรถ and พล. That's why the result is at phon. If you add it to the dictionary with tltk.nlp.TDICT['อรรถพล'] = 1, it will be treated as one word and the program will produce atthaphon.
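For readers who want to see why the dictionary entry changes the result, here is a rough sketch. The segment function and toy_dict are hypothetical and use a simple greedy longest-match rule, which is only an assumption for illustration and not necessarily how tltk segments text. The romanize_name helper just wraps the two calls mentioned in the comment above (tltk.nlp.TDICT and th2roman); it needs tltk installed and has not been verified here.

```python
def segment(text, dictionary):
    """Toy greedy longest-match segmenter (illustration only, not tltk's algorithm)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary entry starts here: emit a single character.
            words.append(text[i])
            i += 1
    return words


def romanize_name(name):
    """Register a name as one word, then romanize it (requires tltk)."""
    from tltk import nlp  # imported lazily so the toy example runs without tltk

    nlp.TDICT[name] = 1  # dictionary entry as suggested in the comment above
    return nlp.th2roman(name)


# Hypothetical toy dictionary containing only the two component words.
toy_dict = {"อรรถ", "พล"}
before = segment("อรรถพล", toy_dict)  # the name is split into two words

toy_dict.add("อรรถพล")
after = segment("อรรถพล", toy_dict)  # the name is now kept as one word

print(before, after)
```

Since romanization is applied per word, the split shown in before is what leads to two separate syllable groups, while the single entry in after lets the whole name be read together.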