Closed by StephanAkkerman 3 weeks ago
The method of doing this is called "transliteration"
- https://github.com/aboSamoor/polyglot (69 languages, nice; does not install with pip)
- https://github.com/3aransia/3aransia (70 languages; seems like a fork of polyglot, different from Google Translate)
- https://pypi.org/project/PyICU/ (popular but hard to install)
Maybe we can install PyICU easily using this: https://github.com/cgohlke/pyicu-build (does not work)
Best solution without any extra dependencies: googletrans. Also return the transliteration during the semantic process.
Use googletrans to translate the text into its own language (same dest) and take the pronunciation from the result.
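from googletrans import Translator
translator = Translator()
# Translate into the text's own language; the romanization is in .pronunciation
result = translator.translate("안녕하세요.", dest="ko")
print(result.pronunciation)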
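A minimal sketch of wrapping that call so callers always get a string back. It assumes the translator's `translate(text, dest=...)` is synchronous and returns an object with a `pronunciation` attribute that may be `None`. The `romanize` helper and its fallback behaviour are hypothetical and not part of googletrans.

```python
from typing import Any


def romanize(text: str, lang: str, translator: Any) -> str:
    """Return a Latin-script pronunciation of `text`, or `text` itself as a fallback.

    `translator` is any object with a googletrans-style
    `translate(text, dest=...)` method (assumed synchronous here).
    """
    # Translating into the text's own language should leave the text as-is,
    # while the result may still carry a romanized pronunciation.
    result = translator.translate(text, dest=lang)
    pronunciation = getattr(result, "pronunciation", None)
    if isinstance(pronunciation, str) and pronunciation.strip():
        return pronunciation
    # No usable pronunciation: fall back to the original text.
    return text
```

Usage would look like `romanize("안녕하세요.", "ko", Translator())`. Passing the translator in keeps the helper testable without network access.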
Description:
Problem: For languages written in a non-Latin script, the orthographic similarity cannot be calculated in a meaningful way, because it will always be 0
Solution: Converting the text to Latin script first (e.g. romaji for Japanese) would be the best
Prerequisites: Look into methods that support the most languages
Tasks:
Additional context
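To illustrate the problem and the proposed fix, here is a small sketch. It assumes orthographic similarity is a character-level ratio (Python's `difflib.SequenceMatcher` is used as a stand-in, since the actual metric is not specified here), and the romaji string is written by hand in place of a real transliterator.

```python
from difflib import SequenceMatcher


def orthographic_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


word_ja = "バナナ"   # Japanese, written in katakana
word_en = "banana"
romaji = "banana"    # hand-written romanization, stands in for a transliterator

# No shared characters between katakana and Latin script, so the score is 0.
raw_score = orthographic_similarity(word_ja, word_en)
# After romanization the two strings can actually be compared.
romanized_score = orthographic_similarity(romaji, word_en)
print(raw_score, romanized_score)
```

With these inputs the raw comparison scores 0.0 and the romanized one scores 1.0.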