aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com

Fix transliteration when the source language is not "en". #46

Closed: MerlijnWajer closed this 8 years ago

MerlijnWajer commented 8 years ago

The transliterate function only worked when the source language was set to "en".

The transliterate function is meant to work in two steps: first convert the source_lang text into English, then convert that English text into the target language.

However, the second step never received the converted English text. It was given the original, unconverted input instead.

Please consider releasing a new version. A workaround for people affected by this issue is to call the encoder and decoder manually.
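
The bug and the manual workaround can be sketched with a toy model. This is not polyglot's actual code: the character table and the function names (make_coder, transliterate_buggy, transliterate_fixed) are hypothetical stand-ins, assuming only the two-step encoder/decoder design described above.

```python
# Toy model of a two-step transliteration pipeline; NOT polyglot's real code.
# The character table is a hypothetical stand-in for a trained model.
RU_TO_EN = {"п": "p", "а": "a", "н": "n", "е": "e", "к": "k", "о": "o"}


def identity(text):
    """Used when one side of the pipeline is already English."""
    return text


def make_coder(table):
    """Return a function that maps each lowercased character through `table`."""
    def coder(text):
        return "".join(table.get(ch, ch) for ch in text.lower())
    return coder


def transliterate_buggy(word, encoder, decoder):
    encoded = encoder(word)   # step 1: source -> English
    return decoder(word)      # bug: step 2 gets the raw input, not `encoded`


def transliterate_fixed(word, encoder, decoder):
    encoded = encoder(word)   # step 1: source -> English
    return decoder(encoded)   # fix: step 2 gets the converted English text


encoder = make_coder(RU_TO_EN)  # plays the role of source_lang="ru"
decoder = identity              # target_lang="en" needs no further decoding

print(transliterate_buggy("Панненкоек", encoder, decoder))  # input comes back unchanged
print(transliterate_fixed("Панненкоек", encoder, decoder))
```

In this sketch the workaround amounts to writing decoder(encoder(word)) yourself instead of relying on the buggy wrapper.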

MerlijnWajer commented 8 years ago

Before:

    from polyglot.transliteration import Transliterator
    transliterator = Transliterator(source_lang="ru", target_lang="en")
    print(transliterator.transliterate(u'Панненкоек'))
    Панненкоек

After:

    from polyglot.transliteration import Transliterator
    transliterator = Transliterator(source_lang="ru", target_lang="en")
    print(transliterator.transliterate(u'Панненкоек'))
    pannenkoek

aboSamoor commented 8 years ago

Thanks for the pull request.