avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror
https://pypi.python.org/pypi/Unidecode
GNU General Public License v2.0

Incorrect operation of the unidecode_expect_ascii method (python 3.x) #45

Closed by mazurenkomm 5 years ago

mazurenkomm commented 5 years ago

The library is used in Wagtail CMS. While setting up search, I noticed that Cyrillic text is converted to its transliteration. For a spelling dictionary this is unacceptable.

mazurenkomm commented 5 years ago

I use UTF-8 text, so "bytestring = string.encode('ASCII')" raises an exception. The exception is handled, and the handler returns the transliterated string. After that comes a check on the Python version, and on Python 3 the Unicode string is used. I need the original string, but the exception handler has already returned the transliteration.

avian2 commented 5 years ago

What you describe is exactly how unidecode_expect_ascii is supposed to work. This function takes a Unicode string and returns an ASCII string. If the string can be encoded to ASCII without an error, then no transliteration is necessary (since the string is obviously already in ASCII). Otherwise, transliteration from Unicode to ASCII is necessary. If you need the original (non-ASCII) string, you should not call unidecode_expect_ascii at all.
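
The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the library's actual code: `expect_ascii`, `transliterate`, and the tiny `_TABLE` mapping are hypothetical stand-ins, used only to show the "try ASCII first, transliterate on failure" pattern.

```python
# Hypothetical, deliberately tiny transliteration table (illustration only;
# the real library ships far larger tables).
_TABLE = {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t"}


def transliterate(text):
    """Replace each non-ASCII character via _TABLE, or '?' if unknown."""
    return "".join(
        ch if ord(ch) < 128 else _TABLE.get(ch, "?") for ch in text
    )


def expect_ascii(text):
    """Return text unchanged if it is pure ASCII, else a transliteration."""
    try:
        # Fast path: succeeds only when every character is ASCII.
        text.encode("ascii")
    except UnicodeEncodeError:
        # Slow path: the input contains non-ASCII characters.
        return transliterate(text)
    return text


original = "привет"
ascii_only = expect_ascii(original)  # transliterated copy
# The original string is untouched; keep using it where the
# non-ASCII form is needed (e.g. for a spelling dictionary).
```

In this sketch the function always returns ASCII, so a caller that needs the Cyrillic form simply keeps using `original` and does not pass it through `expect_ascii`.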

mazurenkomm commented 5 years ago

I'm sorry, I was wrong.