avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror
https://pypi.python.org/pypi/Unidecode
GNU General Public License v2.0

UTF-16 problems #56

Closed: niloct closed this issue 3 years ago

niloct commented 3 years ago

First of all, thanks for this tool.

I'm trying to decode UTF-16 escaped strings to no avail.

For instance, \u00c1 is converted to A instead of Á:

>>> import unidecode
>>> unidecode.unidecode('\u00c1')
'A'
>>> unidecode.unidecode(u'\u00c1')
'A'

(I've used https://www.branah.com/unicode-converter to verify)

Is there an option to specify the UTF-16 encoding?

Thanks.

avian2 commented 3 years ago

There is no option to specify UTF-16 encoding because Unidecode provides "ASCII transliterations", so the "A" output is correct. If you want your string encoded as UTF-16, no transliteration is needed, since UTF-16 can encode the "Á" character.

Please read the Unicode HOWTO for an introduction to character encodings in Python.
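
For readers who land here with the same question, the following is a minimal sketch of the distinction, assuming Python 3. The variable names (`s`, `utf16`, `raw`, `unescaped`) are illustrative only, and the final step assumes the Unidecode package is installed; it is skipped otherwise. The only Unidecode call used is `unidecode()`, the same one shown in the report above.

```python
# In Python 3, '\u00c1' in source code is already the single character Á;
# the escape is resolved by the parser before any library sees the string.
s = '\u00c1'
assert s == 'Á' and len(s) == 1

# Encoding to UTF-16 preserves the character; no transliteration is involved.
utf16 = s.encode('utf-16-le')       # little-endian, no byte order mark
print(utf16)                        # b'\xc1\x00'
print(utf16.decode('utf-16-le'))    # Á

# If the input instead holds a literal backslash sequence
# (six characters: \ u 0 0 c 1), it has to be unescaped first.
raw = r'\u00c1'
unescaped = raw.encode('ascii').decode('unicode_escape')
print(unescaped)                    # Á

# Unidecode goes in the other direction: it reduces text to plain ASCII.
try:
    from unidecode import unidecode
    print(unidecode(s))             # A
except ImportError:
    pass  # Unidecode is not installed; the steps above still run
```

In short, `str.encode()` and `bytes.decode()` change how a character is stored, while `unidecode()` replaces the character with an ASCII approximation, which is why the output in the report is 'A'.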