avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror
https://pypi.python.org/pypi/Unidecode
GNU General Public License v2.0

Unicode spaces should be converted to ASCII 32 #31

Closed: sam-s closed this issue 5 years ago

sam-s commented 5 years ago

Current behavior:

unidecode("a"+chr(2000)+"b") == "a[?]b"

Desired behavior:

unidecode("a"+chr(2000)+"b") == "a b"

Rationale

Unicode space characters naturally correspond to the ordinary ASCII space (code 32).

avian2 commented 5 years ago

Unicode spaces are transliterated correctly. I'm guessing you mean the EN QUAD character, which is code point U+2000 (hexadecimal). You forgot the 0x in front of your 2000: chr(2000) interprets 2000 as decimal, which is code point U+07D0, not a space.

>>> from unidecode import unidecode
>>> unidecode("a"+chr(0x2000)+"b")
'a b'
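The decimal/hex mix-up can be checked without unidecode, using only the standard library's unicodedata module. This is a minimal sketch; the helper names `describe` and `is_space_separator` are made up for illustration and are not part of unidecode.

```python
import unicodedata


def describe(ch):
    """Return the code point, Unicode name and general category of a character."""
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name} ({unicodedata.category(ch)})"


def is_space_separator(ch):
    """True if the character is in the Unicode 'Zs' (space separator) category."""
    return unicodedata.category(ch) == "Zs"


decimal_char = chr(2000)    # decimal 2000 is U+07D0, not a space
hex_char = chr(0x2000)      # hexadecimal 2000 is U+2000, EN QUAD

print(describe(decimal_char))
print(describe(hex_char))   # U+2000 EN QUAD (Zs)
print(is_space_separator(decimal_char), is_space_separator(hex_char))
```

Only the second character is a space separator, which is why the call with 0x2000 yields 'a b' while the original call did not.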