avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror
https://pypi.python.org/pypi/Unidecode
GNU General Public License v2.0

Cannot handle Bernières #48

Closed by jrbray1 5 years ago

jrbray1 commented 5 years ago

pip install unidecode
Collecting unidecode
Downloading https://files.pythonhosted.org/packages/31/39/53096f9217b057cb049fe872b7fc7ce799a1a89b76cf917d9639e7a558b5/Unidecode-1.0.23-py2.py3-none-any.whl (237kB)

echo de Bernières | unidecode
Unable to decode input: ordinal not in range(128), start: 8, end: 9

avian2 commented 5 years ago

Very likely this is due to wrong locale settings on your computer. Either fix your locale or specify encoding with -e.
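To illustrate the error above, here is a minimal Python sketch. It is not unidecode's own code. It assumes the terminal sends the text as UTF-8, and `try_decode` is a hypothetical helper written only for this example. It shows that decoding those bytes as ASCII fails at the first byte of "è", while decoding them as UTF-8 succeeds.

```python
# Bytes that `echo de Bernières` would write, ASSUMING a UTF-8 terminal.
# "\u00e8" is the letter "è", which UTF-8 encodes as two bytes.
raw = "de Berni\u00e8res\n".encode("utf-8")


def try_decode(data, encoding):
    """Hypothetical helper: return (text, None) on success, (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# An ASCII decoder cannot handle any byte >= 128.
ascii_text, ascii_error = try_decode(raw, "ascii")
if ascii_error is not None:
    # start/end give the byte offsets of the undecodable input.
    print("ascii failed:", ascii_error.reason, ascii_error.start, ascii_error.end)

# Naming the real encoding decodes the same bytes without error.
utf8_text, utf8_error = try_decode(raw, "utf-8")
print("utf-8 result:", ascii(utf8_text))
```

The offsets reported for the ASCII failure correspond to the "start: 8, end: 9" in the message above: the eight ASCII bytes of "de Berni" come first, and the next byte belongs to "è".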

jrbray1 commented 5 years ago

Thanks for the quick reply

Not sure where my locale settings are. The only relevant env var I can see is

env | grep -i LANG
LANG=C
GDM_LANG=en_GB
LANGUAGE=en_GB:en

and when I issue the command locale I get

LANG=C
LANGUAGE=en_GB:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

What -e encoding options are possible?

avian2 commented 5 years ago

Try -e utf8. Full list is here.

This is a workaround. With LANG=C you will encounter other encoding problems if you're working with non-ASCII characters on the command line. Try to fix that (check the documentation for your Linux distribution to see how to set the locale).
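As a rough illustration of why LANG=C matters here, the sketch below follows the usual POSIX precedence for the character-type locale: LC_ALL first, then LC_CTYPE, then LANG. The function name `effective_ctype` is hypothetical, and the code only inspects a dictionary of variables. It does not query the C library, so treat it as an approximation of what the locale command reports.

```python
def effective_ctype(env):
    """Return the locale name that would apply to LC_CTYPE.

    Assumed precedence (POSIX): LC_ALL, then LC_CTYPE, then LANG.
    Empty or unset variables are skipped; the fallback is "C".
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


# Variables as reported in this thread: only LANG is relevant, and it is "C".
reported = {"LANG": "C", "GDM_LANG": "en_GB", "LANGUAGE": "en_GB:en"}
print(effective_ctype(reported))

# After removing the LANG=C override, the system default applies.
fixed = {"LANG": "en_GB.UTF-8", "LANGUAGE": "en_GB:en"}
print(effective_ctype(fixed))
```

To check a live session instead of a hand-written dictionary, pass `os.environ` after importing `os`. Note that LANGUAGE and GDM_LANG do not take part in this lookup, which is why setting them to en_GB did not help.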

jrbray1 commented 5 years ago

Ah, my .profile was manually setting LANG=C (inherited from years back; I can't remember what problem it solved). Removing that changes the locale to the system (Mint 1.20) default of en_GB.UTF-8. vi, which used to show funny characters for some things, now looks a lot better: they are rendered as em dashes etc., and unidecode converts them to a double dash.

So I have a solution to my problem here. We shall see whether that LANG=C was needed for something else ...

Thanks for your help