avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror
https://pypi.python.org/pypi/Unidecode
GNU General Public License v2.0

Unexpected behavior in Python 3 #59

Closed by rosstex 3 years ago

rosstex commented 3 years ago

I'm trying to understand what's happening here. I have the following text in a file:

"\u003e"

But decoding this has no effect:

from unidecode import unidecode
dom = open('dom.txt', encoding='utf-8').read()
unidecode(dom)

"\u003e"

But in a Python string literal, "\u003e" -> ">"

Why is this not decoded as expected?

avian2 commented 3 years ago

This question has nothing to do with Unidecode. Unidecode does not encode or decode escape sequences, or perform encoding or decoding of strings. It is not the right tool for your task.

You are probably confused because Python interprets the \u escape in string literals in source code, but not in text read from a file: your file contains the six literal characters \u003e, not the single character ">". The Python Unicode HOWTO is a good place to start if you want to understand how this works. Also, look up encoding='unicode-escape' in the Python docs.
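To make the difference concrete, here is a minimal sketch using only the standard library. The helper name decode_escapes and the sample strings are made up for illustration, and the raw string stands in for what read() would return from the file in question. It relies on the built-in unicode_escape codec, of which 'unicode-escape' is an accepted spelling.

```python
# What read() returns for a file containing "\u003e": the escape is NOT
# interpreted, so the string holds a literal backslash followed by u003e.
raw = r'"\u003e"'

# The same characters typed as a normal string literal in source code:
# here the Python compiler interprets the escape.
literal = "\u003e"


def decode_escapes(text: str) -> str:
    """Interpret backslash escapes in text (hypothetical helper).

    Encoding with latin-1 and backslashreplace first keeps characters
    above U+00FF intact by turning them into escapes, which the
    unicode_escape codec then decodes back.
    """
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


print(len(raw))             # 8 characters: quote, \, u, 0, 0, 3, e, quote
print(literal)              # >
print(decode_escapes(raw))  # ">"
```

Note that this interprets every backslash sequence in the input, so it is only appropriate when the text is known to contain escapes that should be decoded.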

rosstex commented 3 years ago

Yep, now understood, thanks! Your intent is "è" -> "e", which is also useful to me. I'll use both, thanks!