dbr / tvnamer

Automatic TV episode file renamer, uses data from thetvdb.com via tvdb_api
https://pypi.python.org/pypi/tvnamer/
The Unlicense

Error using unicode data from TVDB #57

Closed nikdoof closed 4 years ago

nikdoof commented 12 years ago

I'm trying to clean up some SeaQuest episodes, and tvnamer fails on one episode called "Give Me Liberté":

####################
# Processing file: SeaQuest DSV - 1x06 - Treasures of the Tonga Trench.avi
# Detected series: SeaQuest DSV (season: 1, episode: 6)
####################
Old filename: SeaQuest DSV - 1x06 - Treasures of the Tonga Trench.avi
New filename: SeaQuest DSV - [S01E06] - Give Me Liberté.avi
Traceback (most recent call last):
  File "/usr/local/bin/tvnamer", line 9, in <module>
    load_entry_point('tvnamer==2.2.1', 'console_scripts', 'tvnamer')()
  File "/usr/local/lib/python2.7/dist-packages/tvnamer/main.py", line 413, in main
    tvnamer(paths = sorted(args))
  File "/usr/local/lib/python2.7/dist-packages/tvnamer/main.py", line 322, in tvnamer
    processFile(tvdb_instance, episode)
  File "/usr/local/lib/python2.7/dist-packages/tvnamer/main.py", line 213, in processFile
    doRenameFile(cnamer, newName)
  File "/usr/local/lib/python2.7/dist-packages/tvnamer/main.py", line 91, in doRenameFile
    cnamer.newName(newName, force = Config['overwrite_destination_on_rename'])
  File "/usr/local/lib/python2.7/dist-packages/tvnamer/utils.py", line 966, in newName
    if os.path.isfile(newpath):
  File "/usr/lib/python2.7/genericpath.py", line 29, in isfile
    st = os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 98: ordinal not in range(128)
dbr commented 12 years ago

That file renames without issue for me - are you using the latest version? I need to add a tvnamer --version argument, but you can check with:

$ python -c 'import tvnamer; print tvnamer.__version__'
2.2.1
nikdoof commented 12 years ago
$ python -c 'import tvnamer; print tvnamer.__version__'
(2, 2, 1)

Weird. I'll keep poking at it and see if I can reproduce it again.

[Edit]

Aha, that means I'm missing a few commits in my version (as you fixed the version string after changing to 2.2.1 in setup.py). I'll force an update to the repo version and retest.

[Edit 2]

root@nas:~# pip uninstall tvnamer
Uninstalling tvnamer:
  /usr/local/bin/tvnamer
  /usr/local/lib/python2.7/dist-packages/tvnamer
  /usr/local/lib/python2.7/dist-packages/tvnamer-2.2.1.egg-info
Proceed (y/n)? y
  Successfully uninstalled tvnamer
root@nas:~# pip install -e git+https://github.com/dbr/tvnamer.git#egg=tvnamer
Obtaining tvnamer from git+https://github.com/dbr/tvnamer.git#egg=tvnamer
  Cloning https://github.com/dbr/tvnamer.git to ./src/tvnamer
  Running setup.py egg_info for package tvnamer

    warning: no files found matching 'Fabfile'
Requirement already satisfied (use --upgrade to upgrade): tvdb-api>=1.5 in /usr/local/lib/python2.7/dist-packages (from tvnamer)
Installing collected packages: tvnamer
  Running setup.py develop for tvnamer
    Checking .pth file support in /usr/local/lib/python2.7/dist-packages/
    /usr/bin/python -E -c pass
    TEST PASSED: /usr/local/lib/python2.7/dist-packages/ appears to support .pth files

    warning: no files found matching 'Fabfile'
    Creating /usr/local/lib/python2.7/dist-packages/tvnamer.egg-link (link to .)
    Adding tvnamer 2.2.1 to easy-install.pth file
    Installing tvnamer script to /usr/local/bin

    Installed /root/src/tvnamer
Successfully installed tvnamer
Cleaning up...
root@nas:~# touch "SeaQuest DSV - 1x06 - Treasures of the Tonga Trench.avi"
root@nas:~# tvnamer SeaQuest\ DSV\ -\ 1x06\ -\ Treasures\ of\ the\ Tonga\ Trench.avi
####################
# Starting tvnamer
# Found 1 episode
####################
# Processing file: SeaQuest DSV - 1x06 - Treasures of the Tonga Trench.avi
# Detected series: SeaQuest DSV (season: 1, episode: 6)
TVDB Search Results:
1 -> SeaQuest DSV [en] # http://thetvdb.com/?tab=series&id=76022&lid=7 (default)
2 -> SeaQuest DSV [de] # http://thetvdb.com/?tab=series&id=76022&lid=14
3 -> SeaQuest DSV - A mélység birodalma [hu] # http://thetvdb.com/?tab=series&id=76022&lid=19
Enter choice (first number, return for default, 'all', ? for help):
1
####################
Old filename: SeaQuest DSV - 1x06 - Treasures of the Tonga Trench.avi
New filename: SeaQuest DSV - [01x06] - Give Me Liberté.avi
Rename?
([y]/n/a/q) y
Renaming
Traceback (most recent call last):
  File "/usr/local/bin/tvnamer", line 9, in <module>
    load_entry_point('tvnamer==2.2.1', 'console_scripts', 'tvnamer')()
  File "/root/src/tvnamer/tvnamer/main.py", line 416, in main
    tvnamer(paths = sorted(args))
  File "/root/src/tvnamer/tvnamer/main.py", line 323, in tvnamer
    processFile(tvdb_instance, episode)
  File "/root/src/tvnamer/tvnamer/main.py", line 240, in processFile
    doRenameFile(cnamer, newName)
  File "/root/src/tvnamer/tvnamer/main.py", line 92, in doRenameFile
    cnamer.newName(newName, force = Config['overwrite_destination_on_rename'])
  File "/root/src/tvnamer/tvnamer/utils.py", line 981, in newName
    if os.path.isfile(newpath):
  File "/usr/lib/python2.7/genericpath.py", line 29, in isfile
    st = os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 45: ordinal not in range(128)
root@nas:~#

Checking whether I've missed a tvdb update or something...

dbr commented 12 years ago

Oh, I missed your edit, as GitHub doesn't send notification emails for edits.

It might be caused by the locale setting in your terminal, as this answer and this comment explain.

Or it could be caused by the filesystem you are using. Could you run:

python -c "import sys; print sys.getfilesystemencoding()"

For me it outputs utf-8 (although if I set LANG to C or ascii, it doesn't reproduce the error).
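For context on the locale explanation: under Python 2, os.stat() converts a unicode path to bytes using the filesystem encoding, so when that encoding is ASCII the "é" cannot be represented. The following Python 3 sketch performs that conversion explicitly to show the failure. The /root prefix is taken from the shell prompt in the second log, and try_encode is a hypothetical helper, not part of tvnamer.

```python
import sys

# Full path from the second traceback: the shell was in /root, and the new
# filename is plain ASCII except for the final "é" of "Liberté".
new_path = "/root/SeaQuest DSV - [01x06] - Give Me Liberté.avi"


def try_encode(path, encoding):
    """Encode a text path to bytes, the step Python 2's os.stat() performed
    implicitly with the filesystem encoding.

    Returns (bytes, None) on success, or (None, error) when the codec
    cannot represent some character in the path.
    """
    try:
        return path.encode(encoding), None
    except UnicodeEncodeError as err:
        return None, err


print("filesystem encoding on this machine:", sys.getfilesystemencoding())

for codec in ("utf-8", "ascii"):
    encoded, err = try_encode(new_path, codec)
    if err is None:
        print(codec, "ok,", len(encoded), "bytes")
    else:
        # err.start is the index of the first unencodable character.
        print(codec, "failed:", err)
```

With the /root prefix included, the ASCII attempt fails at index 45, the same position the second traceback reports.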

You could either set your locale to something UTF-8-compatible, or use the windows_safe_filenames config option, which will remove Unicode characters (I think it should replace é with e; if it doesn't, could you open another ticket for that, as it should).
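A rough idea of what an "é -> e" replacement can look like, as a minimal sketch assuming Unicode NFKD decomposition is acceptable: each accented letter is split into a base letter plus a combining accent, and everything that isn't ASCII is then dropped. ascii_fold is a hypothetical name, and this is not necessarily how windows_safe_filenames is implemented.

```python
import unicodedata


def ascii_fold(name):
    """Approximate an ASCII-only version of name.

    NFKD splits letters such as "é" into a base letter plus a combining
    accent; encoding to ASCII with errors="ignore" then drops the accent.
    Characters with no ASCII base letter (e.g. "ß") are dropped entirely.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(ascii_fold("SeaQuest DSV - [01x06] - Give Me Liberté.avi"))
# SeaQuest DSV - [01x06] - Give Me Liberte.avi
```

The trade-off is that this folding is lossy: it turns "Kämpfer" into "Kampfer", which, as pointed out later in this thread, is a different German word.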

lortordermur commented 7 years ago

Possibly related: within my wapaname script, tvnamer fails when run on an anime that has a localized title containing an umlaut.

lordtoran@lenovog5080 /media/lordtoran/transcend1tb/Anime/Serien/Die Ewigkeit, die du dir wünschst $ wapaname
Preparing to launch tvnamer for current dir. Continue [Y/n/?]?

Loading config: /home/lordtoran/.config/tvnamer/tvnamerrc
####################
# Starting tvnamer
Invalid path: 
# Found 14 episodes
####################
# Processing file: 01x01.mkv
Traceback (most recent call last):
  File "/usr/bin/tvnamer", line 4, in <module>
    main()
  File "/usr/share/tvnamer/main.py", line 418, in main
    tvnamer(paths = sorted(args))
  File "/usr/share/tvnamer/main.py", line 325, in tvnamer
    processFile(tvdb_instance, episode)
  File "/usr/share/tvnamer/main.py", line 167, in processFile
    p("# Detected series: %s (%s)" % (episode.seriesname, episode.number_string()))
  File "/usr/share/tvnamer/unicode_helper.py", line 34, in p
    new_args.append(x.encode(kw['encoding']))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 45: ordinal not in range(128)
There was a problem with tvnamer. Aborting.

The culprit is the "ü" in the directory name, as the script passes it to tvnamer as the series name. Temporarily replacing "ü" with "u" produces a match (although they are different letters).
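Why this surfaces as a UnicodeDecodeError: in Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec. The UTF-8 encodings of "ü" and "ä" both start with byte 0xc3, the byte named in the traceback, and the positions line up too ("# Detected series: " is 19 characters, so the "ä" of "Kämpfer" starts at byte 20). The Python 3 sketch below makes that hidden decode explicit. to_text is a hypothetical helper, not code from tvnamer's unicode_helper.py, and UTF-8 as the input encoding is an assumption based on the de_DE.UTF-8 locale shown here.

```python
# Under Python 2, calling .encode() on a byte string first decodes it with
# the default (ASCII) codec. This reproduces that hidden step explicitly.
message = "# Detected series: Kämpfer (season: 1, episode: 1)".encode("utf-8")


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings explicitly.

    The default encoding is an assumption: UTF-8 matches the de_DE.UTF-8
    locale shown in this thread.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


try:
    message.decode("ascii")  # the implicit step that failed in p()
except UnicodeDecodeError as err:
    # "ä" is encoded as 0xc3 0xa4 and "# Detected series: K" is 20 bytes,
    # so the error is reported at position 20, as in the traceback.
    print(err)

decoded = to_text(message)
print(len(message), "bytes decoded to", len(decoded), "characters")
```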

Output of "locale":

LANG=de_DE.UTF-8
LANGUAGE=de:en_US
LC_CTYPE=de_DE.UTF-8
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
lortordermur commented 7 years ago

I was able to reproduce the issue several times with German-localized anime titles using vanilla tvnamer (so, not inside my script). Kämpfer is especially interesting since the original title is a German word (with an umlaut), and so it applies to all dubs.

$ tvnamer --name Kämpfer *.mkv
####################
# Starting tvnamer
# Found 3 episodes
####################
# Processing file: 01x01.mkv
Traceback (most recent call last):
  File "/usr/bin/tvnamer", line 4, in <module>
    main()
  File "/usr/share/tvnamer/main.py", line 418, in main
    tvnamer(paths = sorted(args))
  File "/usr/share/tvnamer/main.py", line 325, in tvnamer
    processFile(tvdb_instance, episode)
  File "/usr/share/tvnamer/main.py", line 167, in processFile
    p("# Detected series: %s (%s)" % (episode.seriesname, episode.number_string()))
  File "/usr/share/tvnamer/unicode_helper.py", line 34, in p
    new_args.append(x.encode(kw['encoding']))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)

However, Kampfer (without the umlaut) seems to match only the English dub of Kämpfer. The Dutch, German, Portuguese and French dubs are not offered at all.

$ tvnamer --name Kampfer *.mkv
####################
# Starting tvnamer
# Found 3 episodes
####################
# Processing file: 01x01.mkv
# Detected series: Kampfer (season: 1, episode: 1)
TVDB Search Results:
1 -> Kämpfer [en] # http://thetvdb.com/?tab=series&id=115991&lid=7 (default)
Automatically selecting only result
####################
Old filename: 01x01.mkv
New filename: Kämpfer - [01x01] - Destiny, the Chosen Ones.mkv
Rename?
([y]/n/a/q) q
Quitting

(As a translator, I should point out that Kämpfer means "fighter" while Kampfer means "camphor". Swapping "ä" for "a" even produces an entirely different word, because "ä" is not just a "long a" (as in Hungarian) or some kind of accent, but a separate letter and phoneme contracted from an "ae" ligature back in the days of movable type.)
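The "ae" origin mentioned above also gives the usual German fallback when umlauts cannot be typed: ä is written as "ae" (Kaempfer), not "a". A minimal sketch of that mapping follows, as a hypothetical alternative to plain accent stripping. As far as this thread shows, tvnamer does not do this, and whether TVDB's search matches a transliterated title is untested.

```python
# Conventional German spelling when umlauts and sharp s are unavailable:
# each umlaut becomes the vowel followed by "e", and "ß" becomes "ss".
GERMAN_FALLBACK = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def german_ascii(name):
    """Transliterate German umlauts and "ß" to their ASCII digraphs."""
    return name.translate(GERMAN_FALLBACK)


print(german_ascii("Kämpfer"))  # Kaempfer
print(german_ascii("Die Ewigkeit, die du dir wünschst"))
# Die Ewigkeit, die du dir wuenschst
```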

My filesystem encoding (Linux+ext4):

$ python -c "import sys; print sys.getfilesystemencoding()"
UTF-8

The Unicode issue does not seem to be on the TVDB side, as typing the series title into the website's search field works as expected. However, it is potentially a very serious issue for domestic series in places that use an extended Latin alphabet or a non-Latin writing system, which means nearly all of Eurasia except the British Isles.

dbr commented 4 years ago

Closing old tickets. Newer versions shouldn't have this exact problem at least, as the entire unicode_helper.py module has been removed in the current development version. If it's still an issue, it's best to open a new ticket.