devsnd / cherrymusic

Stream your own music collection to all your devices! The easy to use free and open-source music streaming server.
http://www.fomori.org/cherrymusic
GNU General Public License v3.0

Unicode Encoding fails in py3 #642

Closed hank closed 6 years ago

hank commented 7 years ago

Ran into this today:

Traceback (most recent call last):
  File "/usr/lib/python3.4/logging/__init__.py", line 980, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 308-310: ordinal not in range(128)
Call stack:
  File "/usr/lib/python3.4/threading.py", line 888, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/srv/cherrymusic/cherrymusicserver/util.py", line 49, in wrapper
    result = func(*args, **kwargs)
  File "/srv/cherrymusic/cherrymusicserver/sqlitecache.py", line 470, in full_update
    self.update_db_recursive(cherry.config['media.basedir'], skipfirst=True)
  File "/srv/cherrymusic/cherrymusicserver/sqlitecache.py", line 539, in update_db_recursive
    self.register_file_with_db(item.infs)
  File "/srv/cherrymusic/cherrymusicserver/sqlitecache.py", line 311, in register_file_with_db
    log.e(_("wrong encoding for filename '%s' (%s)"), fileobj.relpath, e.__class__.__name__)
  File "/srv/cherrymusic/cherrymusicserver/log.py", line 126, in error
    _get_logger().error(msg, *args, **kwargs)
Message: "wrong encoding for filename '%s' (%s)"
Arguments: ('Albums/Lindsey Stirling/Lindsey Stirling (V0)/02 Zi-Zi\udce2\udc80\udc99s Journey.mp3', 'UnicodeEncodeError')

The filename is as follows: 02 Zi-Zi’s Journey.mp3

The single quote is a curly quote, Unicode character 'RIGHT SINGLE QUOTATION MARK' (U+2019): http://www.fileformat.info/info/unicode/char/2019/index.htm

It's a good test case to make sure UTF-8 works. Let me know if I can help - I'm halfway decent at Python. Using latest trunk.

tilboerner commented 7 years ago

Hi @hank, sorry for the late response!

Never mind the 'ascii' encoding error for now; that is thrown by the logger failing to report the original error. As for the original error, it looks like the filename got (en|de)coded wrong: there are a couple of wrong Unicode characters in that path. The error output makes it seem like each of the three UTF-8 bytes for the quotation mark (\xe2\x80\x99) somehow ended up as a two-byte UTF-16 low surrogate.
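The surrogate pattern in the logged path matches what Python 3's `surrogateescape` error handler (PEP 383) produces when filename bytes cannot be decoded. A minimal sketch, assuming the filesystem encoding was ASCII when the directory was listed (the thread does not confirm this):

```python
# The three UTF-8 bytes of U+2019 (RIGHT SINGLE QUOTATION MARK).
raw = "\u2019".encode("utf-8")  # b'\xe2\x80\x99'

# Assumption: the filesystem encoding is ASCII, as under a C/POSIX locale.
# Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
escaped = raw.decode("ascii", errors="surrogateescape")
print(ascii(escaped))

# Lone surrogates cannot be encoded strictly, which would explain the
# UnicodeEncodeError reported for this filename.
try:
    escaped.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The original bytes are still recoverable, and so is the intended character.
recovered = escaped.encode("ascii", errors="surrogateescape")
fixed = recovered.decode("utf-8")
```

If this is what happened, the filename on disk is probably fine and the problem lies in the locale the server process runs under.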

In Python 3, CherryMusic doesn't decode filenames itself; it just concatenates output from os.listdir(...). So, to diagnose:

>>> import os
>>> os.chdir('/PATH/TO/MUSIC/Albums/Lindsey Stirling/Lindsey Stirling (V0)')
>>>   # decoded to unicode:
>>> [x for x in os.listdir('.') if x.startswith('02 Zi')]
>>>   # check actual bytes:
>>> [b for b in os.listdir(b'.') if b.startswith(b'02 Zi')]
>>> import sys
>>> sys.getfilesystemencoding()
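
The same checks can be run over a whole directory. This is only a sketch: `has_surrogate_escapes` and `find_undecodable_names` are hypothetical helper names, not part of CherryMusic. It flags entries whose decoded name contains the lone surrogates U+DC80..U+DCFF and prints them alongside their raw bytes:

```python
import os
import sys


def has_surrogate_escapes(name):
    """Return True if `name` contains lone surrogates U+DC80..U+DCFF."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in name)


def find_undecodable_names(directory):
    """Yield (printable_name, raw_bytes) for entries that failed to decode."""
    for name in os.listdir(directory):
        if has_surrogate_escapes(name):
            # ascii() keeps the output printable even on an ASCII-only stream.
            yield ascii(name), os.fsencode(name)


if __name__ == "__main__":
    print("filesystem encoding:", sys.getfilesystemencoding())
    for printable, raw in find_undecodable_names("."):
        print(printable, raw)
```

Run it from inside the album directory; any output line means os.listdir(...) could not decode that name with the reported filesystem encoding.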

tilboerner commented 7 years ago

So, I think we have a duplicate of #595 on our hands. @hank, I'm leaving this ticket open for now because I want to be sure to get your response. I suspect it might be very useful in pinning down this bug. We'll eventually do so over in the other ticket. :hatched_chick:

hank commented 7 years ago

I thought I fixed this in the code, but git tells me I made no changes - weird...

If I figure it out, I'll let you know. I think I just changed an encode() to a decode() and voila! Sorry about that.

tilboerner commented 7 years ago

@hank Could you find out about the stuff I asked above? It would be useful, I think.