google-code-export / beets

Automatically exported from code.google.com/p/beets
MIT License

Undecodable filenames #83

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What's the problem? How can I reproduce it?

Traceback (most recent call last):
  File "/usr/bin/beet", line 9, in <module>
    load_entry_point('beets==1.0b4', 'console_scripts', 'beet')()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/__init__.py", line 439, in main
    subcommand.func(lib, config, suboptions, subargs)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 533, in import_func
    opts.logpath, art, threaded, color)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 478, in import_files
    pl.run_parallel()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/pipeline.py", line 94, in run
    msg = self.coro.next()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 319, in read_albums
    for path, items in autotag.albums_in_dir(os.path.expanduser(toppath)):
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/autotag/__init__.py", line 116, in albums_in_dir
    for root, dirs, files in _sorted_walk(path):
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/autotag/__init__.py", line 107, in _sorted_walk
    for res in _sorted_walk(cur):
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/autotag/__init__.py", line 90, in _sorted_walk
    base = library._unicode_path(base)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/library.py", line 167, in _unicode_path
    return path.decode(sys.getfilesystemencoding())
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 14-16: invalid 
data

I think it's doing it on this album: 
http://www.amazon.com/Rough-Guide-Himalayas-Various-Artists/dp/B0000668LN

If it seems relevant, what system are you running on (OS, Python version,
etc.)? CentOS, Python 2.6

Original issue reported on code.google.com by dleink on 4 Aug 2010 at 2:23

GoogleCodeExporter commented 9 years ago
Looks similar...

[root@ct183120 beets]# beet import /var/downloads/"Arcade Fire - Funeral"
Traceback (most recent call last):
  File "/usr/bin/beet", line 9, in <module>
    load_entry_point('beets==1.0b4', 'console_scripts', 'beet')()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/__init__.py", line 439, in main
    subcommand.func(lib, config, suboptions, subargs)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 533, in import_func
    opts.logpath, art, threaded, color)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 478, in import_files
    pl.run_parallel()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/pipeline.py", line 94, in run
    msg = self.coro.next()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 319, in read_albums
    for path, items in autotag.albums_in_dir(os.path.expanduser(toppath)):
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/autotag/__init__.py", line 116, in albums_in_dir
    for root, dirs, files in _sorted_walk(path):
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/autotag/__init__.py", line 90, in _sorted_walk
    base = library._unicode_path(base)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/library.py", line 167, in _unicode_path
    return path.decode(sys.getfilesystemencoding())
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 22-24: invalid 
data

Original comment by dleink on 4 Aug 2010 at 2:07

GoogleCodeExporter commented 9 years ago
Grr! This sort of thing is supposed to be fixed. Thanks for the report -- any 
chance I could get your help diagnosing what's going on? It's not something I 
can reproduce here.

If you get a chance, open up a Python shell and run:
>>> import os
>>> os.listdir(u"/var/downloads/Arcade Fire - Funeral")
And then maybe even:
>>> from beets import library
>>> library._unicode_path("/var/downloads/Arcade Fire - Funeral")
And:
>>> os.listdir(library._unicode_path("/var/downloads/Arcade Fire - Funeral"))

If you let me know what these commands output, I may be able to get a better 
handle on what's going on here.

For the record, these are the things that seem inconsistent:
* os.listdir() is supposed to give Unicode output when given Unicode input, and 
I'm careful to always give it Unicode input. Therefore, the call "base = 
library._unicode_path(base)" shouldn't need to decode anything.
* I'm decoding a path, which came from the filesystem, using the filesystem 
encoding. This should never cause an error -- either the filesystem is lying 
about which encoding it uses or it's giving us corrupt filenames.

Original comment by adrian.sampson on 4 Aug 2010 at 5:06

GoogleCodeExporter commented 9 years ago
Python 2.6.5 (r265:79063, Apr  9 2010, 15:16:58)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir(u"/var/downloads/Arcade Fire - Funeral")
[u'Funeral (.m3u).m3u', '02 Neighborhood #2 (La\xefka).flac', '03 Une Ann\xe9e 
Sans Lumi\xe8re.flac', u'Funeral (log).log', u'06 Crown of Love.flac', u'04 
Neighborhood #3 (Power Out).flac', u'07 Wake Up.flac', u'09 Rebellion 
(Lies).flac', '08 Ha\xefti.flac', u'05 Neighborhood #4 (7 Kettles).flac', u'01 
Neighborhood #1 (Tunnels).flac', u'10 In the Backseat.flac', u'Funeral 
(.cue).cue']
>>> from beets import library
>>> library._unicode_path("/var/downloads/Arcade Fire - Funeral")
u'/var/downloads/Arcade Fire - Funeral'
>>> os.listdir(library._unicode_path("/var/downloads/Arcade Fire - Funeral")
... )
[u'Funeral (.m3u).m3u', '02 Neighborhood #2 (La\xefka).flac', '03 Une Ann\xe9e 
Sans Lumi\xe8re.flac', u'Funeral (log).log', u'06 Crown of Love.flac', u'04 
Neighborhood #3 (Power Out).flac', u'07 Wake Up.flac', u'09 Rebellion 
(Lies).flac', '08 Ha\xefti.flac', u'05 Neighborhood #4 (7 Kettles).flac', u'01 
Neighborhood #1 (Tunnels).flac', u'10 In the Backseat.flac', u'Funeral 
(.cue).cue']
>>>

Original comment by dleink on 4 Aug 2010 at 5:31
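
The byte-string entries in the listing above (those without the u prefix) are the names os.listdir() could not decode. A minimal sketch of why, written for Python 3 rather than the Python 2.6 used in the session: the byte 0xEF in '08 Ha\xefti.flac' is not valid UTF-8 at that position, but it is a valid Latin-1 "ï". That the files were originally named under Latin-1 is an assumption; try_decode is a hypothetical helper, not part of beets.

```python
# Python 3 sketch; the session above ran Python 2.6.
RAW_NAME = b"08 Ha\xefti.flac"  # byte string copied from the listing above


def try_decode(raw, encoding):
    """Return the decoded name, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xEF starts a three-byte UTF-8 sequence, but no continuation bytes follow it.
as_utf8 = try_decode(RAW_NAME, "utf-8")      # None
# Latin-1 maps every byte to a code point, so this always succeeds.
as_latin1 = try_decode(RAW_NAME, "latin-1")  # '08 Ha\xefti.flac', i.e. "Haïti"
```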

GoogleCodeExporter commented 9 years ago
Awesome, thanks! That helped a lot.

Unfortunately, the problem of "undecodable paths" has opened up an enormous, 
horrible can of worms about filesystem encodings and Unicode. I still need to 
figure out exactly how I'm going to address the issue, because handling 
malformed filenames internally will add a *lot* of complexity to the core of 
beets.

For the time being, though, I've pushed a new revision that simply ignores 
filenames that can't be decoded. This is, of course, not a very good solution, 
but at least the tagger won't completely crash partway through...

I'll let you know when I have a better answer.

Original comment by adrian.sampson on 4 Aug 2010 at 7:15
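
The stopgap described above (skip names that can't be decoded) might look roughly like the following. This is a Python 3 sketch under assumptions: the actual revision is not shown in this thread, and decodable_names is a hypothetical name.

```python
import sys


def decodable_names(raw_names, encoding=None):
    """Yield decoded names, silently skipping any that fail to decode."""
    encoding = encoding or sys.getfilesystemencoding()
    for raw in raw_names:
        try:
            yield raw.decode(encoding)
        except UnicodeDecodeError:
            # The file is skipped entirely, so it is never imported.
            continue


kept = list(decodable_names([b"07 Wake Up.flac", b"08 Ha\xefti.flac"], "utf-8"))
# kept == ['07 Wake Up.flac']
```

The cost is visible in the example: the undecodable track is dropped without any warning, which is why this only works as a temporary measure.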

GoogleCodeExporter commented 9 years ago
Issue 81 has been merged into this issue.

Original comment by adrian.sampson on 4 Aug 2010 at 7:16

GoogleCodeExporter commented 9 years ago

Original comment by adrian.sampson on 4 Aug 2010 at 7:16

GoogleCodeExporter commented 9 years ago
Okay! I just pushed a few changes that make beets handle paths as opaque 
bytestrings (rather than Unicode) end-to-end. With any luck, that should make 
this problem just go away!

Sorry for using you as a guinea pig, dleink, but any chance you could check out 
the latest version and see if this fixes everything?

Original comment by adrian.sampson on 5 Aug 2010 at 8:42
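
For readers unfamiliar with the "opaque bytestrings" approach, here is a minimal Python 3 sketch of the idea, not the beets implementation: pass bytes to os.listdir() so names come back undecoded, and decode only when something has to be shown to the user. list_bytes and displayable are hypothetical names.

```python
import os


def list_bytes(directory):
    """List a directory as raw bytes paths, never decoding the names."""
    directory = os.fsencode(directory)  # accepts str or bytes, returns bytes
    return sorted(os.path.join(directory, name) for name in os.listdir(directory))


def displayable(path):
    """Decode for display only; undecodable bytes become U+FFFD."""
    return path.decode("utf-8", errors="replace")
```

Because the path is never decoded for file operations, a name such as b'08 Ha\xefti.flac' can still be opened and moved; only its on-screen rendering is lossy.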

GoogleCodeExporter commented 9 years ago
Looking good on that Arcade Fire album, let's see how it goes with the rest of 
the library...

Original comment by dleink on 5 Aug 2010 at 9:33

GoogleCodeExporter commented 9 years ago

Original comment by adrian.sampson on 5 Aug 2010 at 10:53

GoogleCodeExporter commented 9 years ago
Can't quite figure out which album is causing this...

  File "/usr/bin/beet", line 9, in <module>
    load_entry_point('beets==1.0b4', 'console_scripts', 'beet')()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/__init__.py", line 439, in main
    subcommand.func(lib, config, suboptions, subargs)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 552, in import_func
    opts.logpath, art, threaded, color)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 497, in import_files
    pl.run_parallel()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/pipeline.py", line 179, in run
    self.coro.send(msg)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 456, in apply_choices
    albuminfo.set_art(artpath)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/library.py", line 1142, in set_art
    self.artpath = artdest
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/library.py", line 1048, in __setattr__
    self._library.conn.execute(sql, (value, self.id))
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a 
text_factory that can interpret 8-bit bytestrings (like text_factory = str). It 
is highly recommended that you instead just switch your application to Unicode 
strings.

Original comment by dleink on 6 Aug 2010 at 12:51
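
The sqlite3 error above is raised when a raw byte string is bound to a text parameter. One way to keep paths as bytes in the database is to store them as BLOBs. The sketch below is Python 3, where bytes parameters are stored as BLOBs automatically (on Python 2 the value would need wrapping, for example with sqlite3.Binary). The table name, schema, and path are made up for illustration, and this is not necessarily how beets fixed it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the artpath column name comes from the traceback.
conn.execute("CREATE TABLE albums (id INTEGER PRIMARY KEY, artpath BLOB)")

# Made-up path containing a byte (0xEF) that is not valid UTF-8 here.
art_path = b"/music/Arcade Fire/Funeral/cover\xef.jpg"
conn.execute("INSERT INTO albums (id, artpath) VALUES (?, ?)", (1, art_path))

# The value comes back as bytes, unchanged.
stored = conn.execute("SELECT artpath FROM albums WHERE id = ?", (1,)).fetchone()[0]
```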

GoogleCodeExporter commented 9 years ago
Another one that looks related..

Traceback (most recent call last):
  File "/usr/bin/beet", line 9, in <module>
    load_entry_point('beets==1.0b4', 'console_scripts', 'beet')()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/__init__.py", line 439, in main
    subcommand.func(lib, config, suboptions, subargs)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 552, in import_func
    opts.logpath, art, threaded, color)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 497, in import_files
    pl.run_parallel()
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/pipeline.py", line 179, in run
    self.coro.send(msg)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/ui/commands.py", line 444, in apply_choices
    item.move(lib, True)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/library.py", line 318, in move
    dest = library.destination(self)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/library.py", line 834, in destination
    subpath = _sanitize_path(subpath)
  File "/usr/lib/python2.6/site-packages/beets-1.0b4-py2.6.egg/beets/library.py", line 194, in _sanitize_path
    comp = regex.sub(repl, comp)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: 
ordinal not in range(128)

Original comment by dleink on 6 Aug 2010 at 3:10
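
An "'ascii' codec can't decode byte 0xc3" message is typical of Python 2 implicitly coercing a byte string to Unicode, which can happen when Unicode and byte strings are mixed in a regex substitution. A sketch of a substitution that stays in bytes throughout, in Python 3; the character set in UNSAFE is an assumption, since the real _sanitize_path rules are not shown in this thread.

```python
import re

# Hypothetical set of characters to replace (assumption, not the beets rules).
UNSAFE = re.compile(rb'[<>:"?*|\\]')


def sanitize_component(comp):
    """Replace unsafe characters in one bytes path component with b'_'."""
    return UNSAFE.sub(b"_", comp)


cleaned = sanitize_component(b"Ch\xc3\xa2teau: Remix?.mp3")
# cleaned == b'Ch\xc3\xa2teau_ Remix_.mp3'
```

Pattern, replacement, and input are all bytes, so the non-ASCII bytes 0xC3 0xA2 pass through untouched and no decoding is attempted.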

GoogleCodeExporter commented 9 years ago
Thank you! These were both really helpful; I needed to tie up a couple of loose 
ends. Commit af9f45480ece should fix the first error (having to do with album 
art paths). Commit 567af055c562 should fix the second (the Unicode error in 
_sanitize_path). Sorry for the rocky road to stability...

Original comment by adrian.sampson on 6 Aug 2010 at 5:03

GoogleCodeExporter commented 9 years ago

Original comment by adrian.sampson on 10 Aug 2010 at 5:55

GoogleCodeExporter commented 9 years ago
I'm afraid this issue is back with a vengeance in 1.0b6

{{{switch:torrent daenney$ locale
LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_ALL="en_GB.UTF-8"}}}

{{{switch:torrent daenney$ python -c 'import sys; print 
sys.getfilesystemencoding()'
utf-8}}}

This happens when the following file is encountered:
8 - Fallen Snow (Skatebård Remix).mp3

{{{Traceback (most recent call last):
  File "/usr/local/bin/beet", line 9, in <module>
    load_entry_point('beets==1.0b6', 'console_scripts', 'beet')()
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/__init__.py", line 457, in main
    subcommand.func(lib, config, suboptions, subargs)
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/commands.py", line 617, in import_func
    opts.logpath, art, threaded, color, delete, quiet)
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/commands.py", line 559, in import_files
    pl.run_parallel()
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/pipeline.py", line 94, in run
    msg = self.coro.next()
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/commands.py", line 343, in read_albums
    for path, items in autotag.albums_in_dir(os.path.expanduser(toppath)):
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/autotag/__init__.py", line 122, in albums_in_dir
    i = library.Item.from_path(os.path.join(root, filename))
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/library.py", line 254, in from_path
    i.read(path)
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/library.py", line 319, in read
    f = MediaFile(_syspath(read_path))
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/mediafile.py", line 490, in __init__
    self.mgfile = mutagen.File(path)
  File "/Library/Python/2.6/site-packages/mutagen-1.20-py2.6.egg/mutagen/__init__.py", line 203, in File
    fileobj = file(filename, "rb")
IOError: [Errno 2] No such file or directory: '/Volumes/data/unsorted/The Bird 
of Music/The Bird of Music Remixes/8 - Fallen Snow (Skateba\xcc\x8ard 
Remix).mp3'}}}

Another example:
03 Ultraviolence (Château Marmont Remix) 1.mp3

{{{Traceback (most recent call last):
  File "/usr/local/bin/beet", line 9, in <module>
    load_entry_point('beets==1.0b6', 'console_scripts', 'beet')()
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/__init__.py", line 457, in main
    subcommand.func(lib, config, suboptions, subargs)
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/commands.py", line 617, in import_func
    opts.logpath, art, threaded, color, delete, quiet)
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/commands.py", line 559, in import_files
    pl.run_parallel()
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/pipeline.py", line 94, in run
    msg = self.coro.next()
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/ui/commands.py", line 343, in read_albums
    for path, items in autotag.albums_in_dir(os.path.expanduser(toppath)):
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/autotag/__init__.py", line 122, in albums_in_dir
    i = library.Item.from_path(os.path.join(root, filename))
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/library.py", line 254, in from_path
    i.read(path)
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/library.py", line 319, in read
    f = MediaFile(_syspath(read_path))
  File "/Library/Python/2.6/site-packages/beets-1.0b6-py2.6.egg/beets/mediafile.py", line 490, in __init__
    self.mgfile = mutagen.File(path)
  File "/Library/Python/2.6/site-packages/mutagen-1.20-py2.6.egg/mutagen/__init__.py", line 203, in File
    fileobj = file(filename, "rb")
IOError: [Errno 2] No such file or directory: 
'/Volumes/data/unsorted/HeartsRevolution- Ultraviolence/03 Ultraviolence 
(Cha\xcc\x82teau Marmont Remix) 1.mp3'}}}

This looks similar to issue 81.

Original comment by daniele.sluijters on 31 Jan 2011 at 12:45
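
A note on the bytes in the IOError messages above: 'a\xcc\x8a' is valid UTF-8 for "a" followed by U+030A (combining ring above), the decomposed form of "å". If the name stored on the volume uses a different form, a lookup by the decomposed bytes could fail; that explanation is a guess, not something confirmed in this thread. A Python 3 sketch of the two forms, with same_name as a hypothetical helper:

```python
import unicodedata

# Bytes copied from the first IOError above.
reported = b"8 - Fallen Snow (Skateba\xcc\x8ard Remix).mp3"

decomposed = reported.decode("utf-8")                # 'a' + U+030A, two code points
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E5


def same_name(a, b):
    """Compare two names, ignoring composed/decomposed differences."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Both strings render as "Skatebård", but they differ byte for byte, so an exact filename match treats them as different files.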

GoogleCodeExporter commented 9 years ago
And another:

Sexy Sushi - Tu l'as bien mérité

This results in the following beet command:
beet import Sexy\ Sushi\ -\ Tu\ l\'as\ bien\ me\314rite\314\/

Basically, the terminal returns immediately and nothing happens.

Unfortunately, this seems to be a problem that only concerns Mac OS X and Samba:

When I cd into that directory on the original filesystem, the command is 
completed like this:
cd Sexy\ Sushi\ -\ Tu\ l\'as\ bien\ mérité
The beet import command results in:
beet import Sexy\ Sushi\ -\ Tu\ l\'as\ bien\ mérité

Just did a little more research about UTF-8 and Samba:
Although Mac OS X uses UTF-8 as its encoding method for filenames, it uses an 
extended UTF-8 specification that Samba cannot handle, so UTF-8 locale is not 
available for Mac OS X.

Basically, this means that UTF-8 filenames apparently cannot be handled over 
Samba on Mac OS X, which is causing my issues above, since it apparently falls 
back to CP850.

I have no idea whether we can check for this during import or fix it somehow, 
but I doubt it. AFP appears to be exempt from this problem.

Original comment by daniele.sluijters on 31 Jan 2011 at 1:02

GoogleCodeExporter commented 9 years ago
If I understand you correctly, this problem is manifesting only when you try to 
import files that both (a) contain non-ASCII characters and (b) are located on 
a remote SMB share mounted on Mac OS X. Is that correct? If so, this is 
definitely a separate issue from the "undecodable filenames" problem, which 
occurs when the filenames are non-UTF8 even though they purport to be. We 
should open a separate ticket in that case.

It would also be helpful to see the sources you mentioned that talk about UTF-8 
support in Mac OS X's Samba -- then I might be able to see if there's a 
workaround for this limitation. 

Original comment by adrian.sampson on 31 Jan 2011 at 4:51
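
To tell the two cases apart when triaging a report, a small check along these lines may help. It is a Python 3 sketch: classify_name and its labels are hypothetical, and it assumes the filesystem is expected to hold UTF-8 names.

```python
import unicodedata


def classify_name(raw):
    """Return 'undecodable', 'decomposed', or 'ok' for a bytes filename."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "undecodable"  # not UTF-8 at all: the original problem in this issue
    if unicodedata.normalize("NFC", text) != text:
        return "decomposed"   # valid UTF-8 with combining marks, as in the SMB reports
    return "ok"
```

With the names from this thread, b'08 Ha\xefti.flac' is classified as 'undecodable' and b'Cha\xcc\x82teau' as 'decomposed'.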