beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

Bug: info operation fails for string containing ã on linux (and other operations such as fingerprint) #3939

Closed sandersantema closed 3 years ago

sandersantema commented 3 years ago

I've got a file named Anthony Parasole at Dekmantel Festival São Paulo 2017-317327953.mp3. It works perfectly fine on macOS but not on Linux. I believe this has something to do with Linux expecting composed Unicode filenames, while macOS can handle both forms but prefers the decomposed one, i.e. a\xcc\x83, as can be seen below. I believe $LANG and $LC_ALL might be relevant to this issue as well; however, these are the same on both my macOS and Linux machines, namely en_US.UTF-8, and manually testing with env LANG=en_US.UTF-8 causes the same error. I think this is a known issue, but I'm a bit in over my head as to how to actually resolve it. If I've missed any info, please let me know. The best solution might be to simply sanitize all filenames on macOS, but I wonder whether that is future-proof, given that I'd have to keep manually sanitizing filenames before importing on Linux.

Some more info I found out later:

Problem

Running this command (verbose mode doesn't provide any more relevant info):

> beet info paulo
info: cannot read file: [Errno 2] No such file or directory: b'/home/sandersantema/Music/iTunes/iTunes Media/Music/Unknown Artist/Unknown Album/Anthony Parasole at Dekmantel Festival Sa\xcc\x83o Paulo 2017-317327953.mp3'

While on macOS the command works just fine:

/Users/sandersantema/Music/iTunes/iTunes Media/Music/Unknown Artist/Unknown Album/Anthony Parasole at Dekmantel Festival São Paulo 2017-317327953.mp3
          art: False
     bitdepth: 0
      bitrate: 128000
     channels: 2
       format: MP3
        genre: Mix
       genres: Mix
       length: 7293.613401360544
rg_track_gain: 1.3
rg_track_peak: 1.0
   samplerate: 44100
        title: Anthony Parasole at Dekmantel Festival São Paulo 2017-317327953

Setup

sandersantema commented 3 years ago

What I should've mentioned is that I ran the following command on the database to make sure the paths are correct on Linux:

sqlite3 ~/.config/beets/library.db "UPDATE items SET path = replace(path, '/Users/sandersantema', '/home/sandersantema');" 
sandersantema commented 3 years ago

Some more info:

As you can see macOS does indeed seem to prefer the decomposed version while Linux prefers the composed version. When I try to execute beet show using the database with the problematic song imported on Linux it fails on macOS for this newly imported song. So it seems the way songs containing accents and such are stored on macOS and Linux are not compatible.

wisp3rwind commented 3 years ago

> Some more info:
>
> * Filename when imported on macOS: `Anthony Parasole at Dekmantel Festival Sa\xcc\x83o Paulo 2017-317327953.mp3`
>
> * Filename when imported on Linux: `Anthony Parasole at Dekmantel Festival S\xc3\xa3o Paulo 2017-317327953.mp3`
>
> As you can see macOS does indeed seem to prefer the decomposed version while Linux prefers the composed version. When I try to execute beet show using the database with the problematic song imported on Linux it fails on macOS for this newly imported song. So it seems the way songs containing accents and such are stored on macOS and Linux are not compatible.

Linux doesn't "prefer" any specific Unicode normalization: on Linux filesystems, paths can be any byte string and do not even need to be valid Unicode. As far as I know, beets also doesn't alter the Unicode representation at all; it just uses whatever it receives from its metadata sources or the filesystem. So the issue might actually be outside of beets. It would be helpful to know more details about what you did: Which filesystems are involved (HFS+ apparently forces (something similar to) NFD: https://en.wikipedia.org/wiki/HFS_Plus)? Is this the same filesystem mounted on Linux and Mac, or was the music copied in between? With which tool? Might that have changed the filename encoding?

> beet list paulo works fine on both systems

This only operates on the database, not the media files.

> after importing the file again on linux (I've copied the library from macOS before and plan to synchronize the databases) works fine, although beets doesn't detect it as the same file.

I'm not quite following what you did here and what your goal is. It would be good to know what you're trying to achieve in the end; it might turn out that it can't really be done with beets, though.

sampsyo commented 3 years ago

Unfortunately, debugging encoding problems can be really, really hard. The thing to know is that, on Linux, your filenames are truly just raw bytes—no encoding is preferred or enforced, as @wisp3rwind mentioned. So beets is attempting to access those files with an exact sequence of bytes. If those don't match, then the file won't be found.
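To make the byte-level difference concrete, here is a small Python sketch using only the standard library's `unicodedata` module. The sample name is shortened from the filename in this issue. It only illustrates the two normalization forms and is not taken from beets itself.

```python
import unicodedata

# Shortened sample taken from the filename discussed in this issue.
name = "São Paulo"

# NFC composes "a" + combining tilde into the single code point U+00E3.
nfc = unicodedata.normalize("NFC", name)
# NFD decomposes it into "a" followed by U+0303 (combining tilde).
nfd = unicodedata.normalize("NFD", name)

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes)  # b'S\xc3\xa3o Paulo'
print(nfd_bytes)  # b'Sa\xcc\x83o Paulo'
# The strings render identically but are different byte sequences.
print(nfc_bytes == nfd_bytes)  # False
```

On a filesystem that treats names as raw bytes, a lookup using one of these sequences will not find a file stored under the other.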

Doing some digging into how you got the "wrong" bytes in your beets database would be useful. In particular, if you imported your files on macOS and then manually modified the database to make it work on Linux, I can see how that would create problems because the two OSes actually use different filenames (i.e., different sequences of bytes that look the same to humans when rendered as Unicode) for the same files.
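One way to do that digging is to check, for each stored path, whether the file exists under a differently normalized spelling. The sketch below is a hypothetical helper (`existing_variant` is not part of beets) and assumes the stored paths are UTF-8 byte strings. It normalizes the whole path, directory components included.

```python
import os
import unicodedata


def existing_variant(path: bytes):
    """Return the on-disk spelling of `path`, trying NFC and NFD variants.

    Returns None if no variant exists or if the bytes are not valid UTF-8.
    """
    if os.path.exists(path):
        return path
    try:
        text = path.decode("utf-8")
    except UnicodeDecodeError:
        return None
    for form in ("NFC", "NFD"):
        candidate = unicodedata.normalize(form, text).encode("utf-8")
        if os.path.exists(candidate):
            return candidate
    return None
```

Feeding it the values of the `path` column from the `items` table would show which entries only differ from the real filename by normalization: a result different from the input means the file exists, just under other bytes.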

sandersantema commented 3 years ago

In the end, my goal is to use beets as an interface, or so to speak glue, between my music library and the tools on my Linux and macOS machines. Although I use the Linux machine exclusively for day-to-day use, I'm still stuck with macOS for DJ'ing: my hardware depends on DJ software called Traktor, which in turn depends on iTunes. For now I've hooked up iTunes to beets by way of hooks which trigger AppleScripts. Eventually I might do away with iTunes altogether in favor of https://github.com/16pierre/traktorBeetsIntegration, although I'm not ready for that yet because I still sync music to my iPhone using iCloud Music.

The music files themselves are synced by Syncthing. On the Linux machine I use the XFS filesystem; on macOS, APFS. Do you know of any way

> In particular, if you imported your files on macOS and then manually modified the database to make it work on Linux, I can see how that would create problems because the two OSes actually use different filenames

This is exactly what I did.

It seems like this issue might be quite hard to resolve and particular to a use case which might be out of scope for beets, so instead I might simply try and rename all the offending filenames given that I don't actually use those for anything and only identify music by metadata. The challenge would then be to come up with a robust way of renaming any file that might cause trouble.
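A rough sketch of such a renaming pass, assuming a filesystem where names are raw bytes (as on the Linux side) and that composed (NFC) names are the desired form. `rename_to_nfc` is a hypothetical helper, not a beets feature. It defaults to a dry run and never overwrites an existing entry.

```python
import os
import unicodedata


def rename_to_nfc(root: str, dry_run: bool = True):
    """Rename entries under `root` whose names are not in NFC form.

    Returns a list of (old_path, new_path) pairs. With dry_run=True nothing
    is renamed. Existing targets are never overwritten.
    """
    planned = []
    # Walk bottom-up so children are handled before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            normalized = unicodedata.normalize("NFC", name)
            if normalized == name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, normalized)
            if os.path.exists(new):
                continue  # Skip rather than clobber an existing entry.
            planned.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return planned
```

Note that renaming files behind beets' back leaves the paths stored in the library database pointing at the old names, so those would need updating as well.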

There might be a possible solution, however, although I can't completely assess how well it would handle edge cases, and it would probably require quite a lot of work. Since the two problematic byte sequences, a\xcc\x83 and \xc3\xa3, both represent the same character, ã ("a" with a tilde), it might be feasible for beets to treat such byte sequences as synonyms. This would, however, require that the same holds for all problematic characters and that there are no ambiguities.
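The "synonyms" idea corresponds to what Unicode calls canonical equivalence: two strings are canonically equivalent exactly when they normalize to the same NFC (or NFD) form, so no hand-made table of character pairs is needed. A minimal sketch of a normalization-insensitive lookup follows. `find_equivalent` is a hypothetical helper for illustration and not something beets provides.

```python
import os
import unicodedata


def find_equivalent(directory: str, name: str):
    """Return the entry in `directory` canonically equivalent to `name`.

    Both sides are compared in NFC form, so composed and decomposed
    spellings match. Returns None when nothing matches.
    """
    wanted = unicodedata.normalize("NFC", name)
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == wanted:
            return entry
    return None
```

The cost is a directory scan per lookup instead of a direct open, which is one reason this kind of fallback would be a real design decision rather than a small patch.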

sampsyo commented 3 years ago

One option you might consider would be to use ASCII filenames (the asciify_paths config option), which are less likely to trigger cross-platform encoding problems.
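In the beets configuration this is a single line, `asciify_paths: yes`. To illustrate why ASCII names sidestep the problem, here is a standard-library approximation of transliteration. It is only a rough stand-in, and the transliteration beets actually performs may differ in detail. `rough_asciify` is a hypothetical name.

```python
import unicodedata


def rough_asciify(text: str) -> str:
    """Strip combining marks after decomposition and drop other non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return without_marks.encode("ascii", "ignore").decode("ascii")


composed = "S\u00e3o Paulo"     # "ã" as one code point
decomposed = "Sa\u0303o Paulo"  # "a" + combining tilde
print(rough_asciify(composed))    # Sao Paulo
print(rough_asciify(decomposed))  # Sao Paulo
```

Both spellings collapse to the same ASCII bytes, so the resulting filename is identical regardless of which normalization form the source used.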

wisp3rwind commented 3 years ago

So you're syncing the music only in one direction, namely Linux -> Mac, and these scripts trigger a library re-scan in iTunes? Then, maybe, your scripts would be able to normalize the paths for APFS before trying to access any files?

> It seems like this issue might be quite hard to resolve and particular to a use case which might be out of scope for beets, so instead I might simply try and rename all the offending filenames given that I don't actually use those for anything and only identify music by metadata. The challenge would then be to come up with a robust way of renaming any file that might cause trouble.

Or, much more simply, the asciify/asciify_paths options might solve your problem?

sandersantema commented 3 years ago

asciify_paths seems like a great solution! Thanks @sampsyo and @wisp3rwind.

> So you're syncing the music only in one direction, namely Linux -> Mac, and these scripts trigger a library re-scan in iTunes? Then, maybe, your scripts would be able to normalize the paths for APFS before trying to access any files?

I think that's what I'd achieve using the asciify_paths option right?

For the interested this is what I'm doing exactly:

hook:
  hooks:
    - event: after_write
      command: osascript /Users/sandersantema/.config/beets/refresh.scpt "{item.path}"
    - event: write
      command: echo "{item.path}"
    - event: item_removed
      command: osascript /Users/sandersantema/.config/beets/remove.scpt "{item.path}"
    - event: item_removed
      command: mv "{item.path}" /Users/sandersantema/.config/beets/trash
    - event: item_moved
      command: echo "{source}" "{destination}"
    - event: item_moved
      command: osascript /Users/sandersantema/.config/beets/move.scpt "{source}" "{destination}"

Scripts: move.scpt remove.scpt refresh.scpt

One nice thing, as you can see here, is that I don't need an add.scpt: refreshing a song that isn't yet in the iTunes library, as refresh.scpt does, adds it to the library. If you want to use these scripts with Apple's new Music app, simply replace iTunes with Music in every file.

wisp3rwind commented 3 years ago

> asciify_paths seems like a great solution! Thanks @sampsyo and @wisp3rwind.
>
> > So you're syncing the music only in one direction, namely Linux -> Mac, and these scripts trigger a library re-scan in iTunes? Then, maybe, your scripts would be able to normalize the paths for APFS before trying to access any files?
>
> I think that's what I'd achieve using the asciify_paths option right?

Yes; it might be more drastic (but also simpler and maybe more reliable) than the normalization that you'd minimally need.