beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.73k stars 1.82k forks source link

Sanitize colons in drive-letter-like positions in Windows filenames like path separators #3685

Closed gschmidl closed 4 years ago

gschmidl commented 4 years ago

I have an album containing an artist named "F:A.R.". When doing any activity that involves moving this artist, the file is not moved to "M:\F:A.R." (I'm using unicode replacements instead of underscores), but to "M:\F:\A.R.".

Problem

Running this command in verbose (-vv) mode:

~ ❯ beet -vv move "artist::^F.A.R"
user configuration: C:\Users\Gunther Schmidl\AppData\Roaming\beets\config.yaml
data directory: C:\Users\Gunther Schmidl\AppData\Roaming\beets
plugin paths:
Sending event: pluginload
lyrics: Disabling google source: no API key configured.
library database: M:\lib.db
library directory: M:\
Sending event: library_opened
Moving 1 item.
moving: M:\F꞉A․R․ / Muslimgauze\[1990] Manipulation (Live Extracts) / Death of Saint Jarnaii Singh Bhindranwaie\01-01 - Manipulation (Live Extracts).mp3
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: cli_exit

Led to this problem:

Here's what it looks like in the database: https://i.imgur.com/Z5lvGqE.png

Here's a link to the music files that trigger the bug (if relevant):

https://www.dropbox.com/s/pastihu349z6kce/01-01%20-%20Manipulation%20%28Live%20Extracts%29.mp3?dl=0

Setup

My configuration (output of beet config) is:

lyrics:
    bing_lang_from: []
    auto: yes
    bing_client_secret: REDACTED
    bing_lang_to:
    google_API_key: REDACTED
    google_engine_ID: REDACTED
    genius_api_key: REDACTED
    fallback:
    force: no
    local: no
    sources:
    - google
    - lyricwiki
    - musixmatch
    - genius
directory: M:/
library: M:/lib.db

paths:
    default: $albumartist/[$year] $album%aunique{}/$disc-$track - $title
    singleton: Non-Album/$artist/$title
    comp: Compilations/[$year] $album%aunique{}/$disc-$track - $title

plugins: web mbsync discogs chroma lastgenre duplicates lyrics
lastgenre:
    source: artist
    force: no
    whitelist: yes
    min_weight: 10
    count: 1
    fallback:
    canonical: no
    auto: yes
    separator: ', '
    prefer_specific: no

replace:
    \\: "\uFF3C"
    /: "\uFF0F"
    \.{3}: "\u2026"
    \.: "\u2024"
    '[\x00-\x1f]': ''
    <: "\uFE64"
    '>': "\uFE65"
    ':': "\uA789"
    '"': "\uFF02"
    \?: "\uFF1F"
    \*: "\u204E"
    \|: "\u2502"
    \s+$: ''
    ^\s+: ''
    ^-: "\u2012"
path_sep_replace: "\uFF0F"
discogs:
    apikey: REDACTED
    apisecret: REDACTED
    tokenfile: discogs_token.json
    source_weight: 0.5
    user_token: REDACTED
    separator: ', '
    index_tracks: no
web:
    host: 127.0.0.1
    port: 8337
    cors: ''
    cors_supports_credentials: no
    reverse_proxy: no
    include_paths: no
duplicates:
    album: no
    checksum: ''
    copy: ''
    count: no
    delete: no
    format: ''
    full: no
    keys: []
    merge: no
    move: ''
    path: no
    tiebreak: {}
    strict: no
    tag: ''
chroma:
    auto: yes
gschmidl commented 4 years ago

Speculation: I assume beets interprets F: as a drive and adds a \ automatically, since it didn't happen with any other colon in a library of ~16000 files.

sampsyo commented 4 years ago

Huh; that's interesting! Any chance you could try this with the default (underscore-laden) replace configuration? I wonder if anything weird is happening with those Unicode replacements.

gschmidl commented 4 years ago

Huh; that's interesting! Any chance you could try this with the default (underscore-laden) replace configuration? I wonder if anything weird is happening with those Unicode replacements.

Same result, but now the directory is named F_. I removed the entire "replace" and path_sep_replace blocks as well as the plugins again.

sampsyo commented 4 years ago

Very interesting; thanks! It seems like something real is going on here, but it's not entirely obvious where the root cause is. We must be, implicitly or explicitly, doing an os.path.split on the completed path filename (not the template), which has this effect:

>>> import ntpath
>>> ntpath.split('A:B:C')
('A:', 'B:C')
>>> ntpath.split('AA:B:C')
('', 'AA:B:C')

(It's only a problem when it's a single letter and at the very beginning of the path.)

I believe the most likely candidate is in our path sanitation routines, where we need to split apart the path, do replacements on the components individually, and then glue it back together: https://github.com/beetbox/beets/blob/a2f66a042727964e18d698a7b1676a802b5f7050/beets/util/__init__.py#L598

So if we fill in the template with actual : character and then try to do this, we will split in the wrong place!

This is the same reason we have a special fixed-function replacement for / (and \ on Windows) that runs before all this, so we know where to split components: https://github.com/beetbox/beets/blob/a2f66a042727964e18d698a7b1676a802b5f7050/beets/dbcore/db.py#L86

(See also #323.) It seems like we likely need to do a similar thing for : on Windows.