beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.96k stars 1.82k forks source link

Moved files for merged imports lead to crashes in chroma plugin #4342

Open hamaryns opened 2 years ago

hamaryns commented 2 years ago

OK, so this time I really think I spotted a problem. I ripped a CD, but since it didn’t work properly, it ended up in 2 directories. I then imported those directories with beets. The first one imported fine, with of course the mention that some tracks were missing. Then it proceeded to the second directory, and again found the album and mentioned tracks were failing, but I chose Apply, and then… BOOM.

Problem

Running this command in verbose (-vv) mode:

$ beet -vv import -P /media/hendrik2/Scharnier/Muziek\ te\ sorteren/ 2> beet2.log

/media/hendrik2/Scharnier/Muziek te sorteren/Cobla Amsterdam (11 items)
Tagging:
    Cobla Amsterdam - Live in Catalunya
URL:
    https://musicbrainz.org/release/d09e68bd-04b7-4e5b-bbfa-854868467486
(Similarity: 90.4%) (missing tracks) (CD, 2008, NL, Etcetera, KTC 1375)
Missing tracks (4/15 - 26.7%):
 ! Ricard Viladesau – Barcelona ’92 (obligada 2 trumpets) (# 7) (5:06)
 ! Francesc Riumalló – Oncle i nebot (obligada tenora)    (# 9) (4:48)
 ! Max Havart – Rouranenca (obligada tenora)              (#11) (4:06)
 ! Emili Saló – El Bruel de Pals (obligada 2 fiscorns)    (#12) (4:44)
[A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort? 

/media/hendrik2/Scharnier/Muziek te sorteren/Cobla La Principal d’Amsterdam (4 items)
Finding tags for album "Cobla La Principal d’Amsterdam - Live in Catalunya".
Candidates:
1. Cobla Amsterdam - Live in Catalunya (36.3%) (missing tracks, tracks, artist, ...) (CD, 2008, NL, Etcetera, KTC 1375)
2. Cobla Maravella - Catalunya Dança (19.6%) (missing tracks, tracks, album, ...) (12" Vinyl, 1964, ES, Zafiro, Z-M 10)
3. Cobla Maravella Dtor. Lluís Ferrer i Puigdemont - Catalunya Canta (19.4%) (missing tracks, tracks, album, ...) (12" Vinyl, 1968, ES, Canigó, CANI-4-S)
4. Cobla "La Selvatana" - Sardanas A Salvador Dalí (17.0%) (tracks, album, artist, ...) (7" Vinyl, 1964, ES, Sandiego, SAN-111)
5. Jaume Pahissa, Miguel Llobet Solés, Gaspar Cassadó, Manuel de Falla; Giacomo Palazzesi - Catalunya (16.7%) (missing tracks, tracks, artist, ...) (Digital Media, 2016, XW, GuitArt Collection)
# selection (default 1), Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort? 
Correcting tags from:
    Cobla La Principal d’Amsterdam - Live in Catalunya
To:
    Cobla Amsterdam - Live in Catalunya
URL:
    https://musicbrainz.org/release/d09e68bd-04b7-4e5b-bbfa-854868467486
(Similarity: 36.3%) (missing tracks, tracks, artist, year) (CD, 2008, NL, Etcetera, KTC 1375)
 * Track 7  -> Ricard Viladesau – Barcelona ’92 (obligada 2 trumpets) (title)
 * Track 9  -> Francesc Riumalló – Oncle i nebot (obligada tenora) (title)
 * Track 11 -> Max Havart – Rouranenca (obligada tenora) (title)
 * Track 12 -> Emili Saló – El Bruel de Pals (obligada 2 fiscorns) (title)
Missing tracks (11/15 - 73.3%):
 ! Vicens Bou – Girona aimada!                   (# 1) (4:19)
 ! Ricard Castells – Els Degotalls               (# 2) (3:38)
 ! Conrad Saló – La Feste de Sant Martirià       (# 3) (4:51)
 ! Josep Saderra – Mercé                         (# 4) (4:26)
 ! Manel S. Puigferrer – Ceretana                (# 5) (4:27)
 ! Pere Mercader – Conxita encisera              (# 6) (4:41)
 ! Jaume Roca Delpech – Gavina                   (# 8) (4:22)
 ! Joaquim Serra – Sabadell                      (#10) (4:33)
 ! Robert Sarrade – En Guii de Sant Cebrià       (#13) (4:55)
 ! Marcel Artiaga – Els milors anys              (#14) (4:07)
 ! Tomàs Gil Membrado – 25 anys de Baix Montseny (#15) (4:56)
Apply, More candidates, Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort? a
Old: 11 items, FLAC, 725kbps, 44.1kHz/16 bit, 49:21, 256.4 MiB
New: 4 items, FLAC, 715kbps, 44.1kHz/16 bit, 18:46, 95.9 MiB
[S]kip new, Keep all, Remove old, Merge all? m

Led to this problem:

(paste here)

beet2.log

Setup

OS: Linux Mint Mate edition 20.3 64 bit Python version: $ python2 --version Python 2.7.18 beets version: $ beet --version beets version 1.6.0 Python version 3.8.10 plugins: chroma, fetchart, fromfilename, lastgenre (different python versions, that’s interesting)* Turning off plugins made problem go away (yes/no): How can I test this? The first folder is imported and moved, the second is still in the to be imported folder.

My configuration (output of beet config) is:

$ beet config
directory: /media/hendrik2/Scharnier/Muziek
library: ~/.beets/musiclibrary.db

import:
    move: yes
log: beetslog.txt
art_filename: "$albumartist \u2013 $album"

plugins: fetchart lastgenre chroma fromfilename mbsync

paths:
    default: "Albums/$albumartist/$album%aunique{}/$artist \u2013 $title"
    singleton: "Losse nummers/$artist/$artist \u2013 $title"
    comp: "Compilaties/$album%aunique{}/$artist \u2013 $title"
acoustid:
    apikey: REDACTED
chroma:
    auto: yes
fetchart:
    auto: yes
    minwidth: 0
    maxwidth: 0
    quality: 0
    max_filesize: 0
    enforce_ratio: no
    cautious: no
    cover_names:
    - cover
    - front
    - art
    - album
    - folder
    sources:
    - filesystem
    - coverart
    - itunes
    - amazon
    - albumart
    google_key: REDACTED
    google_engine: 001442825323518660753:hrh5ch1gjzm
    fanarttv_key: REDACTED
    lastfm_key: REDACTED
    store_source: no
    high_resolution: no
    deinterlace: no
    cover_format:
lastgenre:
    whitelist: yes
    min_weight: 10
    count: 1
    fallback:
    canonical: no
    source: album
    force: yes
    auto: yes
    separator: ', '
    prefer_specific: no
    title_case: yes
wisp3rwind commented 2 years ago

Looks the same or at least very similar to #4283. This might suggest that there's some issue with the logic behind our merge logic. Maybe there's an issue with the already_merged check? That happens in the user_query stage, which would explain why this blows up in the chroma plugin. Anyway, I'm not sufficiently familiar with this part of the code to investigate this quickly.

sampsyo commented 2 years ago

Ah, this does indeed look like a problem… for completeness, here's an inline paste of the traceback from the attached log:

Traceback (most recent call last):
  File "/home/hendrik2/.local/bin/beet", line 8, in <module>
    sys.exit(main())
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/ui/__init__.py", line 1285, in main
    _raw_main(args)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/ui/__init__.py", line 1272, in _raw_main
    subcommand.func(lib, suboptions, subargs)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/ui/commands.py", line 973, in import_func
    import_files(lib, paths, query)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/ui/commands.py", line 943, in import_files
    session.run()
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/importer.py", line 340, in run
    pl.run_parallel(QUEUE_SIZE)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/util/pipeline.py", line 446, in run_parallel
    raise exc_info[1].with_traceback(exc_info[2])
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/util/pipeline.py", line 314, in run
    for msg in _allmsgs(out):
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/util/pipeline.py", line 467, in pull
    out = coro.send(msg)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/util/pipeline.py", line 193, in coro
    func(*(args + (task,)))
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/importer.py", line 1369, in lookup_candidates
    plugins.send('import_task_start', session=session, task=task)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/plugins.py", line 488, in send
    result = handler(**arguments)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/plugins.py", line 145, in wrapper
    return func(*args, **kwargs)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beetsplug/chroma.py", line 180, in fingerprint_task
    return fingerprint_task(self._log, task, session)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beetsplug/chroma.py", line 249, in fingerprint_task
    acoustid_match(log, item.path)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/beetsplug/chroma.py", line 89, in acoustid_match
    duration, fp = acoustid.fingerprint_file(util.syspath(path))
  File "/home/hendrik2/.local/lib/python3.8/site-packages/acoustid.py", line 340, in fingerprint_file
    return _fingerprint_file_audioread(path, maxlength)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/acoustid.py", line 280, in _fingerprint_file_audioread
    with audioread.audio_open(path) as f:
  File "/home/hendrik2/.local/lib/python3.8/site-packages/audioread/__init__.py", line 111, in audio_open
    return BackendClass(path)
  File "/home/hendrik2/.local/lib/python3.8/site-packages/audioread/rawread.py", line 62, in __init__
    self._fh = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: b'/media/hendrik2/Scharnier/Muziek te sorteren/Cobla Amsterdam/Cobla Amsterdam - Vicens Bou \xe2\x80\x93 Girona aimada!.flac'

So the problem here actually manifests within the chroma plugin's hook. It can't find a file from the first directory (Cobla Amsterdam) in the source position (i.e., before the files are moved). Just guessing here, but I suspect the problem arises from a combination of all three of these factors:

Sketching it out, what happens is that:

  1. We fingerprint the first "album." Then, the first "album" is imported and begins being moved to the destination.
  2. Concurrently, the second "album" is fingerprinted and imported, and we detect the duplicate.
  3. Beets recreates a combined album with the merged tracks from the two "albums" and inserts it back into the import pipeline.
  4. At this point, chroma attempts to fingerprint the entire combined album again. However, the combined album still hasn't had its paths updated to reflect the fact that the album has moved.

If we were copying files (instead of moving), this wouldn't come up because we would just re-fingerprint the source files again.

It's a little tricky to think of the right solution for this problem! Some ideas:

wisp3rwind commented 2 years ago

Thanks for delving deeper into this issue! Your analysis sounds a lot like the hunch that I had.

It's a little tricky to think of the right solution for this problem! Some ideas:

  • The answer may be that we need to make sure that the "merged" album that we create from the two parts has the new, moved paths. But that would require "stalling" until the move is complete—that is, if the files are still being moved, we would have the opposite problem where we might try to fingerprint the "destination" paths before the files arrive there.

I'd argue that this is the right approach, and maybe it should even be stricter: It might be safest to block the merged import task until the original tasks have completely left the pipeline (which in fact is very similar to waiting for file moves, since the manipulate_files stage runs late during the import). (Do take my thoughts here with a grain of salt: I still don't fully understand the merge process.)

sampsyo commented 2 years ago

Yeah, I think that may indeed be the simplest policy. The opposite extreme, which you can imagine wanting instead, would be to prevent anything from happening to either of the constituent partial albums—to make it as if the two were originally imported together. However, this policy is impossible to enforce: for one, it's not always the case that the two partial albums are both "in flight"; one might be already imported long ago and just sitting in the database. Or it might be part of the current import but long enough ago that it's impossible to "back in time" to prevent importing.

So waiting for both imports to be completely done before attempting to re-import the pair together seems like the only reasonable clean policy. Doing this is itself not exactly easy—we don't currently have a way to see the status of a previous in-flight import task. So it will take some creativity to sort out the synchronization.