No time limit on candidate lookup

RollingStar commented 7 years ago

Problem

If a candidate is a huge box set, beets slows to an absolute crawl trying to match tracks to it. There should be a cap on maximum lookup time, and beets should cancel a lookup thread that goes passed that time limit. Then it can provide the user with its candidates, including the interrupted lookups, and give its usual prompts, like (A)pply, but add a new one: something like (C)ontinue interrupted lookups. The time limit could default to something reasonable (10-60 seconds) and perhaps be user-specified in the config.

The import hangs (gets really slow but eventually finishes) on two attempts for the same files. Similar lookups took at least 25 minutes, but I haven't timed it exactly. edit: I started this command at least before posting this issue 1 hour and 45 minutes ago. Beets is still considering the entire boxset as a candidate and Python still shows CPU usage.

Running this command in verbose (-vv) mode:

beet -vv import e:\music\mozart
user configuration: C:\Users\RollingStar\AppData\Roaming\beets\config.yaml
data directory: C:\Users\RollingStar\AppData\Roaming\beets
plugin paths:
Sending event: pluginload
inline: adding item field disc_layer
inline: adding album field album_bitrate_kbps
library database: e:\songs\beetslibrary.bib
library directory: e:\songs
Sending event: library_opened
Sending event: import_begin
Import of the directory:
e:\music\mozart
was interrupted. Resume (Y/n)?
Resuming interrupted import of e:\use
Skipping previously-imported path: [omitted]
Sending event: import_task_created
Sending event: import_task_start
Looking up: e:\music\mozart\Volume 6 - Dances & Marches
Tagging Mozart, Wolfgang Amadeus - Tänze · Märsche (CD 1) (Complete Mozart Edition Vol.6)
No album ID found.
Search terms: Mozart, Wolfgang Amadeus - Tänze · Märsche (CD 1) (Complete Mozart Edition Vol.6)
Album might be VA: False
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: albuminfo_received
Candidate: Wolfgang Amadeus Mozart - Complete Mozart Edition, Volume 6: Dances / Marches
Success. Distance: 0.25
Sending event: albuminfo_received
Candidate: Judas Priest - The Complete Albums Collection
Success. Distance: 0.79
Sending event: albuminfo_received
Candidate: Wolfgang Amadeus Mozart - Complete Mozart Edition, Volume 1: Early Symphonies
Success. Distance: 0.73
Sending event: albuminfo_received
Candidate: Wolfgang Amadeus Mozart - Complete Mozart Edition
[here is where it's stuck for many minutes, although candidate 1 (correct) and 3 are also slow]

Led to this problem:

beet -vv import e:\music\mozart

Here's a link to the music files that trigger the bug (if relevant): n/a, but this dummy file that matches the box set probably simulates the same effect. dummy - 01 - Mozart, Wolfgang Amadeus - 7 Menuette, KV 61b (65a) - 1. Menuett in G-dur.zip

Setup

OS: Win10 x64
Python version: 3.6.0
beets version: 1.4.3
Turning off plugins made problem go away (yes/no): no (can disable my last two plugins, but the log shows that they should be unrelated). I have already disabled most of my plugins and trimmed my config down to the bare essentials.

My configuration (output of beet config) is:

directory: e:\songs\
library: e:\songs\beetslibrary.bib
import:
    copy: no
    move: yes
    write: yes
    timid: no
per_disc_numbering: yes

plugins: inline copyartifactspy3

paths:
    default: $albumartist - $original_year - $album [$album_bitrate_kbps]/%if{$disc_layer,$disc_layer}$track. $title 
album_fields:
    album_bitrate_kbps: |
        #dummied out my custom album bitrate function
        #this dummy function should have almost 0 performance penalty
        return(str(abcdefghi))
item_fields:
    disc_layer: |
        if disctotal > 1:
           return str(disc) + '-'
        else:
           return ''

sampsyo commented 7 years ago

Hi! Thanks for the details on this.

Can you tell whether beets is stuck communicating with MusicBrainz to get matches or doing the matching? I've found in the past that the matching algorithm itself can be the culprit in these situations with very large albums. Specifically, it seems from the log like it's getting stuck when it tries to match with a very large candidate album.

This might be testable, for example, by giving beets an MBID to search for and comparing the time to match a few different options in isolation.

RollingStar commented 7 years ago

I gave beets a MBID for the correct match (volume 6 of the set rather than the entire set) and the whole operation was quite manageable. Probably under 5 minutes, compared to the 1h45+ which I eventually cancelled.

How specifically do you propose I test the performance? I think this is the MBID that beets was hanging on, but I'm not sure. 03ce3f61-9294-4f5d-aace-a4687aad4229

edit: and keep in mind, my point for this issue is that no matter how optimized beets gets, there should be a time limit, because it is a more graceful handling of unpredictable edge cases.

sampsyo commented 7 years ago

Thanks for looking into it. And also for including the "dummy" file above, which does match the same albums.

Oddly, on my machine (an aging MacBook Pro), this only takes a few seconds per match. It's not exactly snappy, but it's also not in the tens of minutes. It seems like about half the time is being spent in track matching (i.e., on the CPU) and the other half is just downloading data from MusicBrainz.

So, let's try to narrow down what's taking so long on your machine. I just pushed a bunch of extra logging that should delimit when beets is trying to match metadata and when it's communicating with MB. It will be helpful to sort that out regardless of whether we add a timeout or just optimize that chunk of code so it can't take so long. Can you give the most recent source a try in verbose mode and report back about which phase takes the longest?

dosoe commented 7 years ago

Hi! I'm having a similar problem with this release: https://musicbrainz.org/release/9bcd75dd-995e-482b-8ba7-1ef074d253de However, with very big releases (like a complete edition of bach) I don't have problems since I have only a fraction of the files. Complete releases with ~15 CDs like https://musicbrainz.org/release/b0326242-344d-4a07-a175-c36a1418b644 are handled without problems, even if all files are there. Also, when some files are double and some are lacking, beets is trying to put them in place of the missing files. Maybe there should be a check if a file is actually a duplicate. I would think that beets is trying all different configurations of files to check which one has the best fit. If you have 700 or 1000 files that you can combine, it makes a lot of combinations to try out. If it's not done already, it would make sense to check for a tag like 'tracknumber' to reduce the number of possible combinations.

The timeout is not implemented so far, I got beets searching for a few hours to match an album.

dosoe commented 7 years ago

If the problem is the handling and comparison of too many candidates, could there be a way to give the ID on MusicBrainz as an input to get rid of the problem? Or set a timeout and after it has expired, prompt for an ID.

sampsyo commented 7 years ago

It's actually the time taken to evaluate a single candidate album, when that candidate itself is very large.

dosoe commented 7 years ago

Still, if we already tell him: "this is the right release, now you have only to get the tracks at the right place", could that speed it up or is it exactly that step that is taking time?

sampsyo commented 7 years ago

Yes, disabling the track alignment procedure could work.

RollingStar commented 7 years ago

I still think a time limit (perhaps with a user prompt to keep looking) would help retain user-friendliness for unpredictable edge cases. This could be done as the only fix, or in addition to any other fixes.

dosoe commented 7 years ago

I have the same problem with a multi-disc release. I ended up importing all the CDs individually, but now he considers that he has the same release several times, one for every import. Is there a possibility to merge them, so that he only has one release that contains all the CDs?

sampsyo commented 7 years ago

The most relevant issue about merging albums is https://github.com/beetbox/beets/issues/112.

boardfish commented 6 years ago

Currently dealing with this issue for d51f64e5-1d52-4d40-8308-5c9f22c4af03.

8h2a commented 5 years ago

I'm having the same(?) issue with these releases: https://musicbrainz.org/series/aa610319-3b4a-4a26-a127-b5dc78c3b732 Beets just seems to be doing something forever. Any bigger release (e.g. dozens of discs, or hundreds of tracks) seems to be impossible to import using beets. Another such example would be this: https://musicbrainz.org/release-group/9d0636f7-e959-4a00-9787-9a23c07f9655

KW4NP commented 4 years ago

We're still dealing with this issue, an album of 260 tracks took over an hour before beets finally realized that there was no match. Would be very nice to have at least a timeout to keep this from happening needlessly.

stale[bot] commented 4 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

RollingStar commented 4 years ago

I still think a time limit (perhaps with a user prompt to keep looking) would help retain user-friendliness for unpredictable edge cases. This could be done as the only fix, or in addition to any other fixes.

The "what is blocking it" line works better on a huge project than this one. There aren't many devs, and even fewer devs know beets in detail.

jtpavlock commented 4 years ago

The purpose of that line, in the case of these issues, is to hopefully realize what is preventing the issue from being labeled as a feature or bug and provide the necessary info to "upgrade" it from needinfo status. So in the case of this issue, perhaps one blocking element is the request from @sampsyo:

So, let's try to narrow down what's taking so long on your machine. I just pushed a bunch of extra logging that should delimit when beets is trying to match metadata and when it's communicating with MB. It will be helpful to sort that out regardless of whether we add a timeout or just optimize that chunk of code so it can't take so long. Can you give the most recent source a try in verbose mode and report back about which phase takes the longest?

@RollingStar were you ever able to get around to this?

stale[bot] commented 4 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

KW4NP commented 4 years ago

I'm guessing this is still relevant as it wasn't ever fixed as far as I know.

On Thu, Sep 10, 2020, 1:51 PM stale[bot] notifications@github.com wrote:

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/beetbox/beets/issues/2446#issuecomment-690575418, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIQYQZKGXILQ6CYN5K7GKXDSFEG33ANCNFSM4DAWP4LA .

RollingStar commented 4 years ago

Part of why I haven't gone back and try to reproduce this specific bug with the massive Mozart box set is it would be plugging one potential bug. Enforcing a time-out would stop a whole class of odd behavior. Another reason is I eventually caved and imported the box set as sub-sets.

sampsyo commented 4 years ago

Would anybody who's affected by this issue be interested in doing the aforementioned digging to figure out the root cause (cf https://github.com/beetbox/beets/issues/2446#issuecomment-280982403, for example)?

dosoe commented 4 years ago

Hi! I have a few releases that fall in the category 'too big to handle', but I'm a bit occupied right now.

Le jeu. 10 sept. 2020 à 23:52, Adrian Sampson notifications@github.com a écrit :

Would anybody who's affected by this issue be interested in doing the aforementioned digging to figure out the root cause (cf #2446 (comment) https://github.com/beetbox/beets/issues/2446#issuecomment-280982403, for example)?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/beetbox/beets/issues/2446#issuecomment-690751530, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGGIAODCD4DLAIL5VFO7FU3SFFDBXANCNFSM4DAWP4LA .

stale[bot] commented 3 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

beetbox / beets

No time limit on candidate lookup #2446

Problem

Setup