beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

The chroma plugin can slow down the import process considerably #360

Closed mineo closed 9 years ago

mineo commented 11 years ago

Importing FLAC files for https://musicbrainz.org/release/a37e4f9a-b437-45fd-aa54-f030c36fec91 (already tagged with beets, deleted and then reimported - I know this is not necessary :P) goes from under two minutes to at least 15.

beet -v imp $thefolder shows

[...]
Sending event: import_task_start
[lots of "chroma: fingerprinted", for some files no recording can be found from acoustid alone, this finishes after about a minute]
Tagging John Williams - The Music of John Williams: The Definitive Collection
Searching for discovered album ID: a37e4f9a-b437-45fd-aa54-f030c36fec91
Candidate: John Williams - The Music of John Williams: The Definitive Collection
Success. Distance: 0.135883 [ this is 0.001813 and recommendation.strong without the chroma plugin]
Album ID match recommendation is recommendation.medium
Search terms: John Williams - The Music of John Williams: The Definitive Collection
Album might be VA: True [another 2 minutes]
acoustid album candidates: 0
Evaluating 10 candidates.
Candidate: John Williams - The Music of John Williams: The Definitive Collection
Duplicate.
Candidate: John Mammoser - John Mammoser's First Comedy CD
Success. Distance: 0.925176
Candidate: Leoš Janáček - A Certain Collection of the Better Performances
Success. Distance: 0.870242
Candidate: Bing Crosby - The Centenary Collection
Success. Distance: 0.873714
Candidate: Tom Jones - The Ultimate Collection
Success. Distance: 0.871963
Candidate: John Williams - The Music of John Williams: The Definitive Collection
Duplicate.
Candidate: Various Artists - The Mix of the Century
Success. Distance: 0.913465
Candidate: Various Artists - 538 Dance Smash: Hits of the Year 2012
Success. Distance: 0.871623
Candidate: Various Artists - Beats from the East
Success. Distance: 0.894059
Candidate: Various Artists - The Sixties Box: Mastermix
Success. Distance: 0.879496

It seems like beets is fingerprinting all the things (insert meme-picture here), finds the match not good enough, then skips trying to find MBIDs that might already be in the files (I think?), and then goes on to load 10 more huge releases.

(I haven't tried clearing the MBIDs from the files and then importing the release, importing that whole thing a few times trying to figure out what's taking so long is enough for today)

I'm using the beets-git AUR package, the latest commit I have is bb191e7

sampsyo commented 11 years ago

Yes, fingerprinting takes a lot of time -- the docs even warn as much. We fingerprint even when metadata is present because fingerprinting is only useful when you trust it over metadata that's already present.

If you want to quickly import a bunch of music that already has good metadata, I'd recommend either turning off fingerprinting or, for a very fast import, turning off autotagging altogether (beet import -A).

mineo commented 11 years ago

But the fingerprinting is not what's taking so long. What takes the time is loading the other releases, which are retrieved by a simple search from MB.

sampsyo commented 11 years ago

Ah, I see. So the problem is that beets is fetching lots of obviously-wrong releases (like "John Mammoser's First Comedy CD") only when using chroma? Sorry, I didn't get that before.

We should do some digging into why that's happening -- I see no immediate reason why it should.

mineo commented 11 years ago

Well, yes - from my understanding, the process of importing without chromaprint is:

  1. look at the existing metadata to find MBIDs, and try to find a match from that - if that succeeds, use that
  2. use the webservice search to find possible releases matching the already existing metadata (that does not contain MBIDs)

With chromaprint there's a step 0:

  0. Fingerprint all the files and try to find album matches there.

This step doesn't seem to be successful with the album I tried.

What I'm observing is that with chromaprint either step 1 is skipped or (for some reason) the MBID beets discovers (I think that's what Searching for discovered album ID: a37e4f9a-b437-45fd-aa54-f030c36fec91 is about, right?) does not lead beets into choosing the release with that ID and skipping step 2. I think that's the

Success. Distance: 0.135883 [ this is 0.001813 and recommendation.strong without the chroma plugin]
Album ID match recommendation is recommendation.medium

part of the log.
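The lookup order described above can be sketched roughly as follows. This is only an illustration of the steps as listed in this comment, not the actual beets code: `find_release`, the three callback parameters, the `Candidate` tuple and the `STRONG_THRESH` value are all hypothetical names and numbers made up for the example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type: (release_id, distance); a lower distance is a better match.
Candidate = Tuple[str, float]

# Assumed cutoff for a "strong" match, for illustration only.
STRONG_THRESH = 0.04


def find_release(
    fingerprint_lookup: Callable[[], List[Candidate]],
    mbid_lookup: Callable[[], Optional[Candidate]],
    text_search: Callable[[], List[Candidate]],
) -> Tuple[Optional[Candidate], List[str]]:
    """Run the steps in order and record which ones were executed."""
    steps: List[str] = []
    candidates: List[Candidate] = []

    # Step 0 (only with chroma): candidates derived from fingerprints.
    steps.append("fingerprint")
    candidates.extend(fingerprint_lookup())

    # Step 1: look up the MBID that is already stored in the files.
    steps.append("mbid")
    by_id = mbid_lookup()
    if by_id is not None:
        candidates.append(by_id)
        if by_id[1] <= STRONG_THRESH:
            # Good enough: the slow text search is skipped entirely.
            return by_id, steps

    # Step 2: slow web service search on the existing text metadata.
    steps.append("search")
    candidates.extend(text_search())
    best = min(candidates, key=lambda c: c[1]) if candidates else None
    return best, steps
```

With a distance of 0.001813 for the MBID match the sketch returns after step 1; with 0.135883 it falls through to step 2, which is the slow path seen in the log.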

After that not-successful-enough step 1 beets then continues to ask the webservice for possible matches (from http://musicbrainz.org/ws/2/release/?query=release%3A%28the+music+of+john+williams\%3A+the+definitive+collection%29+tracks%3A%2888%29+artist%3A%28john+williams%29&limit=5 and http://musicbrainz.org/ws/2/release/?query=release%3A%28the+music+of+john+williams\%3A+the+definitive+collection%29+tracks%3A%2888%29+arid%3A%2889ad4ac3\-39f7\-470e\-963a\-56509c546377%29&limit=5), and that is what takes so long.
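For reference, search URLs of the shape quoted above can be built like this. It is a minimal sketch using only the Python standard library; `lucene_escape` and `release_search_url` are hypothetical helpers written for this example, not functions from beets or from a MusicBrainz client library. Note that `urlencode` percent-encodes the backslash as `%5C`, so the result is equivalent to, but not byte-identical with, the URLs above.

```python
import re
from urllib.parse import urlencode


def lucene_escape(text: str) -> str:
    """Backslash-escape characters with special meaning in Lucene queries."""
    return re.sub(r'([+\-&|!(){}\[\]^"~*?:\\/])', r'\\\1', text)


def release_search_url(album, track_count, artist=None, artist_id=None, limit=5):
    """Build a release search URL from album title, track count and artist.

    If artist_id (an artist MBID) is given it is used as an `arid` term,
    otherwise the artist name is used as an `artist` term.
    """
    terms = [
        f"release:({lucene_escape(album.lower())})",
        f"tracks:({track_count})",
    ]
    if artist_id:
        terms.append(f"arid:({lucene_escape(artist_id)})")
    elif artist:
        terms.append(f"artist:({lucene_escape(artist.lower())})")
    query = " ".join(terms)
    return "http://musicbrainz.org/ws/2/release/?" + urlencode(
        {"query": query, "limit": limit}
    )


url = release_search_url(
    "The Music of John Williams: The Definitive Collection",
    88,
    artist="John Williams",
)
print(url)
```

Each of these searches returns up to `limit` releases, and every returned release then has to be fetched and compared in full, which is expensive for an 88-track release.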

sampsyo commented 11 years ago

I see! Thanks for digging a little deeper.

Yes, it seems that the fingerprinting distance component (that which helps prefer releases that match the fingerprints) is kicking in for the MBID-based match, and in this case pushing it over the edge so that it's no longer a good-enough match. So, for some reason, the plugin thinks that the right album match is wrong because it doesn't match the fingerprints.
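To illustrate the effect, here is a small sketch of how an extra distance component can tip a match from "strong" to "medium". The two thresholds and the weighted-average formula are assumptions for the example (the thresholds happen to be consistent with the log above, but the real values depend on the beets configuration), and `combine` and `recommendation` are hypothetical helpers rather than beets functions.

```python
from typing import List, Tuple

# Assumed thresholds, for illustration only.
STRONG_REC_THRESH = 0.04
MEDIUM_REC_THRESH = 0.25


def combine(components: List[Tuple[float, float]]) -> float:
    """Weighted average of (value, weight) distance components."""
    total_weight = sum(weight for _, weight in components)
    return sum(value * weight for value, weight in components) / total_weight


def recommendation(distance: float) -> str:
    """Map a distance in [0, 1] to a coarse recommendation level."""
    if distance < STRONG_REC_THRESH:
        return "strong"
    if distance <= MEDIUM_REC_THRESH:
        return "medium"
    return "none"


# Distances taken from the log in the first post.
for label, dist in [
    ("MBID match without chroma", 0.001813),
    ("MBID match with chroma", 0.135883),
    ("John Mammoser's First Comedy CD", 0.925176),
]:
    print(f"{label}: {dist} -> {recommendation(dist)}")
```

In this model, any component that reports a mismatch (value near 1) raises the weighted average, so a metadata match that is nearly perfect on its own can still land above the strong threshold once a fingerprint component is added.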

I can see two possibilities:

Can you include the snipped portion of the log where beets reports the matches it got from Acoustid?

(Thanks for bearing with me -- this is a complicated one. :fish:)

mineo commented 11 years ago

Unfortunately there's no log: I tee'd the output of beets into a file, but that only captured stdout, and I then continued with the album and submitted the fingerprints, so now there's probably no way to reproduce it :(

Unfortunately I also can't rule out either of the two possibilities. As I already wrote in the first post, there were definitely some files where the fingerprint was not associated with any recording, and with the album being a compilation it's entirely possible that some fingerprints resolved to a recording that is not on the release.

sampsyo commented 11 years ago

Okay, cool. I'll look at the code a little to see if anything is obviously wrong and, meanwhile, we can wait and see if this situation comes up again (for you or anyone else).

sampsyo commented 9 years ago

Any chance this was related to what we fixed for #1068? We fixed a (potentially very large) slowdown by limiting the MusicBrainz lookups that stem from chroma. Let us know if this is not any better in 1.3.9, @mineo.

mineo commented 9 years ago

I'm not using beets anymore to organize my music library (it somehow never really clicked for me, but reading the changelogs makes me jealous all the time :D). I'll probably give it another shot at some point and will reopen this if I can reproduce it.