msolli closed this issue 10 years ago
Thanks for the report. That's very mysterious. Is there any chance you can share a copy of the problematic music so we can reproduce it? (If you want to share privately, my email address is on my profile page.)
Thanks, I've emailed you a link to the recording.
Sorry for the delay in getting around to this. I'm able to reproduce this (even without discogs): candidate matches get slower and slower and the process's CPU usage creeps up to 100%. Not sure what's going on here; some closer investigation is in order.
Just an idea: there is an 80-CD box set with a couple of hundred tracks, including the Goldberg Variations. Might this be bogging down the bipartite matching?
Yep, the bottleneck I'm able to reproduce is indeed on that very large release. But the slow part seems to be the all-pairs distance calculation loop (it took a few minutes to get through the distance calculation and then a few seconds to run the bipartite matching algorithm). This 150-CD collection does even worse.
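To make the two phases mentioned above concrete, here is a minimal sketch in Python. It is not the beets code: toy_distance is a hypothetical stand-in for a real track distance, and the assignment step uses brute force over permutations in place of a proper bipartite matching algorithm, so it only works on tiny inputs. The point is that the cost matrix needs one distance call per (item, track) pair, which is why an expensive distance function dominates on very large releases.

```python
from itertools import permutations


def toy_distance(a, b):
    # Hypothetical stand-in for a real track distance (assumption, not beets).
    return 0.0 if a.lower() == b.lower() else 1.0


def cost_matrix(items, tracks):
    # Phase 1: all-pairs distances, len(items) * len(tracks) calls in total.
    return [[toy_distance(i, t) for t in tracks] for i in items]


def best_assignment(costs):
    # Phase 2: minimum-cost assignment by brute force over permutations.
    # Assumes a square matrix; a real matcher would use a polynomial-time
    # algorithm such as the Hungarian method instead.
    n = len(costs)
    return min(
        permutations(range(n)),
        key=lambda p: sum(costs[row][p[row]] for row in range(n)),
    )


items = ["aria", "variatio 1", "variatio 2"]
tracks = ["Variatio 2", "Aria", "Variatio 1"]
costs = cost_matrix(items, tracks)
assignment = best_assignment(costs)  # assignment[i] is the track index for items[i]
```

With a few hundred items and a few hundred candidate tracks, phase 1 alone makes tens of thousands of distance calls per candidate release.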
I'm going to try to profile one of these to determine why the track_distance calls are so expensive.
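For anyone who wants to try the same kind of profiling, here is a rough sketch using Python's standard cProfile and pstats modules. The functions track_distance_stub and match_all are hypothetical stand-ins written for this example; they are not the beets functions, and the workload is made up.

```python
import cProfile
import io
import pstats


def track_distance_stub(a, b):
    # Hypothetical stand-in for an expensive per-pair distance call.
    count = 0
    for x, y in zip(a, b):
        if x != y:
            count += 1
    return count


def match_all(items, tracks):
    # All-pairs loop: one distance call per (item, track) pair.
    return [[track_distance_stub(i, t) for t in tracks] for i in items]


profiler = cProfile.Profile()
profiler.enable()
match_all(["track %d" % n for n in range(200)],
          ["title %d" % n for n in range(200)])
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

The ncalls and cumtime columns in the report show which function is called most often and where the time accumulates.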
After a bit of profiling, I found some egregious waste of work in the track-matching machinery. See the above commit if you're interested in the details, but the high-order bit is that 80% of an album match's time was being wasted on reloading match weights from the configuration. That's fixed now; an overall match operation on the 80-CD box set was reduced in my benchmark from 80 seconds to 17 seconds.
This should make things much, much faster but not instant. A match on the larger 150-CD collection is still going to take a while (my benchmark says about 100 seconds), but hopefully this is not interminable on your machine, @msolli.
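The general shape of that kind of fix (read the weights once instead of on every distance call) can be sketched as follows. This is an illustration only and not the actual commit: load_weights, cached_weights, and both distance functions are hypothetical names, and the weights are invented.

```python
import functools

load_calls = {"n": 0}  # counts how often the "configuration" is read


def load_weights():
    # Hypothetical stand-in for an expensive configuration lookup.
    load_calls["n"] += 1
    return {"title": 3.0, "artist": 2.0}


@functools.lru_cache(maxsize=None)
def cached_weights():
    # Reads the configuration once; later calls return the same dict.
    # Callers must treat the returned dict as read-only.
    return load_weights()


def distance_slow(penalties):
    weights = load_weights()  # reloaded on every call
    return sum(weights[key] * value for key, value in penalties.items())


def distance_fast(penalties):
    weights = cached_weights()  # loaded once, then reused
    return sum(weights[key] * value for key, value in penalties.items())


pairs = [{"title": 0.5, "artist": 0.0}] * 1000
slow = [distance_slow(p) for p in pairs]
slow_loads = load_calls["n"]
fast = [distance_fast(p) for p in pairs]
fast_loads = load_calls["n"] - slow_loads
```

Both versions return the same distances; only the number of configuration reads differs. A cache like this would need to be cleared if the configuration can change while the process is running.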
The next bottleneck is the string distance calculation. On that larger benchmark, 70% of the matching time is now spent in string_dist. (FWIW, 15% is spent in the bipartite matching algorithm.) I'm going to open a separate issue about looking into faster Levenshtein implementations.
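For reference, a plain-Python Levenshtein distance looks roughly like the sketch below. This is a textbook two-row dynamic-programming version, not the beets string_dist implementation, and the function name levenshtein is chosen for this example. It does work proportional to len(a) * len(b) per call, which adds up quickly when it runs inside an all-pairs loop.

```python
def levenshtein(a, b):
    # Keep b as the shorter string so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the edit distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


example = levenshtein("kitten", "sitting")
```

A C-backed implementation of the same recurrence is the usual way to speed this up, since the inner loop is pure per-character arithmetic.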
I'm importing a large-ish collection of music into beets. The import seems to hang on one particular recording. I've tried several times, and waited for up to three hours. It always hangs at the same spot. The process won't terminate with CTRL-C; I have to kill -9 it.
Here's the output when I try to import only the problematic recording:
I'm running beets 1.3.3 on OS X 10.9.2 with Python 2.7.6 from Homebrew.
Here's my config:
Please let me know if there's any more info I could provide.