Predict amount of work required for a large import

beetbox / beets

music library manager and MusicBrainz tagger

http://beets.io/

MIT License


Predict amount of work required for a large import #218

Closed arthurlutz closed 10 years ago

arthurlutz commented 11 years ago

Cleaning up a huge mp3 library with beets seems very attractive, but can be discouraging.

It would be nice to have a notion of the "amount" of work that is required.

This would be an "analyse" (or some other keyword) command that scans the collection and prints a report (text or HTML?) on its size and on how much of it is recognized automatically.

For example:

38% of the collection has a 90% match and will be imported automatically
55% of the collection has 2 possible matches
12% of the collection needs conversion

Estimated time: at 15 seconds per user choice, X minutes or Y hours.

Combine this with an incremental approach ("oh, you've done 67% of the import!") and a "come back to it later" possibility, and that would be awesome.

sampsyo commented 11 years ago

This is an interesting idea and I can see how it might make users more willing to jump into an import.

One central concern, though, is that making this prediction will require a lot of time, not unlike actually doing the import! We'll still need to read tags, hit the MusicBrainz API, and perform the match evaluation for every album, which takes a nontrivial amount of time, and a long time for a large collection.

Since it will take so long, one could argue that you might be better off just kicking off the import and leaving it in a screen session. It will run ahead and look things up while you make coffee; no need to baby-sit the process.

Anyway, I think something that makes the import process reentrant (and therefore gives you a more global view on your progress) would be valuable. But it's a large change from what we have today and can't be implemented trivially as a command on the side. See also the brief discussion of "asynchronous import decisions" on the refactoring page.

nogweii commented 11 years ago

How about a lot of guessing, which could be tracked?

Have a guessing algorithm based on something like:

avg time to hit MB's API + (% likelihood of a miss in MB's records, requiring user confirmation * 15 seconds for each user confirm) + (2 secs * number of songs)

Other plugins, like acoustid, could optionally add in a "fuzzy work factor" that increases the estimate.

Perhaps that's enough?
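
A minimal sketch of that guess in Python, assuming the formula is applied once per album and summed over the collection. The function name, parameter names, and the default lookup time and miss probability are placeholders for illustration, not measured values or part of beets; only the 15 seconds per confirmation and 2 seconds per song come from the formula above. The work_factor parameter stands in for the plugin-supplied "fuzzy work factor".

```python
def estimate_import_seconds(
    num_albums,
    num_tracks,
    avg_lookup_seconds=1.0,   # assumed average MusicBrainz API round trip
    miss_probability=0.5,     # assumed share of albums needing confirmation
    confirm_seconds=15.0,     # from the proposal: 15 s per user confirmation
    per_track_seconds=2.0,    # from the proposal: 2 s per song
    work_factor=1.0,          # hypothetical multiplier a plugin could raise
):
    """Rough linear guess at the total import time, in seconds."""
    lookup = num_albums * avg_lookup_seconds
    confirmations = num_albums * miss_probability * confirm_seconds
    per_track = num_tracks * per_track_seconds
    return work_factor * (lookup + confirmations + per_track)


if __name__ == "__main__":
    seconds = estimate_import_seconds(num_albums=100, num_tracks=1000)
    print(f"about {seconds / 60:.1f} minutes")
```

With these placeholder defaults, 100 albums and 1000 songs come out at 100 + 750 + 2000 = 2850 seconds. The per-song term dominates, so the result is only as good as that 2-second guess.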

sampsyo commented 11 years ago

Interesting—a heuristic calculation based on just the number of albums? Two concerns there:

1. It's pretty easy for a user to guess themselves based on ls -l | wc -l.
2. Taking into account beets' multithreaded tagger, the process of looking up in MusicBrainz overlaps in time with the user confirmation step. This means that, in the ideal case (e.g., if you kick off the import, make coffee, and come back to start making decisions), you're never waiting for MusicBrainz queries. That makes an accurate linear combination hard to conceive of.
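
A toy model of the overlap described in the second concern, in Python. It assumes a single lookup worker that runs ahead of a single user; this is a simplification for illustration and not how beets' pipeline is actually implemented. The function name and the timings are hypothetical.

```python
def simulate_import(lookup_seconds, decision_seconds, user_arrives_at=0.0):
    """Return (time the user spends at the keyboard, time spent waiting).

    Lookups run back to back from time 0. The user handles albums in
    order and can only decide on an album once its lookup has finished.
    """
    lookup_done = 0.0              # clock time when the current lookup ends
    user_clock = user_arrives_at   # clock time when the user is next free
    waited = 0.0
    for lookup, decision in zip(lookup_seconds, decision_seconds):
        lookup_done += lookup
        if lookup_done > user_clock:
            # The user is idle until this album's lookup completes.
            waited += lookup_done - user_clock
            user_clock = lookup_done
        user_clock += decision
    return user_clock - user_arrives_at, waited


if __name__ == "__main__":
    lookups = [3.0] * 10      # assumed 3 s per MusicBrainz lookup
    decisions = [15.0] * 10   # assumed 15 s per user decision
    print("linear sum:", sum(lookups) + sum(decisions))
    print("start right away:", simulate_import(lookups, decisions))
    print("after coffee:", simulate_import(lookups, decisions, 30.0))
```

In this model the linear sum is 180 seconds, but a user who starts immediately is busy for 153 seconds and waits only for the first lookup, and a user who comes back after all lookups have finished spends 150 seconds and never waits. The lookup term drops out almost entirely, which is why adding the two costs overstates the user's time.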

sampsyo commented 10 years ago

Closing this ticket for now. We can reopen if there's a more specific (and nontrivial) proposal.