beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

import behaviour assumes perfect mb recording match (particularly bad for acoustid submissions) #667

Open johtso opened 10 years ago

johtso commented 10 years ago

So, I did something rather silly the other day. I'd imported a fair amount of music using beets, and used the chroma plugin to submit the fingerprints to acoustid.

It then only today hit me that I hadn't carefully checked that the matches were for the correct releases/mediums.

That means that I most likely made a number of erroneous submissions.

The default behaviour is to not be timid when importing, and users might not realise how important it is to submit fingerprints only when they are sure they have matched the audio files to precisely the correct musicbrainz release. Together, that seems like a recipe for bad data.

One simple improvement would be to add some kind of warning when doing a submission. This warning should probably also be added to the documentation.

_On a related, more general note..._

I was having a think about what the optimum workflow might be for ensuring that you only submit fingerprints for things in your library for which you have a high confidence of the metadata being accurate, and the musicbrainz release id being correct. (This doesn't actually just apply to acoustid submissions, it applies more generally to keeping track of which parts of your library are rougher than others)

Some imported music may have unclear provenance, but you still want to improve its tags based on a "good enough" match. The way things stand, the tags will be updated with musicbrainz metadata pointing to a specific release, and there is no way to distinguish between those vague matches in your collection, and the things that you have carefully ensured to be perfect matches.

One solution could be to allow making a distinction at the point of importing. An "Apply this and mark with high confidence" action maybe. Or maybe even "Apply only non release specific tags", which wouldn't add musicbrainz IDs at all.

You could then do a beet submit confidence:high (or if going the route of only adding non release specific tags when uncertain, just beet submit), and avoid sending dodgy data.

Sorry if this is a bit of a stream of consciousness :)

sampsyo commented 10 years ago

A documentation note seems like a great idea—want to add the relevant paragraph? We could consider an inline warning too, but perhaps the docs will be enough.

One other idea for assessing confidence: we could record the distance score of the appropriate match in a flexible attribute on import. Then you could call beet submit import_score:0.9.. or along those lines to filter out questionable matches without requiring human intervention.
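As a rough sketch of how such a filter could behave, the snippet below selects items whose recorded score falls inside an open-ended range like `0.9..`. It uses plain dictionaries rather than the beets library API; the `import_score` field name comes from the comment above, and treating it as a similarity between 0 and 1 (higher is better) is an assumption. The helper names `parse_range` and `select_for_submission` are hypothetical.

```python
def parse_range(query):
    """Parse 'lo..hi', 'lo..' or '..hi' into a (lo, hi) tuple.

    A missing bound is returned as None, meaning "unbounded".
    """
    lo, sep, hi = query.partition("..")
    if not sep:
        raise ValueError("expected a range such as '0.9..'")
    return (float(lo) if lo else None, float(hi) if hi else None)


def select_for_submission(items, query):
    """Return the items whose import_score lies inside the range.

    Items without a recorded score are skipped, since nothing is
    known about how well they matched.
    """
    lo, hi = parse_range(query)
    selected = []
    for item in items:
        score = item.get("import_score")  # assumed field name
        if score is None:
            continue
        if lo is not None and score < lo:
            continue
        if hi is not None and score > hi:
            continue
        selected.append(item)
    return selected


library = [
    {"title": "Well matched", "import_score": 0.97},
    {"title": "Questionable", "import_score": 0.62},
    {"title": "Imported before scores were recorded"},
]
to_submit = select_for_submission(library, "0.9..")
```

Only the first item would be selected here. A score like this captures how close the automatic match was, not whether a person verified the release.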

johtso commented 10 years ago

Hmm, recording the distance score might be interesting, but it doesn't really fill the need of recording how confident you, the importer, are about the match.

Very often only you can verify an accurate musicbrainz release match if there is any ambiguity at all. Is the matched MB release vinyl when it should be digital? Did it match against the wrong release amongst 10 releases of an album with identical track listings?

What do you think of the idea of just making the writing of release specific tags optional?

So if you want to improve the tags, but don't want to mess around finding the correct release on musicbrainz (or creating it if it doesn't exist), you can apply the metadata of the matched release, but not apply the release specific tags.

Release specific tags maybe being classified as:

With this behaviour, if you were to submit your entire collection's acoustids, submitting ids for things that you hadn't carefully matched to the appropriate recording would be harmless (and even helpful I suppose), as just the release agnostic information would be submitted with the fingerprints (https://github.com/sampsyo/beets/blob/master/beetsplug/chroma.py#L219-L227), and not misleading recording specific ids.

This could possibly even be implemented in a very general way, maybe as a plugin that allows you to optionally blacklist some subset of tags when importing a specific item.
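To make the blacklist idea concrete, here is a minimal sketch using plain dictionaries instead of the beets plugin API. The function name `apply_match` is hypothetical, and the set of blocked fields in the example is only a placeholder chosen for illustration, not a proposed classification of release specific tags.

```python
def apply_match(current_tags, matched_tags, blocked_fields):
    """Merge matched metadata into the current tags, skipping blocked fields.

    Fields listed in blocked_fields are never copied from the match,
    so a "good enough" match can improve names without writing
    release specific identifiers. Existing values are left untouched.
    """
    updated = dict(current_tags)
    for field, value in matched_tags.items():
        if field in blocked_fields:
            continue  # do not write this tag for an uncertain match
        updated[field] = value
    return updated


# Placeholder choice of fields to block, for illustration only.
blocked = {"mb_albumid", "media"}

current = {"title": "trak 1", "album": "Albm"}
matched = {
    "title": "Track 1",
    "album": "Album",
    "mb_albumid": "placeholder-release-id",
    "media": "Vinyl",
}
result = apply_match(current, matched, blocked)
```

In this example `result` gets the corrected title and album but no release identifier or medium. Passing an empty set as `blocked_fields` gives the ordinary "apply everything" behaviour, so the choice could be made per item at import time.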

The main point behind all this is that adding the correct release specific musicbrainz ids to the tags of a file is important if you care about accuracy (or want to do things like contributing acoustids), and it's not easy to get right. Musicbrainz data is very finely grained, and very specific. It's almost certainly the case that most people won't bother to make absolutely sure that they are applying the correct release's information to an item, as long as it's pretty much correct in terms of the information they care about, like track names. But it should be possible to avoid applying misleading, incorrect, release specific information when importing something.

sampsyo commented 10 years ago

Sure, makes a lot of sense. Two wrinkles I can think of:

Good ideas here!