beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.94k stars 1.82k forks source link

Discogs (and other metadata sources) polluting mb_albumid #604

Open a9y opened 10 years ago

a9y commented 10 years ago

The discogs plugin sets album_id of the AlbumInfo class to a discogs identifier, which beets.autotag.apply_metadata assigns to mb_albumid. The problem is that everywhere else it's assumed to be a MusicBrainz identifier.

sampsyo commented 10 years ago

This is a great point; reusing the MBID fields is an abuse that can cause weird problems. We should instead set a flexible attribute. Alas, we currently don't have a mechanism for setting flexible attributes from metadata matches, but we should definitely add this. (In fact, doing so could simplify the AlbumInfo/TrackInfo types and related construction/application logic.)

a9y commented 10 years ago

I played around with a few solutions, but none were ideal.

The first was to store data_source as a flexible attribute, and then rename the _mbXXX fields to something more abstract like _dsXXX. The user then knows to check the value of data_source before working with those fields. The downside here is that it would require a database migration, and any external plugins that used those fields would stop working.

The second idea was to change AlbumInfo and TrackInfo to look more like this (not tested):

class AlbumInfo(object):
    def __init__(self, **kwargs):
        values = {}
        values.update(kwargs)
        object.__setattr__(self, 'values', values)

    def __getattr__(self, item):
        return self.values.get(item, None)

    def __setattr__(self, key, value):
        self.values[key] = value

In this scenario the data-source plugins could set their own fields (to be stored as flexible attributes) when instantiating AlbumInfo. Discogs could set di_albumid, di_trackid, ... Beatport could set bp_albumid, bp_trackid, ...

The advantage here is that you're not limited to just the one data-source.

sampsyo commented 10 years ago

Absolutely; the second approach is definitely the way to go! The AlbumInfo and TrackInfo objects should be able to carry along arbitrary field settings that are then applied in apply_metadata. The only trick is that they should also have some fields that are not just dict keys because some fields are required or need special handling—see the apply_metadata code for those fields.

dengste commented 9 years ago

I just wanted to add that this is also true for the artist_id. If an album is identified via discogs, it will be tagged with discogs' artist-id. I stumbled upon this because when I import my library into Ampache, it will frequently have the same artist twice - one with the Musicbrainz id, and one with Discogs', and I have to manually fix that.

skupjoe commented 7 years ago

It's also worth noting that the Dicogs data source does not supply mb_trackid information, so (by default) the Duplicates plugin will flag anything that shares null entries for this field as duplicate. Meaning, all Discogs-tagged tracks will be flagged as a duplicate.

The workaround is to additionally add a third parameter to the Duplicate plugin to lastly analyze Title.

nicolahinssen commented 6 years ago

@sampsyo Asked me to leave my thoughts here. #2763

My suggestion is to create the following fields for Discogs (TXXX is customizable so that wouldn't be a problem).

Old New
TXXX:MusicBrainz Album Artist Id=1560798 TXXX:Discogs Album Artist Id=1560798
TXXX:MusicBrainz Album Id=3175577 TXXX:Discogs Album Id=3175577
TXXX:MusicBrainz Album Release Country=Netherlands TXXX:Discogs Album Release Country=Netherlands
TXXX:MusicBrainz Album Type=Album TXXX:Discogs Album Type=Album
TXXX:MusicBrainz Artist Id=1560798 TXXX:Discogs Artist Id=1560798
dbogdanov commented 6 years ago

Agreed with @nicolahinssen on adding special fields for Discogs. This is a feature I would love to see.

More generally, it will be great to store metadata from other sources (Bandcamp, Beatport, etc.) in the same way. The plugins retrieving the metadata should be able to define special fields themselves. Probably there won't be a consensus on naming of the fields, therefore this naming could be configurable by a user.

dbogdanov commented 6 years ago

The only trick is that they should also have some fields that are not just dict keys because some fields are required or need special handling—see the apply_metadata code for those fields.

Should then all AlbumInfo and TrackInfo fields be kept, while adding a dict too?

sampsyo commented 6 years ago

I would actually be in favor of abandoning all the hand-coded attributes and just moving over complete to an unstructured dict-like object. It could simplify changes in the future quite a bit, and I don't think we get anything out of the current approach to the built-in set of fields.

bastidest commented 4 years ago

This is causing issues with other software (funkwhale) relying on the MB UUID format. They expect the fields to be valid UUIDs, not integers:

image

martinkirch commented 3 years ago

Hello, It seems this bug is still here. It has weird side effects: I discovered it after digging why absubmit fails for almost my whole library. The explanation is absubmit needs an UUID to build the submission URL. I guess it affects a few other plug-ins.

Maybe the simplest solution would be setting flexible attributes, as previously suggested

Discogs could set di_albumid, di_trackid, ... Beatport could set bp_albumid, bp_trackid, ...

?

Neurrone commented 2 years ago

Another use case that I'm running into is tagging an audiobook collection with metadata from Audible. Because of the hard-coded fields, I don't think its possible for me to store additional metadata that is not already defined in one of the existing fields.

Wondering if there are any plans to fix this in the near / mid term?

Edit: I realized my comment may not fit this issue, perhaps I should move it to #2866 instead.