beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.88k stars 1.82k forks source link

Continued retagging of the `albumtype` field (and potentially other fields) #4715

Open JOJ0 opened 1 year ago

JOJ0 commented 1 year ago

After merging of #4582 there is a related issue remaining that was intentionally not addressed in that pull request.

We still see continued retagging of the albumtype (singular) field in certain cases.

Note: This bug is not about the ['a', 'l', ...] problem! This has been fixed in #4582 already! You might still have broken data in your database and thus see it. Follow this description to clean up your library: https://github.com/beetbox/beets/pull/4582#issuecomment-1445023493

The problem

The MusicBrainz API provides albumtypes and albumtype separately, but beets tries to derive the albumtype when reading files as being the first element of albumtypes. This can lead to the problem where beet write tries to modify the albumtype every time.

If an album has multiple types the MusicBrainz API seems to indicate that one of those is the "primary" type, and beets stores that in the db as the albumtype, but parsing the ID3 tag of the file it seems like it just uses the first element of albumtypes as the albumtype.

Writing the albumtype of those files doesn't actually write the file's tag, so beet write will always report trying to write it. It will update the timestamp of the file though, which could confuse other tools, making them "think" the file received relevant changes.

Reproduction of the problem

Timestamp of file is 10:34

$ ls -l "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac"
-rw-r--r--  1 jojo  staff  46857531 Dec 19 10:34 /Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac

Albumtype file tag is album / compilation:

$ mediainfo "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac" | grep -i type
Cover type                               : Cover (front)
RELEASETYPE                              : album / compilation
MUSICBRAINZ_ALBUMTYPE                    : album / compilation

which in reality is two values, each being set to a single tag:

$ mutagen-inspect "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac" | grep -i type
RELEASETYPE=album
RELEASETYPE=compilation
MUSICBRAINZ_ALBUMTYPE=album
MUSICBRAINZ_ALBUMTYPE=compilation

Now writing shows changes but they only show a single value instead of both values:

$ beet write body heat
Fockewulf 190 - The Dark Side of Italo Disco - Body Heat (vocal) - src:$data_source |Pop
  albumtype: album -> compilation

The file's timestamp gets updated:

$ ls -l "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac"
-rw-r--r--  1 jojo  staff  46857531 Dec 19 10:56 /Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac

The file's tag is still the same, beet write shouldn't have bothered to update the file!

$ mediainfo "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac" | grep -i type
Cover type                               : Cover (front)
RELEASETYPE                              : album / compilation
MUSICBRAINZ_ALBUMTYPE                    : album / compilation

$ mutagen-inspect "/Users/jojo/Music/dev-beets/Compilations/The Dark Side of Italo Disco/01 Body Heat (vocal).flac" | grep -i type
RELEASETYPE=album
RELEASETYPE=compilation
MUSICBRAINZ_ALBUMTYPE=album
MUSICBRAINZ_ALBUMTYPE=compilation

Setup

My configuration (output of beet config) is:

not relevant

Related iterations of this bug

We might have a related issue if not exactly the same issue with other multi-valued fields (mb_albumartistids, ...) that have a single representation of that same field mb_albumartistid, ...). The logic in mediafile is identical. A detailed report of a user is found here: #5045

judemille commented 1 year ago

Writing the albumtype of those files doesn't do anything

That's not entirely true. When beets does this rigmarole, it changes the file's timestamp, which will cause tools like Nextcloud to go through the whole procedure as if there's a whole new version of the file (Nextcloud, and WebDAV-based tools in general IIRC, do not allow incremental file updates.)

JOJ0 commented 1 year ago

Thanks a lot, important detail! I'll add that to the description later today!!

judemille commented 1 year ago

A potential hacky fix could be to copy the albumtype from the file on disk to the tags to be written, so the comparison thinks all is well. Perhaps I'll give that a shot.

EDIT: removed question about what files are affected: inspected mediafile code, conclusion: all files affected. Also changing proposed solution.

judemille commented 1 year ago

Can confirm that solution fixes the issue. Is it the best solution? Hell if I know. Does it stop the continued retagging? Yes.

E: Really, I think the best solution would be to make the code simply ignore the albumtype field in all cases where it would be trying to write to disk, not just this command, as well as not displaying the extraneous "changes" that would be patched out.

dr-waterstorm commented 1 year ago

I've just been starting out with beets yesterday (version 1.6.0) and stumbled upon this issue.

I thought it might help to describe the issues I've seen so far:

When using beet write the first time (with the issue mentioned in #4582 ) it did store some information in the file. When using ffmpeg to read the tags I get those back:

    RELEASETYPE     : a;l;b;u;m
    MUSICBRAINZ_ALBUMTYPE: a;l;b;u;m

But after fixing the tags (as mentioned in the issue), I cannot write them back to the file as mentioned here. beet write shows the "correct" output (it wants to correct releasetype and albumtype), but does not actually write anything to the file. The log of beet write

  albumtype: a -> album
  albumtypes: ['a', 'l', 'b', 'u', 'm'] -> album

Note: This is a .flac file.

But I do have similar issues with .mp3 where it stored a plain "a" as tag:

MusicBrainz Album Type: a

The beet write command shows the same output as for the flac:

  albumtype: a -> album
  albumtypes: ['a', 'l', 'b', 'u', 'm'] -> album

Other files do have multiple tags, which looks even more interesting:

    RELEASETYPE     : a;l;b;u;m;;; ;s;o;u;n;d;t;r;a;c;k
    MUSICBRAINZ_ALBUMTYPE: a;l;b;u;m;;; ;s;o;u;n;d;t;r;a;c;k

Let me know if I can test/help with anything.

judemille commented 1 year ago

@dr-waterstorm it appears your installation of beets does not have #4582 merged in -- i would suggest installing from Git.

JOJ0 commented 1 year ago

@dr-waterstorm: @judmille is right, you have hit a previous issue, please install beets latest from git and then follow: https://github.com/beetbox/beets/pull/4582#issuecomment-1445023493

dr-waterstorm commented 1 year ago

Thank you @JOJ0 @judemille you were right about the issue with the release type. I did not update to GIT version, but only pulled the latest version from the Linux repo before. My apologies for not trying that :). That issue is fixed.

With the release types corrected, now I do observe the issue with he albumtype. Beets keeps showing vast amounts of

albumtype: album -> compilation

and similar. Checking the file with ffmpeg for the tag still shows the old one, so it did not update the actual tag.

gamoeri commented 1 year ago

I am running into a similar issue - it does not update the albumtype, even if reimported.

ilmc888 commented 1 year ago

I have the same issue. Knowing next to nothing about the code but after some debugging I think the problem lies in mediafile.py (version 0.12 = same as master):

def update(self, dict):
        """Set all field values from a dictionary.

        For any key in `dict` that is also a field to store tags the
        method retrieves the corresponding value from `dict` and updates
        the `MediaFile`. If a key has the value `None`, the
        corresponding property is deleted from the `MediaFile`.
        """
        for field in self.sorted_fields():
            if field in dict:
                if dict[field] is None:
                    delattr(self, field)
                else:
                    setattr(self, field, dict[field])

When the albumtype tag is set correctly in this routine (to e.g., compilation) it gets immediately reverted back to album the next iteration of that loop which happen to set albumtypeS. My guess is that somehow mediafile overrides albumtype with the first entry of albumtypes again (by some operator overloading or such, haven't digged deeper yet).

Albumtype is also defined lower down in mediafile.py as a single field of albumtypes, which I assume will be the first entry of albumtypes:['album', 'compilation'].

    albumtypes = ListMediaField(
        MP3ListDescStorageStyle('MusicBrainz Album Type', split_v23=True),
        MP4ListStorageStyle('----:com.apple.iTunes:MusicBrainz Album Type'),
        ListStorageStyle('RELEASETYPE'),
        ListStorageStyle('MUSICBRAINZ_ALBUMTYPE'),
        ASFStorageStyle('MusicBrainz/Album Type'),
    )
    albumtype = albumtypes.single_field()
ilmc888 commented 1 year ago

Reading some other issues, the cause might already been known :-)

Is there any agreement on how to handle this? Just write albumtype as is, or check whether albumtype is in albumtypes? Both reading and writing.

JOJ0 commented 1 year ago

@ilmc888 I'd love to finally get to fixing this but I'm not yet sure what's the best way. If you feel like it help me brainstorm. The following ideas and I didn't check any of it in code:

Take everything with a grain of salt and double check. Probably I'm just thinking out loud, being too lazy to read code atm (other music coding things in mind right now ;-)) I'd love to get help with tackling that issue, it's been long due....

ilmc888 commented 1 year ago

I might not be aware of all the intricacies yet, but I do understand that Musicbrainz has both albumtype and albumtypes information, but we can have only one tag 'MusicBrainz Album Type'?

Does it make sense to take a look at Musicbrainz Picard and how that information is handled there (just tag one track and ffprobe it)? Those are their own tags after all: https://github.com/metabrainz/picard https://picard-docs.musicbrainz.org/downloads/MusicBrainz_Picard_Tag_Map.html

In general I would just think about how it will be used by music player. If albumtype tag can / would be a list (or we make an extra albumtypes tag; if this is even possible), will this actually be used by music players? I suspect not (yet), so this would not be really practically useful?

On the other hand, I agree using the first entry of albumtypes might not be always informative (e.g., often just 'album' instead of 'live', 'compilation', etc.), so I understand why there was some logic included in beets to handle this. Still, I see this more as a flaw of Musicbrainz than anything else and wouldn't mind a one-on-one mapping ignoring albumtypes all together (until there is an accepted solution in the audio community). Including such logic will always be prone to errors (I believe I saw albumtypes of album / live / compilation / soundtrack, what would be the first one?).

Actually not sure if there is extra logic to extract one type from albumtypes? Does MB expose the primary release type (or can we find how they do it in the Picard code?)?

If the tagging source is MusicBrainz we should always ignore the single albumtype field. It should always be written from the multi field albumtypes

Well this seems like a solution to make the albumtype useful, but wasn't the problem that with a beet update you would then have to compare it to the internal albumtypes data (using the same approach to extract it). Otherwise you get the infinite update / write loop. Or just ignore the whole albumtypes in the database. On beet import / mbsync, use the MB albumtypes to extract a useful albumtype for the database and write it to the tag field (ignore the MB albumtype itself). A subsequent beet update will just compare the albumtype one-on-one and thus does not need any change.

This reply turned out longer than I expected, with likely a lot of (false) assumptions, but I hope it helps.

ilmc888 commented 1 year ago

I only had time to give it a brief look, so unsure if I'm looking at the right snippet but https://github.com/metabrainz/picard/blob/3b20cb2f2d3fb1df14e7d69a96d9280e4f219c15/picard/mbjson.py#L607

Here I see that releasetype, which is presumably written as a tag (see id3.py), is a concatenation of primary and secondary releasetypes. What would be the problem if beets is doing the exact same thing?

And on beet update, if a MB id exist (or a albumtypes is in the DB) compare the MB albumtype tag, which is a list then (again, assumption Im making), to that same concatenation.

ilmc888 commented 1 year ago

@JOJ0 I'm probably overthinking things as it seems the only problem at the moment is writing the right tag to disk. In that case your solution will work fine. I think that change has to be in mediafile.py (?) as there a single album type is extracted from album types.

However, what about just changing the order of album types when pulling the data from MB. Reorder it so the most important type is in front (compilation, live, ...) instead of just album. In that case mediafile.py will just work as is (I think it simply extracts the first element)? Other sources will also just work, and the change is contained within the MB code itself?

JOJ0 commented 11 months ago

@arogl you could try quickfixing your issue https://github.com/beetbox/beets/discussions/5042 by following my ideas in above comment. I still think that (at least) part of the problem could be solved like this: https://github.com/beetbox/beets/issues/4715#issuecomment-1655078459

So what I'm trying to say is: Try ignoring the (single) albumtype field in the musicbrainz autotag code and see if that changes anything.

We won't loose the information since musicbrainz also provides a multi field called albumtypes (I'm not sure of the name that comes from the MB api!) which will then be handled by mediafile when we write it to disk (https://github.com/beetbox/beets/issues/4715#issuecomment-1627192806). It could be though, that the useless writing of the albumtype file tag will still happen, regardless of the exclusion of the single albumtype field before (when getting it from the api and assigning it to the library objects) - I don't know - It would be super helpful to experiment with that!

Even a dirty quickfix would help us to draft a proper PR at some point!

celynw commented 10 months ago

Sorry to jump in, but this is driving me crazy so I thought I'd try something to help. If I'm following correctly, I tried commenting these lines:

https://github.com/beetbox/beets/blob/296f01b775d3c48fa720d487e9887c074f3b3194/beets/autotag/mb.py#L555-L558

Behaviour before:

$ beet ls -a -f '$album [$albumtype] [$albumtypes]'
Monstercat 001 - Launch Week [compilation] [album; compilation]
$ beet write
Arion - Monstercat 001 - Launch Week - Cold Blood & Ice Cream Cones
  albumtype: album -> compilation
$ beet write
Arion - Monstercat 001 - Launch Week - Cold Blood & Ice Cream Cones
  albumtype: album -> compilation

Behaviour after:

$ beet ls -a -f '$album [$albumtype] [$albumtypes]'
Monstercat 001 - Launch Week [album] [album; compilation]
$ beet write
$

So it fixes my infinite beet write loop. I'm on the most recent beets commit, 296f01b (after version 1.6.1)

Based on mediafile, it seems like it's returning the first field for the singular albumtypes, which I also think is supposed to be the primary field in MusicBrainz.

Oh, and in both cases, metaflac tells me that there are two RELEASETYPE tags:

    comment[15]: RELEASETYPE=album
    comment[16]: RELEASETYPE=compilation
JOJ0 commented 10 months ago

Thanks @celynw yeah it helps.

I have something very similar in the queue, which basically just ignores albumtype and never uses it anywhere in beets in favor of albumtypes.

It's coded finished but too busy to file the PR this week but hopefully during the next month.

JOJ0 commented 7 months ago

Some notes on a very high level for this issue:

Some of these issues I tried to address here: https://github.com/beetbox/beets/pull/5075

but I think that getting rid of albumtype (single-field) altogether is not the best solution. It breaks existing plugins that want to use it. We still need to find a working way of fetching a single (main) albumtype from albumtypes