Serial-ATA / lofty-rs

Audio metadata library
Apache License 2.0
176 stars 34 forks source link

MP4: Unexpected upgrade of `gnre` to `©gen` #409

Open uklotzde opened 2 weeks ago

uklotzde commented 2 weeks ago

Reproducer

parse_ilst()

Summary

This opinionated behavior not only prevents applications from detecting and fixing/resolving those issues themselves. It causes ambiguities and mixes up metadata which is much more concerning.

Afterwards the imported Ilst contains both regular ©gen atoms side by side with implicitly migrated gnre atoms. They are indistinguishable and alter the metadata unexpectedly when written back to the file.

The only situation when such an implicit migration would be acceptable is if the original Ilst does only contain gnre but no ©gen atoms. But conditional, data-dependent behavior is harder to reason about, so better not try to be clever.

Expected behavior

This feature should either be made optional or provided as a manual, post-processing step on the imported Ilst. I would prefer the latter, because import and migration should be independent.

Assets

No response

Serial-ATA commented 2 weeks ago

Are you saying after parsing a tag you end up with both ©gen and gnre atoms? That shouldn't happen, all gnre atoms should be upgraded. I do agree that the user should have the choice to toggle the conversions, though.

I'm thinking about adding ParseOptions::implicit_conversions since I'm working on #62 right now and it'll support merging TYER, TDAT, and TIME into TDRC on parse. Does that sound good?

I'd rather make conversions a toggle than remove them outright, since they make it possible to get files on the same page simplifying the implementation of <Ilst as Accessor>::genre for example.

uklotzde commented 2 weeks ago

Are you saying after parsing a tag you end up with both ©gen and gnre atoms? That shouldn't happen, all gnre atoms should be upgraded. I do agree that the user should have the choice to toggle the conversions, though.

No. But you can't tell the ©gen tags apart if there are multiple of them. You end up with multi-valued genre tags that need to be resolved manually. Probably not intended and very inconvenient.

Fortunately, I didn't write any tags back with lofty yet and so I could fix the original files first by removing the legacy gnre tag.

uklotzde commented 2 weeks ago

The ambiguities only occur for files with both gnre and ©gen tags.

Serial-ATA commented 2 weeks ago

But you can't tell the ©gen tags apart if there are multiple of them

I'm not sure what you mean, the atoms contain no additional metadata to separate them from one another. You can iterate all values like this:

let genre_atom = ilst.get(&AtomIdent::Fourcc(*b"\xa9gen"));
for data in genre_atom.data() {
    if let AtomData::UTF8(genre) = data {
        println!("Genre: {genre}");
    }
}

Unless I'm misunderstanding what you mean?

You end up with multi-valued genre tags that need to be resolved manually. Probably not intended and very inconvenient.

With ilst, it is expected that atoms be merged. I took a look at TagLib, mutagen, and music-metadata and they all do the same thing with the gnre atom.

None of them let you toggle that though, so I'll definitely look into making that an option. It'll just have to be made very clear that it will break Accessor methods.

uklotzde commented 2 weeks ago

I am referring to this situation:

Before

After

The value from the legacy atom should be ignored if the official genre atom is present. Otherwise you end up with multiple genres and could no longer decide which one of them is the "real" one. The legacy atom was probably just a left over that has not been removed when adding ©gen.

I decided to strip the gnre atom from all my files using AtomicParsley after noticing Lofty's warning. To avoid importing old garbage data. Unfortunately, the warning doesn't indicate which file is affected.

uklotzde commented 2 weeks ago

TagLib maps gnre to ©gen. But then only the first ©gen atom wins and the content of all following atoms with the same identifier are discarded.

https://github.com/taglib/taglib/blob/f3fb4d83a469fea7e23491a1dfbac14c728ac968/taglib/mp4/mp4tag.cpp#L577

Also not optimal but at least unambiguous.

Serial-ATA commented 2 weeks ago

Ah, yeah that's a confusing situation. We can't really signal which genres have been converted.

I've already added ParseOptions::implicit_conversions locally. I guess if a ©gen is present, we can also just skip the conversion regardless of the toggle? I'd rather not follow in TagLib's steps and just throw away atoms.