DeaDBeeF-Player / deadbeef

DeaDBeeF Player
https://deadbeef.sourceforge.io/
Other
1.65k stars 178 forks source link

id3v2.3 parsing may be incorrect #2514

Closed sktt closed 3 years ago

sktt commented 3 years ago

I found that certain files had missing id3v2 tags in some media players.

Steps to reproduce the problem

I initially thought this was a problem with the transcoding tool that I am using but I am no longer sure. Steps are thus here: https://github.com/robinbowes/flac2mp3/issues/58. I've als uploaded the problematic mp3 file here: https://0x0.st/--Ih.mp3

What's going on? Describe the problem in as much detail as possible.

Tag frames with special characters can be encoded as UTF-16. The UTF-16 string starts with a BOM \xFF FE which collides with mp3's synchronization frames. The ID3v2.3 spec suggests "unsynchronizing" those patterns by adding an extra \x00 byte on those occasions. The added \x00 byte leads to the tag frame value length becoming off by 1 if the tags are parsed before "un-unsynchornizing".

There seem to be different opinions on whether tags should be parsed before or after un-unsynchornization. libmad's libid3tag will remove \x00 before parsing. deadbeef and for example libmpg123 wont. Correction: deadbeef does remove unsync bytes before parsing but the problem appears to be that it assumes that the Size of the frame also constitutes for the added unsync bytes.

Information about the software:

Deadbeef version: master branch OS: Linux

Oleksiy-Yakovenko commented 3 years ago

Your description of the problem is not particularly clear.

If you mean that deadbeef is writing bad tags: Deadbeef doesn't use unsynchronization when writing id3v2 tags, therefore the issue described at the flac2mp3 link doesn't seem to be related.

If you mean that deadbeef can't read tags written by some other tagger: it's another story... maybe those taggers are bad?

Oleksiy-Yakovenko commented 3 years ago

It turns out, however, that the spec is not clear on whether tags should be un-unsynced (ie removing that added \x00 byte) before or after parsing the id3 tags.

The spec is clear enough. The Unsync bytes can be anywhere in the tag, and need to be removed before parsing it. If deadbeef doesn't do that, it's definitely a bug.. But I looked in the code, and it doesn't seem to be the case. Each frame is preprocessed to remove unsync bytes before being interpreted.

sktt commented 3 years ago

If you mean that deadbeef can't read tags written by some other tagger: it's another story... maybe those taggers are bad?

Yes I mean reading tags from a different tagger. I was suspecting that the tagger was messing it up too but I am starting to think it's a problem with handling unsync + utf16.

Deadbeef's own tagger does not set the unsync flag or inject null bytes at false sync frames. That might be a bug of its own. You can set for example a tag to 당 or something that can't be translated to utf8. and verify this with mediainfo -f --Details=1 file.mp3

sktt commented 3 years ago

The problem might be that junklib assumes the frame size is including unsync bytes. However what it maybe should do is to treat is as the size after sync_frame

sktt commented 3 years ago

Right junklib_id3v2_sync_frame will assume that frame size is before unsynchronisation

2.4 spec says

   The ID3v2 tag size is the sum of the byte length of the extended
   header, the padding and the frames after unsynchronisation

This is the size of the entire tag though..

2.3 spec says

Do bear in mind, that if a compression scheme is used by the encoder, the unsynchronisation scheme should be applied afterwards. When decoding a compressed, 'unsynchronised' file, the 'unsynchronisation scheme' should be parsed first, decompression afterwards.

To me this sounds like frame size should be assumed to be the size after synchronization. I'll make the change in my own code and see what happens. Also a PR if you are able to make more sense out of it.

Oleksiy-Yakovenko commented 3 years ago

Deadbeef's own tagger does not set the unsync flag or inject null bytes at false sync frames. That might be a bug of its own.

This is by design.. Even 12 years ago when Deadbeef project started, it was nearly impossible to find an mp3 player, that would not be able to read/ignore id3v2 tags. Today it's even less relevant, and would increase the tag writer code complexity.

Oleksiy-Yakovenko commented 3 years ago

The only valid modern usecase for unsynchronized id3v2 tags I could come up with, is when people want to concatenate multiple mp3 files without stripping tags. But this usecase is also quite unusual, and it's very trivial to strip the tags before concatenating.