agersant / mp3-duration

MIT License
36 stars 12 forks source link

Other weird mp3-files #9

Closed mqus closed 4 years ago

mqus commented 5 years ago

I debugged the remaining files and they had one the following cases:

  1. An album with Lyrics3v2 (see https://web.archive.org/web/20190118053655/id3.org/Lyrics3 and https://web.archive.org/web/20190118053653/http://id3.org/Lyrics3v2 )
  2. One file with other weird appended stuff (probably proprietary, I didn't find anything)
  3. A file which was actually corrupt and only sounded fine because it sounded like it was a part of the song and the players glossed over it
  4. Files where the size of the last mp3-frame was estimated larger than it actually was and a following APETAGEX/TAG frame, resulting in the parser jumping right into it. This had 2 variants: 4.1. (1 Audiobook): the estimated frame-size was always the same, but the actual framesize varied. This also affected only about 1/3rd of the mp3-files of the audiobook. 4.2. (1 Album): The estimated Frame-Size differed, but it always jumped to the same position in the following TAG (id3v1) Frame, specifically the year field.
  5. A file with 193 unidentifiable non-zero bytes between the ID3v2 tag and the start of the first following 0xFF
  6. A file which threw a InvalidSamplingRate { sampling_rate: 3 } (but plays just fine in players)

I can send you some files, if you want to check it out yourself.

Thoughts on possible solutions:

agersant commented 5 years ago

Nice sleuthing!

I mostly agree with your analysis.

I don't like the option of scanning unknown data until encountering a valid frame header. While it would be rare that a valid frame header appears in arbitrary data, it would be a very undesirable failure case.

I would be happy to return the current duration after encountering unexpected data. That is already what we do for unexpected end of file (even in the middle of an audio frame), and most bizarre frames reside after the audio data. Since this will be a breaking change anyway, I think the duration after an unexpected EoF should also go in the error.

I would be interested in files from 1/2/4, although I'll have to craft public domain ones for unit testing.