digital-preservation / PRONOM_Research


ADTS and MP3 clashes #66

Open Dclipsham opened 3 months ago

Dclipsham commented 3 months ago

The attached file is the output of a Real Audio -> MP3 migration performed with FFmpeg (package ffmpeg-5.1.4-1.el9), using a very basic command-line conversion: ffmpeg -i <inputfile.ra> <outputfile.mp3>

The file currently identifies as fmt/1812 (ADTS) because a 0xFFF041 sequence occurs within 2045 bytes of the ID3 tag, so it matches the sig. 2 variant of the Audio Data Transport Stream signature, and fmt/1812 has priority over fmt/134 (MP3). The fmt/1812 identification is a false positive; the file should identify as fmt/134 MP3.
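To make the clash easier to reproduce on other files, here is a minimal Python sketch that looks for the same byte sequence near the start of a file. It is not the PRONOM signature itself: the function name is hypothetical, and measuring the 2045-byte window from the end of the 3-byte "ID3" magic is only one reading of the description above. The real signature may anchor the window differently.

```python
from typing import Optional

# Byte sequence quoted in the issue as triggering the ADTS match.
ADTS_LIKE = bytes.fromhex("FFF041")


def adts_like_offset(data: bytes, window: int = 2045) -> Optional[int]:
    """Return the offset of the first FFF041 sequence that starts no more
    than `window` bytes after the "ID3" magic, or None if there is none.

    Assumption: the window is measured from the end of the "ID3" magic.
    """
    if not data.startswith(b"ID3"):
        return None
    start = 3  # first byte after "ID3"
    # bytes.find only reports matches that fit entirely inside [start, end).
    end = start + window + len(ADTS_LIKE)
    pos = data.find(ADTS_LIKE, start, end)
    return pos if pos != -1 else None
```

Running this over the first few kilobytes of a collection's MP3 files would give a rough idea of how many are exposed to the same false positive.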

I don't have an immediate solution and am curious if @thorsted has any thoughts...?

fmt-404_RealAudio_44_mp3.zip

thorsted commented 3 months ago

I believe this is not the first time this has come up, but it seems to be very rare. These two formats have many similarities. In fact, I believe an ADTS can be an MPEG-2 stream. I am curious what others recommend. Output from "file" on this MP3:

fmt-404_RealAudio_44.mp3: Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v2.5, 24 kbps, 8 kHz, Monaural
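The similarity between the two frame formats can be seen by decoding an MPEG audio frame header by hand. The sketch below is illustrative only: the function name is hypothetical, it covers only MPEG-2 and MPEG-2.5 Layer III, and the example header FF E3 38 C0 mentioned afterwards is constructed to match the "file" description above, not read from the attached file.

```python
from typing import Optional

# Bitrate table (kbps) for MPEG-2 / MPEG-2.5 Layer III.
# Index 0 is "free format" and index 15 is invalid; both are treated as None.
V2_L3_BITRATES_KBPS = [None, 8, 16, 24, 32, 40, 48, 56,
                       64, 80, 96, 112, 128, 144, 160, None]

# Sample rates (Hz) keyed by the 2-bit version field.
SAMPLE_RATES_HZ = {
    0b00: ("2.5", [11025, 12000, 8000]),
    0b10: ("2", [22050, 24000, 16000]),
}


def describe_v2_layer3_header(header: bytes) -> Optional[dict]:
    """Decode a 4-byte MPEG-2/2.5 Layer III frame header, or return None."""
    if len(header) < 4:
        return None
    # MPEG audio uses an 11-bit sync word (ADTS uses 12 bits).
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return None
    version = (header[1] >> 3) & 0x03
    layer = (header[1] >> 1) & 0x03
    # Layer bits 01 mean Layer III; 00 is reserved in MPEG audio.
    if version not in SAMPLE_RATES_HZ or layer != 0b01:
        return None
    bitrate = V2_L3_BITRATES_KBPS[header[2] >> 4]
    rate_index = (header[2] >> 2) & 0x03
    if bitrate is None or rate_index == 3:
        return None
    name, rates = SAMPLE_RATES_HZ[version]
    return {
        "version": name,
        "bitrate_kbps": bitrate,
        "sample_rate_hz": rates[rate_index],
        "mono": (header[3] >> 6) == 0b11,
    }
```

Under these assumptions, FF E3 38 C0 decodes as MPEG-2.5, Layer III, 24 kbps, 8 kHz, mono, while a sequence starting FF F0 41 is rejected because its layer bits are 00. That value is reserved for MPEG audio but is the value ADTS headers carry, so such a sequence inside MP3 audio data can resemble an ADTS header without being an MP3 frame header.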

These frame formats can be tricky. Can we add another sequence of the same frame pattern to the ADTS signatures to be more accurate?
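The idea of requiring a second frame can be sketched as a validation routine: read the 13-bit frame length from a candidate ADTS header and check that another plausible ADTS header sits exactly that many bytes later. This is a Python illustration with hypothetical function names, not a PRONOM signature, and it does not address how a comparable check would be expressed in signature syntax.

```python
from typing import Optional

ADTS_HEADER_LEN = 7  # fixed plus variable header, without the optional CRC


def adts_frame_length(header: bytes) -> Optional[int]:
    """Return aac_frame_length from a candidate ADTS header, or None."""
    if len(header) < ADTS_HEADER_LEN:
        return None
    # 12-bit sync word (0xFFF) followed by an ID bit, then layer bits 00.
    if header[0] != 0xFF or (header[1] & 0xF6) != 0xF0:
        return None
    # The 13-bit frame length spans bytes 3 to 5 and includes the header.
    length = ((header[3] & 0x03) << 11) | (header[4] << 3) | (header[5] >> 5)
    return length if length >= ADTS_HEADER_LEN else None


def looks_like_adts_run(data: bytes, offset: int, frames: int = 2) -> bool:
    """True if `frames` consecutive ADTS headers chain together from `offset`."""
    pos = offset
    for _ in range(frames):
        length = adts_frame_length(data[pos:pos + ADTS_HEADER_LEN])
        if length is None:
            return False
        pos += length  # jump to where the next header should start
    return True
```

A stray FF F0 41 inside MP3 audio data would only pass this check if the bytes that follow happened to encode a length pointing at another matching sync word, which is much less likely than a single three-byte coincidence.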