JamesHeinrich / getID3

http://www.getid3.org/
Other
1.14k stars 244 forks source link

Incorrect parsing of Quicktime files #452

Closed leenooks closed 2 days ago

leenooks commented 2 weeks ago

Howdy, was just trying getID3 again for my photo/video parsing, and discovered an error parsing the mp3 files (from an iphone).

I've tested with two files, and both show data that doesnt match it's label (I'm guessing from the keys and ilst atoms.

Here is an example:

      "GETID3_VERSION" => "1.9.23-202310190849"
...
      "tags" => array:1 [
        "quicktime" => array:6 [
          "camera.lens_model" => array:1 [
            0 => "iPhone SE (3rd generation) back camera 3.99mm f/1.8"
          ]
          "camera.focal_length.35mm_equivalent" => array:1 [
            0 => "\x00\x00\x00\e"
          ]
          "make" => array:1 [
            0 => "4.559366"
          ]
          "model" => array:1 [
            0 => "-12.9871+123.0725+024.569/"
          ]
          "software" => array:1 [
            0 => "Apple"
          ]
          "creationdate" => array:1 [
            0 => "iPhone SE (3rd generation)"
          ]
        ]

To be sure it wasnt any issue with the files, I wrote my own atom processor, and this is the what it should be:

  #items: array:6 [
    "mdta.com.apple.quicktime.location.accuracy.horizontal" => "4.559366"
    "mdta.com.apple.quicktime.location.ISO6709" => "-12.9871+123.0725+024.569/"
    "mdta.com.apple.quicktime.make" => "Apple"
    "mdta.com.apple.quicktime.model" => "iPhone SE (3rd generation)"
    "mdta.com.apple.quicktime.software" => "17.5.1"
    "mdta.com.apple.quicktime.creationdate" => "2024-07-15T17:32:43+1000"
  ]
JamesHeinrich commented 2 weeks ago

If you're able to spot the error(s) in module.audio-video.quicktime.php any patches to the code are very welcome.

If not, please provide a sample file that shows the problem and I'll try to find out why.

leenooks commented 5 days ago

Howdy, just wanted to share back with you. I dont have any patches to share, quicktime seems quite complex and I've struggled myself to understand the detail information for each atom type. As I mentioned, I wrote my own quicktime parser as part of understanding this problem and to get the information I need.

What I think I've discovered with your implementation, is that some data elements are pascal strings, but you are calculating them as a fixed sized string to assume it's value. What that has resulted in the incorrect interpretation of some data (as highlighted above). Also, you are manually keying in offsets in data fields, and while I found 1 error (below), there may be others?

The first case that I've noticed this is in the STSD atom, where the codec (or you call the compressor_name) is a pascal string, but you are assuming it is 4 chars. And while it's value if 4 chars (HEVC), its prefixed with \x04, and you should probably be reading that first char, to know how long the string is.

The second related problem, in my video, the compressor_name is 4 chars HEVC, but your parsing of that file is already off by 4 chars when it gets to it, and thus parses it as C\x00\x00\x00 (frame count and data size are also messed up). I think this is because your missed 4 bytes here (line 835):

                                $atom_structure['sample_description_table'][$i]['height']           =   getid3_lib::BigEndian2Int(substr($atom_structure['sample_description_table'][$i]['data'], 18,  2));
                                $atom_structure['sample_description_table'][$i]['resolution_x']     = getid3_lib::FixedPoint16_16(substr($atom_structure['sample_description_table'][$i]['data'], 24,  4));

(18 plus 2 bytes, takes you to 20, but you start at 24)

Ultimately, movie file parsing logs issues with parsing of other atoms (my parsing doesnt find these atoms), and I suspect it is related (or maybe it is a red-herring?)

     "warning" => array:5 [
        0 => "Unknown QuickTime atom type: "sgpd" (73 67 70 64), 44 bytes at offset 857564"
        1 => "Unknown QuickTime atom type: "sgpd" (73 67 70 64), 26 bytes at offset 857608"
        2 => "Unknown QuickTime atom type: "sbgp" (73 62 67 70), 44 bytes at offset 857634"
        3 => "Unknown QuickTime atom type: "cdep" (63 64 65 70), 12 bytes at offset 859002"
        4 => "Unknown QuickTime atom type: "cdep" (63 64 65 70), 12 bytes at offset 859627"
      ]

I'm happy to share how I parsed the quicktime files - I took a different approach and compared to yours its a long way from being complete to get specific values (but enough for me to pluck out the stuff I was wanting to get - as well as being less reliant on manual offset starting points).

JamesHeinrich commented 2 days ago

Thanks, changed in https://github.com/JamesHeinrich/getID3/commit/ded8b13c8df25d3378056f80a6ddd3a6da36830a Should fix the missing 4-byte offset gap, as well as the Pascal string. It looks like I probably originally misinterpreted "32 byte Pascal string" for "32-bit string". But who knows what documentation I was working from 20 years ago.

Please reopen, preferably with a sample file, if any issues persist.