Closed roundwheel closed 4 years ago
I might guess that it's storing the data in its own proprietary format. If you can create another small (preferably audio-blank) sample file in "Logic Pro X" that shows the same kind of metadata I can take a look, but no promises.
One of my users reported a similar issue with WAV files containing RIFF ID3v2.3 tags here: https://github.com/owncloud/music/issues/788. Here it seems that some similar WAV files have the problem and others don't. The problematic files can be parsed correctly e.g. by Windows 10 Explorer, foobar2000, and Mp3tag, so the format shouldn't be too exotic.
I was able to reproduce the issue with a sample file by using getID3 demo.basic.php, so clearly the issue is within the library. I have pasted the output of the demo below. It can be seen, that getID3 has extracted both the correct track title and some garbage value. For the author
field, the correct value has been extracted, but then some garbage characters have been included after the correct content.
I'll send a link to this sample file to info@getid3.org shortly.
Array
(
[author] => Array
(
[0] => Logic Prohÿ¿hÿ¿¸ÿ¿v·¢tÿ¿P
)
[title] => Array
(
[0] => ÐõÐõ8j¬HÅî´L¬0L¬õ ÿ¿Èÿ¿3ÿ¿Ðõ¿Þ¿ÞUJªªXÿ¿!(¿Þ¤îÄÂÓÿ¿ ÓÀXÿ¿õ_µõô ×õõ8j¬Þ«ªÿ¿ÍÞ0`ôù¬ÓÁÜнÜ?įÌ`
[1] => Athletic Theme (Super Mario World)
)
[product] => Array
(
[0] => Consouls 1-1
)
[album] => Array
(
[0] => Consouls 1-1
)
[band] => Array
(
[0] => The Consouls
)
[track_number] => Array
(
[0] => 01
)
[text] => Array
(
[replaygain_album_gain] => -4.37 dB
[replaygain_album_peak] => 1.266338
[replaygain_track_gain] => +0.07 dB
[replaygain_track_peak] => 1.000000
)
)
@paulijar Thanks for the sample files. Both files contain "id3", "INFO", and "bext" RIFF chunks. The ID3v2 data inside the "id3" chunk seems sane and valid, and most likely the only thing that other programs would look at. The RIFF.INFO chunks also contain valid (if limited) data -- title, product, tracknumber. It's the bext
chunk that's malformed and causing problems. Part of the data that can be stored in the bext
chunk is a Title field, and since the chunk appears to be malformed a bunch of garbage data is being interpreted as being in the Title section of the chunk. By happenchance the "working" sample file just encountered nulls data when reading the Title section so doesn't generate a bad Title entry in the comments
output.
It would appear that it's a common issue with Logic Pro producing invalid data. If I had to hazard a guess, I might suggest that Logic Pro is writing bext tag data elements to the correct offsets, but may not be clearing the unused part of the tag prior to writing. If writing a brand new file then no problem, but if overwriting/modifying an existing file it looks like whatever random data was at that offset in the file before is not being erased. At least that's my guess based on the random nature of the data in those locations but with some strings (e.g. "Logic Pro" and timestamps) appearing perfectly correct in the places they're supposed to be. Perhaps you should open a bug report ticket with Logic Pro if these files were generated with a current version of that software (I'm not familiar with it).
If you wish to examine your "broken" sample file with a hex editor, you can see the bext
chunk starts at offset 40172186 with bext
, then 4 bytes 5B 02 00 00
indicating 603 bytes of data (in reality taking 604 bytes due to WORD-padded nature of RIFF files), and then following that should immediately be 256 bytes for the Title field, and so forth, but it's filled with apparently-random data, except where "Logic Pro" and "2014-08-2821:16:22" have been inserted in appropriate places.
Here is another sample file that you can reference.
@JamesHeinrich Thanks for the quick response!
Looking at the mentioned bext
chunk of the "broken" file with hex editor, it looks like this:
Clearly, the CHAR Description[256]
field (highlighted in the picture) starts with a null terminator. Shouldn't it then be interpreted as 0-length, regardless of the following bytes? Similarly, there is a null terminator after the string "Logic Pro", so the garbage following it should be ignored.
@roundwheel's example file seems to have similar bext description field where the first byte is 0, and the rest of the field is filled with random bytes.
@paulijar I had noticed the null-terminators in both the empty and non-empty strings, but it wasn't till I after your comment I re-read the spec document and indeed that is a valid interpretation. I still consider it poor programming to write null-terminated data into a fixed data space without wiping the previous contents, but I guess it's not invalid.
I have fixed the issue in https://github.com/JamesHeinrich/getID3/commit/97f8c303eb9cfc808bd455c3cba2dec6de43a3d4 and the sample files should read correctly now.
Thank you (both) for the sample files and pointing me in the right direction.
Great, thanks! Seems to work fine now. I agree that it is probably not the best design choice to include the contents of some random uninitialized memory in a written file. In theory, it could even cause a breach of confidentiality when some random memory contents gets shared more widely than intended. It's unlikely, though, that there would be anything interesting.
While reading the RIFF/WAVE metadata on a WAV file, some of the data is coming through garbled.
As you can see in the screenshot, this is a WAV file that was created/exported using Logic Pro X. Is this a known issue with that software, or possibly a text encoding issue? Or maybe just corrupted metadata? The audio plays fine.