Open wantehchang opened 1 year ago
Running avifdec in the debugger, I found that the null character is at the end of the avif->xmp.data
buffer.
The XMP specification says "When XMP is encoded as UTF-8, there are no zero bytes in the XMP Packet." It seems that the XMP metadata in moon.jpg has an XMP packet that is incorrectly null-terminated.
Error writing PNG: XMP metadata contains an invalid null character
comes from
I assume the issue is that avifenc
can generate a file that avifdec
refuses. Solutions I can think of:
Opinions?
IMHO, it makes sense to try and sanitize the XMP chunk at the point of import, from any format (this is not tied to PNG spec only, this is in the XMP spec), so you don't even produce an "invalid" AVIF in the first place. I think this is the approach we took in exiv2 as well as libexpat was throwing a similar error and refused to parse it from a sample JPEG...
@kmilos Thanks for the info. Do you have any pointer to exiv2 or libexpat code implementing what you described?
Fond it - take a look at https://github.com/Exiv2/exiv2/pull/2133 and the related issue report.
libexpat just assumes its input is not terminated and simply barfs if it is, it has no workarounds for it. I remember even looking into to the libexpat code to convince myself that the nul character was explicitly in the list of invalid XML characters they check for...
We need to first make sure the terminating null byte comes from moon.jpg rather than our code in avifJPEGRead()
. That is, we should verify that libjpeg does not null-terminate the data in marker->data
and include the appended null byte in marker->data_length
. I believe this is the case, but it would be good for another person to doublecheck it.
Once we know avifJPEGRead()
is correct, then we change it to handle this kind of XMP metadata error. We can either report this error or fix up this error. If we fix up this error, I suggest we only strip the null byte at the end because it is very likely a programming error. If a null byte occurs in the middle of an XMP packet, that is more likely corrupted data and we should not try to fix that up.
We can leave the check during PNG export unchanged.
This has nothing to do w/ libjpeg per se, who knows how that original file was created....
As noted, exiv2
(0.27.5) fails to parse the file:
$ exiv2 -p x print moon.jpg
Error: XMP Toolkit error 201: Error in XMLValidator
Warning: Failed to decode XMP metadata.
exempi
(version 2.6.3) will dump the packet, however.
$ exempi -x moon.jpg
dump_xmp for file moon.jpg
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Exempi + XMP Core 6.0.0">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
...
As noted,
exiv2
(0.27.5) fails to parse the file:
Yep, the fix is in exiv2 0.27.6
Btw, exiftool -v5
can also provide insight into all of the chunks for most formats, and for moon.jpg I do see the culprit in the file:
JPEG APP1 (3629 bytes):
011c: 68 74 74 70 3a 2f 2f 6e 73 2e 61 64 6f 62 65 2e [http://ns.adobe.]
012c: 63 6f 6d 2f 78 61 70 2f 31 2e 30 2f 00 3c 3f 78 [com/xap/1.0/.<?x]
013c: 70 61 63 6b 65 74 20 62 65 67 69 6e 3d 22 ef bb [packet begin="..]
[snip]
0f2c: 20 20 20 20 20 20 20 20 20 3c 3f 78 70 61 63 6b [ <?xpack]
0f3c: 65 74 20 65 6e 64 3d 22 77 22 3f 3e 00 [et end="w"?>.]
We need to first make sure the terminating null byte comes from moon.jpg rather than our code in
avifJPEGRead()
. That is, we should verify that libjpeg does not null-terminate the data inmarker->data
and include the appended null byte inmarker->data_length
. I believe this is the case, but it would be good for another person to doublecheck it.
I confirm that at the following point:
standardXMPData[standardXMPSize-1]
is 0
when reading moon.jpg and 62
when reading paris_exif_xmp_icc.jpg.
Miloš, Yannis: Thank you very much for confirming our code is correct and the error is in the moon.jpg file.
Should we report an error if an XMP packet is not valid UTF-8? Or just check if the XMP packet contains a zero byte (similar to the check in avifpng.c)?
Our error message can point to tools that can be used to inspect and correct the errors in the XMP packet.
The XMP specification says "When XMP is encoded as UTF-8, there are no zero bytes in the XMP Packet."
I found that sentence only in XMP spec Part 3 section 1.1.2 GIF (Graphic Interchange Format)
. Does it apply to any XMP chunk in any format (including JPEG)? The PNG spec forbids it too, I did not check other formats.
Should we report an error if an XMP packet is not valid UTF-8? Or just check if the XMP packet contains a zero byte (similar to the check in avifpng.c)?
I started writing a change involving a new avifFixImageXMP()
in avifutil.h
, that is called from avifJPEGRead()
and avifPNGRead()
. It just trims trailing \0
s. It could also be called during AVIF decoding. \
If \0
is forbidden in XMP chunks in JPEG and in AVIF, we can add a check at AVIF encoding. Otherwise, I suggest we just leave it like that: carry any non-trailing \0
over to AVIF and JPEG exports.
I started writing a change involving a new avifFixImageXMP() in avifutil.h, that is called from avifJPEGRead() and avifPNGRead().
I feel you should do it at a single/common entry point (e.g. avifImageSetMetadataXMP()
or some such), not for every possible image format (easy to forget when new ones are added)...
If
\0
is forbidden in XMP chunks in JPEG and in AVIF,
It is forbidden in UTF-8 XML period, nothing to do w/ images. As I said, libexpat just flatly refuses to parse null-terminated XML, no matter where it comes from. From https://en.wikipedia.org/wiki/Valid_characters_in_XML:
Note that the code point U+0000, assigned to the null control character, is the only character encoded in Unicode and ISO/IEC 10646 that is always invalid in any XML 1.0 and 1.1 document.
I feel you should do it at a single/common entry point (e.g.
avifImageSetMetadataXMP()
or some such), not for every possible image format (easy to forget when new ones are added)...
I considered that, but wondered whether XMP chunks provided by users through the API should be altered. For example, someone setting 100 bytes of XMP, then accessing avifImage.xmp.data[99]
. If the behavior is described in avif.h
, I guess that is fine. \
avifImageSetMetadataXMP()
could also check for non-trailing \0
and return an avifBool
. \
The main issue I had with this approach is that there is currently no single entry point. Extended XMP in avifJPEGRead()
does not use avifImageSetMetadataXMP()
.
not for every possible image format (easy to forget when new ones are added)...
I agree. Although it is just a few formats in libavif (for now...).
It is forbidden in UTF-8 XML period, nothing to do w/ images.
Great, this simplifies the fix. Thanks for the reference.
Yannis: Since I filed this issue, I wanted to clarify that after confirming our code is correct and the error is in the moon.jpg file, I think our current policy is fine (importing metadata faithfully, and only checking for errors that break the output format, such as zero bytes for a PNG iTXt chunk). As the bug reporter, I think it is okay to close this issue.
If you'd like to remove trailing null characters on import, I suggest we remove only one (to handle the off-by-one bug of including the terminating null character of a null-terminated string). Fixing up errors will hide them and make it harder to track down the source of the errors, so if an extra terminating null character is not a common error, we don't really need to fix it up.
Well, then the bug is that you only check/warn for PNG, while it should be for all output formats.
I think an avifenc
success followed by an avifdec
error is a concern and should be fixed. Either both should succeed or both should fail. avifenc
should not generate files that avifdec
cannot convert to a widespread format such as PNG. I suggest avifenc
to choke on \0
, because we already check Exif. The alternative would be to make avifdec
succeed with a warning such as The XMP chunk was discarded because it contains an invalid character
.
In parallel, removing a single trailing null character on any import seems acceptable.
As an alternative, you could just not fail at all on either avifdec
or avifenc
, and just copy the XMP chunk around as is, and defer the problem to other 3rd party apps, with preferably just a "YMMV" style warning given to the user if this condition was detected.
Or add an avifenc
option to chomp invalid XML if keeping the original is imperative (or the other way around - default to valid output, but give an option to keep original)...
I found this topic very interesting!
I encountered the same issue when I discovered that the pre-installed files IMG_0001.jpg and IMG_0002.jpg in the iOS simulator also have the illegal NUL terminator.
Initially, I considered whether the issue should be addressed in https://github.com/Ashampoo/kim or not. However, I realized that even the command "exiftool -xmp -b IMG_0001.JPG > out.xmp" exports the NUL character at the end. Therefore, it seems correct that the XMP string is exported as is - with all errors.
In the end, I made the decision that the proper approach would be to handle this issue within the XMP parser. I fixed it in my case in https://github.com/Ashampoo/xmpcore/pull/4/files.
I want to express my gratitude for all the insights shared during this discussion. Thank you!
However, I realized that even the command "exiftool -xmp -b IMG_0001.JPG > out.xmp" exports the NUL character at the end. Therefore, it seems correct that the XMP string is exported as is - with all errors.
Not quite. Note that the -b
option just dumps the raw byte stream without any parsing, it doesn't care/know if it's XML or not. If exiftool were to parse & copy instead, it might drop it...
Not quite. Note that the
-b
option just dumps the raw byte stream without any parsing, it doesn't care/know if it's XML or not. If exiftool were to parse & copy instead, it might drop it...
Yes, that's correct. If ExifTool parses the actual XMP and writes new it looks very different.
In my case I have two libraries: Kim
can read EXIF and extract the raw XMP string and XMP Core
is for actual parsing of XMP.
At first I was unsure where I should fix the problem and wanted Kim to output a trimmed XMP string. But after seeing that ExifTool extracts the untouched original I decided to do the same. The XMP parser must be the one resilient to this problems.
ExifTool ignores the invalid char if parsing the data, but it exports the RAW data exactly as it is. I think this is the correct behavior.
I just noticed that also the XMP of the embedded preview JPG of my RAF files produced by my X-T4 have NUL characters at the end. In that case two.
Yannis: Please take a look at the XMP metadata in this image and see if avifpng.c is rejecting it correctly.
Steps to reproduce:
The avifdec command fails with the error message: