ImageMagick / ImageMagick

🧙‍♂️ ImageMagick 7
https://imagemagick.org

Parsing IPTC data in JPEG's APP13 segment - possible 8BIM collision #7040

Open seruss opened 5 months ago

seruss commented 5 months ago

ImageMagick version

7.1.1-26 Q16-HDRI x64 83eefaf:20240107

Operating system

Windows

Operating system, version and so on

Windows 10 Enterprise v2009 10.0.19041.3636

Description

We've identified a potential issue in how ImageMagick parses IPTC data in the APP13 segment of JPEG files. It appears to misinterpret certain byte sequences as the start of IPTC data, which leads to inconsistent or incorrect metadata extraction.

The APP13 segment in the affected JPEG file begins with the bytes FF ED C7 30: FF ED is the APP13 marker, and C7 30 is the big-endian segment length (0xC730 = 50,992 bytes, a count that includes the two length bytes themselves). IPTC data is commonly stored in the APP13 segment, especially in files processed with software such as Adobe Photoshop. We initially took a sequence inside this segment, 1C 01 DA C7 0F BD, to be the start of the IPTC data. On closer inspection it may instead belong to Photoshop-specific metadata (possibly the 8BIM format) rather than standard IPTC. IrfanView reads the IPTC tags from the same files without issues, which suggests that the problem lies in how the APP13 segment is parsed.
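To make the layout described above concrete, here is a minimal Python sketch that decodes the APP13 header and walks the 8BIM image resource blocks inside it. It assumes the usual Photoshop layout ("Photoshop 3.0" signature followed by 8BIM resources, with IPTC in resource ID 0x0404). The function name `iter_8bim_resources` is made up for illustration and is not how ImageMagick or Magick.NET implement this.

```python
import struct

PHOTOSHOP_HEADER = b"Photoshop 3.0\x00"  # 14-byte signature at the start of the APP13 payload
IPTC_RESOURCE_ID = 0x0404                # 8BIM resource ID that normally carries IPTC data


def iter_8bim_resources(segment: bytes):
    """Yield (resource_id, name, data) for each 8BIM block in an APP13 segment.

    `segment` must start at the FF ED marker. Hypothetical helper for illustration.
    """
    if segment[:2] != b"\xff\xed":
        raise ValueError("not an APP13 segment")
    # The 16-bit big-endian length counts itself but not the marker.
    (length,) = struct.unpack(">H", segment[2:4])
    payload = segment[4:2 + length]
    if not payload.startswith(PHOTOSHOP_HEADER):
        raise ValueError("APP13 payload does not start with the Photoshop signature")

    pos = len(PHOTOSHOP_HEADER)
    # The smallest possible resource header is 12 bytes (empty name).
    while pos + 12 <= len(payload) and payload[pos:pos + 4] == b"8BIM":
        (res_id,) = struct.unpack(">H", payload[pos + 4:pos + 6])
        name_len = payload[pos + 6]
        name = payload[pos + 7:pos + 7 + name_len]
        # Pascal string: length byte plus characters, padded to an even size.
        name_field = 1 + name_len
        name_field += name_field % 2
        size_at = pos + 6 + name_field
        (size,) = struct.unpack(">I", payload[size_at:size_at + 4])
        data_at = size_at + 4
        yield res_id, name, payload[data_at:data_at + size]
        # Resource data is also padded to an even size.
        pos = data_at + size + (size % 2)
```

Running something like this over the raw segment bytes and printing each `resource_id` with the first few bytes of its data would show whether 1C 01 DA C7 0F BD sits at the start of resource 0x0404 or somewhere else in the block.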

magick identify result:

Profiles:
  Profile-8bim: 50976 bytes
  Profile-exif: 7985 bytes
  Profile-icc: 456 bytes
  Profile-iptc: 50964 bytes
    Custom Field 19[1,218]:
0x00000000: 3015ffff ffff69ff ffff0dff ffffffff ff06ffff -0-ý-ľúičąň-Î-ĆÖę‹-Ń
0x00000014: ffff37ff ff7d3fff ff09ffff 4affffff 6116ff58 üŕő7żé}?§ţ ężJÍŔ¬a-“
0x00000028: 6b6f6e43 0cff56ff ff6dffff ff371fff ffffffff XkonC--Výďm¶łé7-ÚßÓł
0x0000003c: 15ff70ff 00                                  ô-Ąp-
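As a sanity check, the profile sizes reported above can be compared with the APP13 length from the description. The short Python calculation below is only an inference from those numbers: it assumes the 8bim profile is the segment payload minus the 14-byte "Photoshop 3.0" signature, and that the iptc profile is what remains after one 12-byte 8BIM resource header with an empty name.

```python
# Segment length taken from the bytes C7 30 in the description.
segment_length = int.from_bytes(bytes.fromhex("C730"), "big")

# Assumption: 8bim profile = payload minus the length field and the signature.
length_field = 2
signature = len(b"Photoshop 3.0\x00")          # 14 bytes
profile_8bim = segment_length - length_field - signature

# Assumption: one resource header of "8BIM" + ID + empty padded name + size.
resource_header = 4 + 2 + 2 + 4                # 12 bytes
profile_iptc = profile_8bim - resource_header

print(segment_length, profile_8bim, profile_iptc)
```

Under these assumptions the results are 50992, 50976 and 50964, matching the Profile-8bim and Profile-iptc sizes in the identify output. That would be consistent with the whole resource block being treated as a single IPTC resource, though the numbers alone do not prove it.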

IrfanView screenshot as well as original image attached below.

Steps to Reproduce

magick identify -verbose iptc.jpeg

Images

iptc image

urban-warrior commented 5 months ago

Not sure what the issue is. ImageMagick extracts the IPTC profile but does not extract metadata from it, preferring EXIF instead. You can view the IPTC profile with these commands:

$ magick iptc_image.jpg iptcData.iptc
$ cat iptcData.iptc
x
John (Felix von Jascheroff) und Laura (Chryssanthi Kavazi, r.) sind �berrumpelt,
 als Lydia "Janani" (Michaela Hanser) �ber Weihnachten zu Besuch kommt.

+++ Die Verwendung des sendungsbezogenen Materials ist nur mit dem Hinweis und V
erlinkung auf RTL+ gestattet. +++tFoto: RTL / Rolf BaumgartnerP
Folge 7924<10254520231026(�Die Verwendung des Materials von RTL Deutschland ist 
nur zur redaktionellen Berichterstattung im Zusammenhang mit der Sendung unter A
ngabe der Credits/Quellenangabe und Beachtung der unter media.rtl.com genannten 
AGB erlaubt.Gute Zeiten, schlechte Zeiten

seruss commented 5 months ago

@urban-warrior I should have mentioned that the problem originally occurred while using Magick.NET. With the command you suggested, magick does save a proper IPTC profile. However, when I call the C API function GetImageProfile, the returned data contains some (I suppose) Photoshop-specific information in addition to the IPTC values. That could be causing problems for Magick.NET, which parses the IPTC tags and values internally. I have attached the result of the API call: GetProfileFromApi.txt
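To show why extra bytes in the returned buffer would trip up a parser, here is a minimal Python sketch of a reader for standard IPTC IIM datasets (tag marker 0x1C, record number, dataset number, 16-bit big-endian length, value). It is an illustration only and not Magick.NET's actual parser. The names `looks_like_raw_iim` and `iter_iptc_datasets` are hypothetical, and extended-length datasets are deliberately not handled.

```python
def looks_like_raw_iim(data: bytes) -> bool:
    """Return True if the buffer starts with the IIM tag marker 0x1C."""
    return data[:1] == b"\x1c"


def iter_iptc_datasets(data: bytes):
    """Yield (record, dataset, value) tuples from a raw IIM byte stream."""
    pos = 0
    # Each standard dataset has a 5-byte header; shorter trailing bytes are ignored.
    while pos + 5 <= len(data):
        if data[pos] != 0x1C:
            raise ValueError(
                f"expected tag marker 0x1C at offset {pos}, got {data[pos]:#04x}"
            )
        record, dataset = data[pos + 1], data[pos + 2]
        length = int.from_bytes(data[pos + 3:pos + 5], "big")
        if length & 0x8000:
            # High bit set means an extended length field follows.
            raise ValueError("extended dataset length is not handled in this sketch")
        value = data[pos + 5:pos + 5 + length]
        if len(value) != length:
            raise ValueError(f"dataset at offset {pos} is truncated")
        yield record, dataset, value
        pos += 5 + length
```

A buffer that still carries an 8BIM resource header starts with the bytes "8BIM" rather than 0x1C, so `looks_like_raw_iim` returns False and `iter_iptc_datasets` raises on the first byte. Checking the attached GetProfileFromApi.txt this way would show whether the API result is raw IIM or still wrapped.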