ImageMagick / ImageMagick6

🧙‍♂️ ImageMagick 6
https://legacy.imagemagick.org

Orientation of embedded PDF image detected as orientation of the PDF #340

Closed zerocrates closed 1 month ago

zerocrates commented 1 month ago

ImageMagick version

6.9.11-60

Operating system

Linux

Operating system, version and so on

Ubuntu 22.04

Description

ImageMagick appears to read the orientation flag of an image embedded in a PDF and report it as the orientation of the PDF itself. This shows up in identify output but matters most when using -auto-orient. As best I can tell, the file does not carry an incorrect orientation flag; ImageMagick is reading the flag incorrectly.

The PDF I have consists of embedded JPEG images, each with its own EXIF and XMP orientation flags: the images on pages 1 and 2 are "RightTop" oriented, while those on pages 3 and 4 are "LeftBottom" oriented. ImageMagick reports every page of this PDF as RightTop, and with -auto-orient passed it rotates the output 90 degrees clockwise, which is incorrect.
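For reference, the orientation names above correspond to the numeric TIFF/EXIF orientation values that appear in the XMP later in this report. The following is a small illustrative lookup in Python, not ImageMagick code; the names `ORIENTATION_NAMES`, `CLOCKWISE_ROTATION` and `describe` are made up for this sketch.

```python
# Numeric TIFF/EXIF orientation values and the names ImageMagick prints for them.
ORIENTATION_NAMES = {
    1: "TopLeft",
    2: "TopRight",
    3: "BottomRight",
    4: "BottomLeft",
    5: "LeftTop",
    6: "RightTop",
    7: "RightBottom",
    8: "LeftBottom",
}

# Clockwise rotation (in degrees) that displays the non-mirrored values upright.
# The mirrored values (2, 4, 5, 7) also need a flip and are left out here.
CLOCKWISE_ROTATION = {1: 0, 3: 180, 6: 90, 8: 270}


def describe(value: int) -> str:
    """Return a short description of a numeric orientation value."""
    name = ORIENTATION_NAMES.get(value, "Undefined")
    rotation = CLOCKWISE_ROTATION.get(value)
    if rotation is None:
        return name
    return f"{name}: rotate {rotation} degrees clockwise"
```

So a page reported as RightTop (6) gets rotated 90 degrees clockwise by -auto-orient, which matches the behavior described above, while the LeftBottom (8) images would need 270 degrees clockwise instead.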

PDF viewers and renderers, including Ghostscript (the delegate here), have no problem displaying this PDF as intended, and other tools (e.g. exiftool, pdfinfo) do not report the PDF itself as carrying an orientation flag; only ImageMagick does. It looks like it might be improperly picking up the orientation data from one of the embedded images (the first one?) and treating that as the "orientation" of the whole document.

My best guess is that the problem lies in the XMP handling: there is separate XMP XML for each image XObject, plus an XMP profile for the PDF as a whole, but ImageMagick reports the following as the XMP profile for every page of the PDF. It is the one that belongs specifically to the first embedded image:

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c018 91.98c2f96, 2021/06/15-20:39:32        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:tiff="http://ns.adobe.com/tiff/1.0/">
         <tiff:Orientation>6</tiff:Orientation>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
<?xpacket end="r"?>

The XMP profiles in the document vary from image to image, with the last two having 8 for tiff:Orientation, but ImageMagick shows this same "orientation 6" profile for all pages. Regardless, the underlying issue is that any of these is treated as the PDF's profile at all: each one is associated only with a specific image, and the orientation data it carries doesn't and shouldn't apply to the whole document.
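One way to check which XMP packets a file contains, independently of ImageMagick, is to scan the raw bytes. The Python sketch below assumes the XMP metadata streams are stored uncompressed (as they appear to be in this file) and that orientation is written in element form, as in the packet above. The function name `xmp_orientations` is made up for illustration.

```python
import re

# One match per XMP packet body, from the opening x:xmpmeta tag to its close.
XMP_PACKET = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)

# Element form only; an attribute form (tiff:Orientation="6") is not handled.
TIFF_ORIENTATION = re.compile(
    rb"<tiff:Orientation>\s*(\d+)\s*</tiff:Orientation>"
)


def xmp_orientations(pdf_bytes: bytes) -> list:
    """Return one entry per XMP packet found, in file order.

    Each entry is the packet's tiff:Orientation as an int, or None if the
    packet carries no orientation element.
    """
    results = []
    for packet in XMP_PACKET.finditer(pdf_bytes):
        match = TIFF_ORIENTATION.search(packet.group(0))
        results.append(int(match.group(1)) if match else None)
    return results
```

Calling `xmp_orientations(open("incorrect_rotation.pdf", "rb").read())` lists the orientation per packet. If the description above is accurate, the list for this PDF should contain 6 for the first two images, 8 for the last two, and None for the document-level packet.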

At the end of the document is the PDF's XMP profile, which it would seem should be the one getting used and extracted here:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c018 91.98c2f96, 2021/06/15-20:39:32        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
         <xmp:ModifyDate>2024-08-30T11:15:53-07:00</xmp:ModifyDate>
         <xmp:CreateDate>2024-08-30T11:15:29-07:00</xmp:CreateDate>
         <xmp:MetadataDate>2024-08-30T11:15:53-07:00</xmp:MetadataDate>
         <xmp:CreatorTool>Adobe Acrobat 20.5</xmp:CreatorTool>
         <dc:format>application/pdf</dc:format>
         <xmpMM:DocumentID>uuid:4a48f21d-6061-47c3-9975-80ed2991fae3</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:4639e964-47ab-4966-8340-00ff7e2cff0b</xmpMM:InstanceID>
         <pdf:Producer>Adobe Acrobat 20.5 Image Conversion Plug-in</pdf:Producer>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

and this one of course has no orientation data.

Steps to Reproduce

display incorrect_rotation.pdf shows the PDF as expected in the correct orientation.

display -auto-orient incorrect_rotation.pdf shows the PDF improperly rotated 90 degrees clockwise.

(same results with convert for the above)

identify -format "%[orientation]\n" incorrect_rotation.pdf shows all 4 pages with "RightTop" orientation.

convert +ping "incorrect_rotation.pdf[0]" XMP:- shows the offending profile (the same for every page, or repeated 4 times without the page specifier).

Images

incorrect_rotation.pdf

zerocrates commented 1 month ago

Side note: while I do think the root issue here is the misdetection of the profile, it might also be reasonable to say PDFs just always have Undefined orientation? I don't know that they ever really legitimately carry an orientation flag like this, with page orientation stuff instead being handled within the format.

dlemstra commented 1 month ago

I just pushed a patch to make sure that we read the last XMP profile inside the file, and that seems to resolve your issue. It might still be a good idea to always report an undefined orientation for PDF, but I am not sure whether Ghostscript will always auto-orient the output images.
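To illustrate the idea of preferring the last XMP packet over the first, here is a minimal Python sketch. It is not the actual patch (ImageMagick's PDF coder is written in C), it assumes uncompressed XMP packets, and the function names are hypothetical.

```python
OPEN_TAG = b"<x:xmpmeta"
CLOSE_TAG = b"</x:xmpmeta>"


def first_xmp_packet(data: bytes):
    """Return the first XMP packet in the file, or None if there is none."""
    start = data.find(OPEN_TAG)
    if start < 0:
        return None
    end = data.find(CLOSE_TAG, start)
    if end < 0:
        return None
    return data[start:end + len(CLOSE_TAG)]


def last_xmp_packet(data: bytes):
    """Return the last XMP packet in the file, or None if there is none."""
    end = data.rfind(CLOSE_TAG)
    if end < 0:
        return None
    # Search backwards from the final closing tag for its opening tag.
    start = data.rfind(OPEN_TAG, 0, end)
    if start < 0:
        return None
    return data[start:end + len(CLOSE_TAG)]
```

For a file laid out like the one in this report, the first packet is the per-image one with tiff:Orientation 6, while the last is the document-level packet with no orientation data. Picking the last packet works here because the document-level XMP sits at the end of the file; whether that holds for every PDF is a separate question.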