gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

Annotations are lost instead of flattened on Ghostscript-generated PDF #258

Closed: justinlara closed this issue 12 months ago

justinlara commented 12 months ago

I am working on an app that ingests and manipulates documents as PDF. We currently run all files through Ghostscript at the point of upload, as we have found this helps remove rough edges and results in files with a more consistent quality. Afterward, we are now trying to flatten any annotations into the file content stream. However, we have found that while the Ghostscript processing retains annotations, using HexaPDF's flattening on the output file removes those annotations entirely rather than flattening them into the document.

This can be reproduced with the following CLI commands, using the attached PDF document (or any PDF with annotations) as the initial input and passing the Ghostscript output to hexapdf as its input:

gs -dNOPAUSE -dSAFER -dBATCH -sDEVICE=pdfwrite -sOutputFile=<OUTPUT FILEPATH> <INPUT FILEPATH>
hexapdf modify --annotations flatten <INPUT FILEPATH> <OUTPUT FILEPATH>

Ghostscript version: 9.56.1
HexaPDF version: 0.32.0

Any guidance would be appreciated, thank you!

annotated.pdf

gettalong commented 12 months ago

Thanks for opening the issue and providing the PDF as well as the reproduction script.

When using HexaPDF on the original file, it works as expected. However, I can confirm that it doesn't work on the one modified via Ghostscript.

A quick investigation shows that Ghostscript modifies the appearance streams of the annotations in a peculiar way.

gettalong commented 12 months ago

I found the problem: Ghostscript modifies the annotation's appearances so that they are scaled by a factor of 10. HexaPDF does take this into account when flattening annotations but the problem is that it scales back around the origin and not the bottom-left corner of the appearance's bounding box.
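To illustrate the geometric difference described above, here is a minimal sketch in plain Ruby. It is not HexaPDF's code: the `scale_about` helper and the bounding box values are hypothetical, chosen only to show how the same scale factor places a box differently depending on which point is held fixed.

```ruby
# Scale the point (x, y) by +factor+ while keeping the point (cx, cy) fixed.
# With the defaults, the fixed point is the origin.
def scale_about(x, y, factor, cx = 0, cy = 0)
  [cx + (x - cx) * factor, cy + (y - cy) * factor]
end

# Hypothetical appearance bounding box after a tenfold enlargement,
# given as [llx, lly, urx, ury].
bbox = [595, 700, 1595, 1200]
llx, lly, urx, ury = bbox

# Exact 1/10, so the results carry no floating-point noise.
factor = Rational(1, 10)

# Scaling back around the origin also moves the bottom-left corner.
around_origin = [
  *scale_about(llx, lly, factor),
  *scale_about(urx, ury, factor)
].map(&:to_f)

# Scaling back around the bottom-left corner leaves that corner in place.
around_corner = [
  *scale_about(llx, lly, factor, llx, lly),
  *scale_about(urx, ury, factor, llx, lly)
].map(&:to_f)

puts "around origin: #{around_origin.inspect}"
puts "around corner: #{around_corner.inspect}"
```

Both results have the same width and height and differ only in position, so any translation applied afterwards has to match the fixed point that was used for the scaling; otherwise the content is drawn in the wrong place.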

gettalong commented 12 months ago

@justinlara I have fixed the problem and now flattening works for both the original file and the file modified by Ghostscript.

justinlara commented 12 months ago

Interesting, I'm surprised that Ghostscript makes such a drastic change to annotations. Thank you very much for looking into this!

gettalong commented 12 months ago

Yeah, the change made by Ghostscript basically amounts to scaling all coordinates by a factor of 10, e.g. 59.5 -> 595 :man_shrugging: Which means they also parsed and rewrote the content stream of the annotation's appearance.