ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render

Adobe Acrobat form rendering issue #145

Open liammcdermott opened 4 months ago

liammcdermott commented 4 months ago

Related to #132

We have been using programmatic form filling successfully in production for a few months! Thank you for your work getting those PRs merged.

Recently, there have been a few complaints from users of Adobe Acrobat.

Symptoms

Programmatically filled form fields appear not to be filled in: image

Clicking a form field reveals the -- programmatically filled -- text, as expected: image

Then, making an edit to the text makes Acrobat retain the text once the field loses focus: image

Cause (?)

At first I thought of this comment: https://github.com/ajrcarey/pdfium-render/issues/132#issuecomment-1889831700 and wondered whether Acrobat fails to conform to the PDF spec by not regenerating the widget's appearance stream.

However, I have not been able to verify this yet. @ajrcarey can you reproduce this issue (I'm on Linux and am not sure Acrobat is working right through Wine)? Thanks!

liammcdermott commented 4 months ago

Some debugging information: if I save two copies of the PDF, one before and one after editing a form field with Adobe PDF Reader (as above), vimdiff reports that the only difference between them is this block, added at the end of the file:

fill-form-test-vimdiff.txt

I think this is an appearance stream. That would mean Adobe PDF Reader's behaviour does not conform to the PDF spec; I'm not sure whether filing a bug with Adobe would help. Never mind: in the PDF 2.0 spec, appearance streams are required.

Just in case it's important: I edited the form in Adobe Reader DC (32-bit 2021 version), and I added a space to the vendor_name field.
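The before/after comparison can also be done programmatically. This is a minimal sketch, assuming the edited file is the original plus data appended at the end (which is what the vimdiff output suggests); `appended_tail` is a hypothetical helper name, not part of pdfium-render.

```rust
/// Returns the bytes that were appended to `before` to produce `after`,
/// or `None` if `after` does not start with the exact contents of `before`
/// (i.e. the file was rewritten rather than appended to).
fn appended_tail<'a>(before: &[u8], after: &'a [u8]) -> Option<&'a [u8]> {
    after.strip_prefix(before)
}
```

In practice both inputs would come from `std::fs::read` on the two saved copies, and the returned tail can be printed with `String::from_utf8_lossy` for inspection.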

Findings about PDF 2.0 requiring appearance streams:

  1. In the PDF 1.7 spec, it said that appearance streams must be generated by the viewer application, but that text has been removed from PDF 2.0 (see page 533, beneath Table 228 for where it should be).
  2. In section 12.7.3 of PDF 2.0, Table 224, the Value column for Key NeedAppearances has this note: 'Appearance streams are required in PDF 2.0 and later.'
  3. Section 12.5.2 of PDF 2.0, Table 166, the Value column for Key AP says:

AP dictionary (Optional; PDF 1.2) An appearance dictionary specifying how the annotation shall be presented visually on the page (see 12.5.5, "Appearance streams"). A PDF writer shall include an appearance dictionary when writing or updating the PDF file except for the two cases listed below.

Every annotation (including those whose Subtype value is Widget, as used for form fields), except for the two cases listed below, shall have at least one appearance dictionary.

• Annotations where the value of the Rect key consists of an array where the value at index 1 is equal to the value at index 3 and the value at index 2 is equal to the value at index 4.
• Annotations whose Subtype value is Popup, Projection or Link.
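The rule quoted above can be expressed as a small predicate. This is only a sketch of the quoted wording, not code from Pdfium or pdfium-render: `must_have_appearance` is a hypothetical name, and it assumes the excerpt's "index 1" to "index 4" are 1-based positions in the four-element Rect array (so they map to Rust indices 0 to 3).

```rust
/// Returns true if, per the quoted PDF 2.0 text, an annotation with this
/// Subtype and Rect is expected to carry an appearance dictionary.
fn must_have_appearance(subtype: &str, rect: [f32; 4]) -> bool {
    // Exception 1: a degenerate Rect whose two corners coincide
    // (assumption: the excerpt's indices 1..4 are 1-based).
    let degenerate_rect = rect[0] == rect[2] && rect[1] == rect[3];

    // Exception 2: subtypes that are exempt by name.
    let exempt_subtype = matches!(subtype, "Popup" | "Projection" | "Link");

    !(degenerate_rect || exempt_subtype)
}
```

Under this reading, a Widget annotation with a normal, non-empty Rect (the form field case in this issue) always falls under the requirement.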

liammcdermott commented 4 months ago

I checked the PDF version reported by my test PDF file; it's 1.7. I then tried lowering the file's PDF version to 1.6, then 1.2, but lowering the version did not trigger Acrobat to use the old behaviour.

So Acrobat uses the PDF 2.0 appearance stream behaviour, even if a PDF file reports its PDF version as 1.7 (or lower).
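For anyone repeating this check, the version a file reports in its header can be read from the first bytes. This is a minimal sketch with a hypothetical helper name, `header_version`. It assumes the header sits at offset 0 and uses single-digit version numbers, and it ignores the optional /Version entry in the document catalog, which can override the header.

```rust
/// Parses a header of the form `%PDF-M.N` at the start of the file and
/// returns `(M, N)`, or `None` if the header is missing or malformed.
fn header_version(pdf: &[u8]) -> Option<(u8, u8)> {
    let rest = pdf.strip_prefix(b"%PDF-")?;
    match rest {
        [major, b'.', minor, ..] if major.is_ascii_digit() && minor.is_ascii_digit() => {
            Some((*major - b'0', *minor - b'0'))
        }
        _ => None,
    }
}
```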

ajrcarey commented 4 months ago

Hi @liammcdermott , my apologies for the delay in replying. Yes, I am happy to help you with this and I do have access to a Windows machine around here for some testing with Adobe Acrobat outside of Wine. Do you have some test files that exhibit the behaviour that you can share?

liammcdermott commented 4 months ago

No problem @ajrcarey, and thanks for getting back to me!

I was able to reproduce this issue using the test form and form filling example in this repo (so you can use that to get a test file). Also, attached is a PDF with a form filled by pdfium-render that exhibits the problem.

123_test_street_freehold_firm.pdf

liammcdermott commented 4 months ago

Further notes:

  1. Section 8.6.1 of the PDF 1.7 spec defines NeedAppearances, "a flag specifying whether to construct appearance streams and appearance dictionaries for all widget annotations in the document", which might be useful.
  2. NeedAppearances is deprecated in PDF 2.0 (as mentioned above)
  3. Currently, I'm working on adding a function to pdfium called FPDFAnnot_GenerateAP(FPDF_ANNOTATION annot), which will generate the appearance stream for an annotation. We'll probably need to also set NeedAppearances, since even if my code works, I don't know how long getting a patch into upstream will take (and if they will accept it).
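While experimenting with the flag, it can help to check whether a given file already sets it. The following is a rough heuristic rather than a real PDF parser: `has_need_appearances` is a hypothetical helper that scans the raw bytes for `/NeedAppearances true`, so it will miss the entry if the form dictionary is stored inside a compressed object stream.

```rust
/// Heuristically reports whether the raw PDF bytes contain
/// `/NeedAppearances` followed (after optional whitespace) by `true`.
fn has_need_appearances(pdf: &[u8]) -> bool {
    const KEY: &[u8] = b"/NeedAppearances";
    pdf.windows(KEY.len()).enumerate().any(|(i, window)| {
        if window != KEY {
            return false;
        }
        // Skip whitespace between the key and its value.
        let rest = &pdf[i + KEY.len()..];
        let skip = rest.iter().take_while(|b| b.is_ascii_whitespace()).count();
        rest[skip..].starts_with(b"true")
    })
}
```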

ajrcarey commented 3 months ago

Hi @liammcdermott , again, my apologies for the delay. I'm gradually getting back on top of things now. I can reproduce the problem with your test document using Adobe Reader on Windows. (There is an adobe-reader-11 package for arch, but I couldn't make it work, so I haven't been able to reproduce the problem under Linux. But my guess would be the behaviour would be the same.)

Your idea of adding a new function to Pdfium itself is good, because that's the right place for the functionality, but like you I wonder how long it will take to get merged. I'm open to the idea of adding some appearance stream generation functionality directly into pdfium-render, although the disadvantage there (other than bloat) is that pdfium-render doesn't have access to all the properties of every page object, so it's possible to come up with examples where an appearance stream generated by pdfium-render would be slightly off. But it might offer some sort of stop-gap solution in the short term.

The existing FPDFAnnot_SetAP() function could be used to set such an appearance stream. I'm not sure, however, how to set the NeedAppearances flag you mention.
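To make the stop-gap idea concrete, here is a rough sketch of the kind of content stream that could be passed as the value to FPDFAnnot_SetAP() for a single-line text field. Everything in it is an assumption for illustration: the helper names are hypothetical, the text is assumed to be plain ASCII, a font resource named /Helv is assumed to be available to the stream, and the fixed `2 4 Td` offset ignores the field's real size and alignment, which is exactly the "slightly off" risk described above.

```rust
/// Escapes the characters that are special inside a PDF literal string.
fn escape_pdf_literal(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        if matches!(ch, '(' | ')' | '\\') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Builds a minimal variable-text appearance stream for a text field.
/// Assumptions: ASCII text, a /Helv font resource, black fill, fixed offset.
fn text_field_appearance(text: &str, font_size: f32) -> String {
    format!(
        "/Tx BMC q BT /Helv {} Tf 0 g 2 4 Td ({}) Tj ET Q EMC",
        font_size,
        escape_pdf_literal(text)
    )
}
```

A generator inside Pdfium itself could use the field's actual font, quadding and border settings, which is why it remains the better long-term home for this logic.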

liammcdermott commented 3 months ago

image

YESSSSSSS! FUUUUUUUUUU- YES!!!!!!

ahem

After weeks of fiddling around, I have successfully patched pdfium with a public function that rebuilds AP streams for annotations. Also, the above PDF was filled using pdfium-render, so we know it'll work through the FFI.

I'm open to the idea of adding some appearance stream generation functionality directly into pdfium-render, although the disadvantage there (other than bloat) is that pdfium-render doesn't have access to all the properties of every page object, so it's possible to come up with examples where an appearance stream generated by pdfium-render would be slightly off. But it might offer some sort of stop-gap solution in the short term.

Totally valid idea, and agreed about the disadvantages. Wouldn't it also be an awful lot of work?

Could we fork pdfium-binaries and add my patch (until it gets merged into upstream)? There is a step in their build process for patching pdfium already (see build.sh), so the fork will be small. What do you reckon, @ajrcarey ?

Also, no worries on the delays. You ain't being paid for reading my issues, so I'm grateful for whatever attention you can give these :grin:

ajrcarey commented 3 months ago

Excellent work, well done!

Wouldn't it also be an awful lot of work?

Yes, and for results that aren't guaranteed to be 100% accurate. It's not a good solution. Your approach is better.

Could we fork pdfium-binaries and add my patch (until it gets merged into upstream)? There is a step in their build process for patching pdfium already (see build.sh), so the fork will be small.

My suggestion is for you to make a custom build of pdfium using your patch, then use pdfium-render's static linking feature to bind to it at compile time. We then make a change to pdfium-render, probably gated behind a non-default crate feature flag, to enable a binding to your custom pdfium functionality.

Once your patch to pdfium is accepted upstream, we remove the feature flag in pdfium-render and make the functionality available to all by default.
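As a rough illustration of the gating, a sketch along these lines would keep the new binding out of default builds. The feature name `pdfium_generate_ap`, the module layout, and the function pointer signature (including the `c_int` return type) are all assumptions; the real signature depends on what is accepted upstream.

```rust
/// Only compiled when the (hypothetical) non-default crate feature is enabled.
#[cfg(feature = "pdfium_generate_ap")]
pub mod generate_ap {
    use std::os::raw::{c_int, c_void};

    /// Assumed shape of the patched Pdfium function
    /// `FPDFAnnot_GenerateAP(FPDF_ANNOTATION annot)`; the return type is a guess.
    pub type GenerateApFn = unsafe extern "C" fn(annot: *mut c_void) -> c_int;
}

/// Lets callers find out at run time whether this build includes the binding.
pub fn appearance_regeneration_available() -> bool {
    cfg!(feature = "pdfium_generate_ap")
}
```

Removing the `#[cfg(...)]` attribute later is then a small, mechanical change once the function ships in standard Pdfium builds.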

How does that sound?

liammcdermott commented 3 months ago

That makes a lot of sense, better than telling users to download an obscure patched version of pdfium.

I'll get working on sending a patch upstream, as a first step. Thanks for the guidance! I've never been so happy to see a PDF form fill.

liammcdermott commented 3 months ago

Progress update

I created an upstream issue which includes a patch for the API function we need. I also forked pdfium-binaries and successfully made its build patch pdfium with the new API function.

However, on my fork of pdfium-binaries I'm running into a snag with GitHub Actions, specifically when the workflow tries to create a release. It looks like a permissions issue, but I don't have much experience with GitHub Actions, so if anyone knows the cause of this failure, please let me know!

Full output: image

liammcdermott commented 1 month ago

Just an update from the upstream issue: looks like the pdfium maintainers should accept a patch adding a new experimental API function for regenerating appearance streams.

ajrcarey commented 1 month ago

Good work. We can gate it behind a feature flag once it's available.