Open liammcdermott opened 6 months ago
Some debugging information. If I save two copies of the PDF, one before and one after editing a form field with Adobe PDF Reader (like above), vimdiff reports that the only difference between them is a block added at the end of the file, which I think is an appearance stream. This means Adobe PDF Reader's behaviour does not conform to the PDF spec; I'm not sure whether filing a bug with Adobe would help. Never mind: in the PDF 2.0 spec, appearance streams are required.
Just in case it's important: I edited the form in Adobe Reader DC (32-bit 2021 version), and I added a space to the vendor_name field.
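The before/after comparison described above can also be done programmatically. This is only a rough sketch: it assumes the edited copy is an append-only save (the original bytes followed by new ones), and the byte strings in `main` are made-up stand-ins for the two saved files, not the real test PDF.

```rust
/// Returns the bytes that `after` adds at the end, provided `after` starts
/// with an exact copy of `before` (i.e. the save only appended data).
/// Returns `None` if the earlier bytes were changed in place.
fn appended_suffix<'a>(before: &[u8], after: &'a [u8]) -> Option<&'a [u8]> {
    after.strip_prefix(before)
}

fn main() {
    // Hypothetical stand-ins for the two saved copies; real files could be
    // loaded with std::fs::read instead.
    let before: &[u8] = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n%%EOF\n";
    let mut after = before.to_vec();
    after.extend_from_slice(b"2 0 obj\n<<>>\nendobj\n%%EOF\n");

    match appended_suffix(before, &after) {
        Some(tail) => println!(
            "appended {} bytes:\n{}",
            tail.len(),
            String::from_utf8_lossy(tail)
        ),
        None => println!("the earlier bytes were changed in place"),
    }
}
```

If the function returns `None`, the editor rewrote earlier parts of the file and a full diff (as with vimdiff) is still needed.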
Findings about PDF 2.0 requiring appearance streams:
• The Value column for Key NeedAppearances has this note: 'Appearance streams are required in PDF 2.0 and later.'
• The Value column for Key AP says:
> AP dictionary (Optional; PDF 1.2) An appearance dictionary specifying how the annotation shall be presented visually on the page (see 12.5.5, "Appearance streams"). A PDF writer shall include an appearance dictionary when writing or updating the PDF file except for the two cases listed below.
> Every annotation (including those whose Subtype value is Widget, as used for form fields), except for the two cases listed below, shall have at least one appearance dictionary.
> • Annotations where the value of the Rect key consists of an array where the value at index 1 is equal to the value at index 3 and the value at index 2 is equal to the value at index 4.
> • Annotations whose Subtype value is Popup, Projection or Link.
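As a rough illustration of the rule quoted above, here is a small sketch of the "does this annotation need an appearance dictionary?" check. The `Rect` struct and `requires_appearance` function are hypothetical names made up for this example, not part of pdfium or pdfium-render. It also assumes the spec's indices are 1-based (a four-element Rect array has no index 4 otherwise), so the first exception means a rectangle with zero width and zero height.

```rust
/// Annotation rectangle, in the order a PDF Rect array stores it:
/// [lower-left x, lower-left y, upper-right x, upper-right y].
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    llx: f32,
    lly: f32,
    urx: f32,
    ury: f32,
}

/// Hypothetical helper: true when the quoted PDF 2.0 text says the
/// annotation shall have at least one appearance dictionary.
fn requires_appearance(subtype: &str, rect: Rect) -> bool {
    // Exception 2: Popup, Projection and Link annotations are exempt.
    let exempt_subtype = matches!(subtype, "Popup" | "Projection" | "Link");
    // Exception 1 (assuming 1-based indices): index 1 == index 3 and
    // index 2 == index 4, i.e. the rectangle has no width and no height.
    let degenerate = rect.llx == rect.urx && rect.lly == rect.ury;
    !(exempt_subtype || degenerate)
}

fn main() {
    let field = Rect { llx: 50.0, lly: 700.0, urx: 250.0, ury: 720.0 };
    println!("Widget needs AP: {}", requires_appearance("Widget", field));
    println!("Link needs AP: {}", requires_appearance("Link", field));
}
```

By this reading, every visible Widget annotation (form field) falls under the requirement, which matches the behaviour seen in Acrobat.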
I checked the PDF version reported by my test PDF file: it's 1.7. I then tried lowering the file's PDF version to 1.6, then 1.2, but lowering the version did not trigger Acrobat to use the old behaviour.
So Acrobat uses the PDF 2.0 appearance stream behaviour, even if a PDF file reports its PDF version as 1.7 (or lower).
Hi @liammcdermott, my apologies for the delay in replying. Yes, I am happy to help you with this and I do have access to a Windows machine around here for some testing with Adobe Acrobat outside of Wine. Do you have some test files that exhibit the behaviour that you can share?
No problem @ajrcarey, and thanks for getting back to me!
I was able to reproduce this issue using the test form and form filling example in this repo (so you can use that to get a test file). Also, attached is a PDF with a form filled by pdfium-render that exhibits the problem.
Further notes:
• NeedAppearances, 'A flag specifying whether to construct appearance streams and appearance dictionaries for all widget annotations in the document', which might be useful
• NeedAppearances is deprecated in PDF 2.0 (as mentioned above)
• FPDFAnnot_GenerateAP(FPDF_ANNOTATION annot), which will generate the appearance stream for an annotation. We'll probably need to also set NeedAppearances, since even if my code works, I don't know how long getting a patch into upstream will take (and if they will accept it).
Hi @liammcdermott, again, my apologies for the delay. I'm gradually getting back on top of things now. I can reproduce the problem with your test document using Adobe Reader on Windows. (There is an adobe-reader-11 package for arch, but I couldn't make it work, so I haven't been able to reproduce the problem under Linux. But my guess would be the behaviour would be the same.)
Your idea of adding a new function to Pdfium itself is good, because that's the right place for the functionality, but like you I wonder how long it will take to get merged. I'm open to the idea of adding some appearance stream generation functionality directly into pdfium-render, although the disadvantage there (other than bloat) is that pdfium-render doesn't have access to all the properties of every page object, so it's possible to come up with examples where an appearance stream generated by pdfium-render would be slightly off. But it might offer some sort of stop-gap solution in the short term.
The existing FPDFAnnot_SetAP() function could be used to set such an appearance stream. I'm not sure, however, how to set the NeedAppearances flag you mention.
YESSSSSSS! FUUUUUUUUUU- YES!!!!!!
ahem
After weeks of fiddling around, I have successfully patched pdfium with a public function that rebuilds AP streams for annotations. Also, the above PDF was filled using pdfium-render, so we know it'll work through the FFI.
> I'm open to the idea of adding some appearance stream generation functionality directly into pdfium-render, although the disadvantage there (other than bloat) is that pdfium-render doesn't have access to all the properties of every page object, so it's possible to come up with examples where an appearance stream generated by pdfium-render would be slightly off. But it might offer some sort of stop-gap solution in the short term.
Totally valid idea, and agreed about the disadvantages. Wouldn't it also be an awful lot of work?
Could we fork pdfium-binaries and add my patch (until it gets merged into upstream)? There is a step in their build process for patching pdfium already (see build.sh), so the fork will be small. What do you reckon, @ajrcarey?
Also, no worries on the delays. You ain't being paid for reading my issues, so I'm grateful for whatever attention you can give these :grin:
Excellent work, well done!
> Wouldn't it also be an awful lot of work?
Yes, and for results that aren't guaranteed to be 100% accurate. It's not a good solution. Your approach is better.
> Could we fork pdfium-binaries and add my patch (until it gets merged into upstream)? There is a step in their build process for patching pdfium already (see build.sh), so the fork will be small.
My suggestion is for you to make a custom build of pdfium using your patch, then use pdfium-render's static linking feature to bind to it at compile time. We then make a change to pdfium-render, probably gated behind a non-default crate feature flag, to enable a binding to your custom pdfium functionality.
Once your patch to pdfium is accepted upstream, we remove the feature flag in pdfium-render and make the functionality available to all by default.
How does that sound?
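For readers unfamiliar with feature gating in Rust, the idea might look roughly like the sketch below. The feature name `pdfium_regenerate_ap` and the function `regenerate_appearance` are hypothetical, chosen only for illustration; they are not actual pdfium-render names, and a real binding would call into the patched pdfium rather than print a message.

```rust
/// Compiled only when the (hypothetical) non-default Cargo feature
/// `pdfium_regenerate_ap` is enabled, i.e. when linking against a pdfium
/// build that carries the patch.
#[cfg(feature = "pdfium_regenerate_ap")]
fn regenerate_appearance(field_name: &str) -> Result<(), String> {
    // A real binding would call the patched pdfium function here.
    println!("regenerating appearance stream for {field_name}");
    Ok(())
}

/// Fallback used with a stock pdfium build: the call is still available,
/// but it reports that the functionality is missing.
#[cfg(not(feature = "pdfium_regenerate_ap"))]
fn regenerate_appearance(_field_name: &str) -> Result<(), String> {
    Err("appearance regeneration requires a patched pdfium build".to_string())
}

fn main() {
    match regenerate_appearance("vendor_name") {
        Ok(()) => println!("appearance stream regenerated"),
        Err(reason) => println!("skipped: {reason}"),
    }
}
```

Once the upstream patch lands, the `cfg` attributes and the fallback can be deleted, leaving a single unconditional function.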
That makes a lot of sense, better than telling users to download an obscure patched version of pdfium.
I'll get working on sending a patch upstream, as a first step. Thanks for the guidance! I've never been so happy to see a PDF form fill.
Progress update
I created an upstream issue which includes a patch for the API function we need. I also forked pdfium-binaries and successfully made it patch pdfium with the new API function.
However, on my fork of pdfium-binaries I'm running into a snag with GitHub Actions, specifically when it tries to create a release. It looks like a permissions issue, but I don't have much experience with GitHub Actions, so if anyone knows the cause of this failure, please let me know!
Full output:
Just an update from the upstream issue: looks like the pdfium maintainers should accept a patch adding a new experimental API function for regenerating appearance streams.
Good work. We can gate it behind a feature flag once it's available.
Related to #132
We have been using programmatic form filling successfully in production for a few months! Thank you for your work getting those PRs merged.
Recently, there have been a few complaints from users of Adobe Acrobat.
Symptoms
Programmatically filled form fields appear not to be filled in
Clicking a form field reveals the -- programmatically filled -- text, as expected:
Then, making an edit to the text makes Acrobat retain the text once the field loses focus:
Cause (?)
At first I thought of this comment: https://github.com/ajrcarey/pdfium-render/issues/132#issuecomment-1889831700 and wondered whether Acrobat might not be conforming to the PDF spec by failing to regenerate the widget's appearance stream.
However, I have not been able to verify this yet. @ajrcarey can you reproduce this issue (I'm on Linux and am not sure Acrobat is working right through Wine)? Thanks!