Open liammcdermott opened 10 months ago
The following is some debugging information, in case it's helpful.
# Text form field with text added using Evince prior to loading the PDF file, after pdf_fill_test():
field name `outside_closing_txt`
annotation > as_form_field() > as_text_field() >
appearance_mode_value(PdfAppearanceMode::Normal): Some("/Tx BMC\nq\nBT\n/courier-bold 10.0 Tf 0 g 1 0 0 1 3.00 3.50 Tm \n(TESTY MCTEST) Tj\nET\nQ\nEMC\n")
appearance_stream(): Some("N")
# Text form field that is empty in the PDF file, after pdf_fill_test():
field name `termination_period_txt`
annotation > as_form_field() > as_text_field() >
appearance_mode_value(PdfAppearanceMode::Normal): Some("/Tx BMC\nq\n2 14 m\n172 14 l\n172 1 l\n2 1 l\n2 14 l\nh\nW\nn\nQ\nEMC\n")
appearance_stream(): None
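A quick way to compare the two dumps above is to check whether each stream contains a text object that actually shows text. The following is a minimal sketch, not part of pdfium-render: `shows_text` is a hypothetical helper that uses a naive whitespace tokenizer, so it would be confused by operator-like words inside a string literal.

```rust
/// Returns true if a PDF content stream contains a text object (BT ... ET)
/// with at least one text-showing operator (Tj or TJ) inside it.
/// Hypothetical helper; tokenizes on whitespace only, which is good enough
/// for the short appearance streams in the debug output above.
fn shows_text(stream: &str) -> bool {
    let mut in_text_object = false;
    for token in stream.split_whitespace() {
        match token {
            "BT" => in_text_object = true,
            "ET" => in_text_object = false,
            "Tj" | "TJ" if in_text_object => return true,
            _ => {}
        }
    }
    false
}

fn main() {
    // The two appearance streams reported in the debug output.
    let filled = "/Tx BMC\nq\nBT\n/courier-bold 10.0 Tf 0 g 1 0 0 1 3.00 3.50 Tm \n(TESTY MCTEST) Tj\nET\nQ\nEMC\n";
    let empty = "/Tx BMC\nq\n2 14 m\n172 14 l\n172 1 l\n2 1 l\n2 14 l\nh\nW\nn\nQ\nEMC\n";
    println!("outside_closing_txt shows text: {}", shows_text(filled));
    println!("termination_period_txt shows text: {}", shows_text(empty));
}
```

The second stream only sets up a clipping rectangle (`m`, `l`, `h`, `W`, `n`), which matches the observation that the field's objects collection is empty.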
Calling `annotation.objects().iter()` on `outside_closing_txt`'s annotation, and getting the objects collection's length and the type of each object, yielded this:
Objects len: 1
Obj: Text
However, for `termination_period` it yielded this:
Objects len: 0
Hi @liammcdermott, thank you for reporting the issue. I am happy to help you with this. The appearance streams for the form fields likely need to be created manually - assuming Pdfium offers a sufficient interface to do so, I need to check that - and the work could potentially be related to #89, which also touches appearance streams.
Great to hear you'd like to help with this, thanks @ajrcarey!
Regarding whether Pdfium supports appearance streams for form fields: I'm not sure I'm looking in the right place, but there is `CPDF_GenerateAP::GenerateFormAP()`. Is that the interface we're looking for?
That interface is private, unfortunately. We're limited to the public `FPDF_*` functions. The `FPDFAnnot_SetAP()` function looks like a promising place to start, although the docs are light on details: https://pdfium.googlesource.com/pdfium/+/refs/heads/main/public/fpdf_annot.h#617
(That said, the `GenerateFormAP()` function you linked to may give some hints as to how to generate the appearance stream code programmatically, particularly the `GenerateEditAP()` and `GenerateColorAP()` functions.)
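To make "generating the appearance stream code programmatically" concrete, here is a minimal sketch that builds a single-line text field stream in the same shape as the `outside_closing_txt` dump earlier in the thread. It is not what Pdfium's `GenerateEditAP()` does (that also has to deal with clipping, alignment, font resources, and multi-line text). The function names and parameters are hypothetical, and the sketch assumes the font name is already available in the field's resources and that the text is plain ASCII.

```rust
/// Escapes text for use inside a PDF literal string, i.e. between ( and ).
fn escape_pdf_literal(text: &str) -> String {
    let mut escaped = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            // Parentheses and backslashes must be backslash-escaped.
            '(' | ')' | '\\' => {
                escaped.push('\\');
                escaped.push(ch);
            }
            '\n' => escaped.push_str("\\n"),
            '\r' => escaped.push_str("\\r"),
            _ => escaped.push(ch),
        }
    }
    escaped
}

/// Builds a single-line text field appearance stream (hypothetical helper).
/// `font` is a font resource name without the leading slash; `x` and `y`
/// position the text baseline inside the widget's bounding box.
fn text_field_appearance(font: &str, size: f32, x: f32, y: f32, text: &str) -> String {
    format!(
        "/Tx BMC\nq\nBT\n/{} {:.1} Tf 0 g 1 0 0 1 {:.2} {:.2} Tm \n({}) Tj\nET\nQ\nEMC\n",
        font,
        size,
        x,
        y,
        escape_pdf_literal(text)
    )
}

fn main() {
    // Same font, size, and offsets as the stream Evince wrote.
    print!("{}", text_field_appearance("courier-bold", 10.0, 3.0, 3.5, "TESTY MCTEST"));
}
```

A string built this way could then be handed to `FPDFAnnot_SetAP()`, which expects a wide (UTF-16LE) string, so a conversion step would still be needed before the call.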
I followed the usages of that private `GenerateFormAP()` function up until I reached a public interface, and found `FPDFPage_TransformAnnots()` (code link).
Maybe we could:
1. Remove the existing appearance stream with `FPDFAnnot_SetAP()`
2. Call `FPDFPage_TransformAnnots(page, PDFMatrix::IDENTITY)` to trigger a rebuild of appearance streams for annotations on the page.
Side note: (1) is necessary, since (2) only generates appearance streams for annotations that don't already have them.
This solution relies on Pdfium's internal implementation recreating the appearance streams when they don't exist. However, I'm sanguine about that, since page 678 of the PDF spec says:
> If the widget annotation has no appearance dictionary, the viewer application must create one and store it in the annotation dictionary’s AP entry.
The spec is pretty clear: Pdfium must regenerate the appearance stream if it finds there isn't one.
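The reason step (1) matters can be shown with a toy model. This is only an illustration of the reasoning, not Pdfium or pdfium-render code: `Widget`, `clear_appearance`, and `regenerate_missing` are hypothetical names, and the regeneration step is assumed to skip annotations that already have an appearance stream, as described in the side note.

```rust
/// Hypothetical stand-in for a widget annotation; not a pdfium-render type.
#[derive(Debug, Clone, PartialEq)]
struct Widget {
    value: String,
    appearance: Option<String>,
}

/// Step (1): drop the stale appearance stream, which is what removing the
/// appearance with FPDFAnnot_SetAP() is meant to achieve.
fn clear_appearance(widget: &mut Widget) {
    widget.appearance = None;
}

/// Step (2): rebuild appearance streams, but only where none exists.
/// Returns how many widgets were rebuilt.
fn regenerate_missing(widgets: &mut [Widget]) -> usize {
    let mut rebuilt = 0;
    for widget in widgets.iter_mut() {
        if widget.appearance.is_none() {
            widget.appearance = Some(format!("BT ({}) Tj ET", widget.value));
            rebuilt += 1;
        }
    }
    rebuilt
}

fn main() {
    // A field whose value was changed to "TEST" but whose appearance is stale.
    let mut widgets = vec![Widget {
        value: "TEST".to_string(),
        appearance: Some("BT (old) Tj ET".to_string()),
    }];

    // Without step (1), nothing is rebuilt and the stale stream survives.
    let mut untouched = widgets.clone();
    println!("rebuilt without clearing: {}", regenerate_missing(&mut untouched));

    // With step (1), the stream is regenerated from the current value.
    for widget in widgets.iter_mut() {
        clear_appearance(widget);
    }
    println!("rebuilt after clearing: {}", regenerate_missing(&mut widgets));
    println!("{:?}", widgets[0].appearance);
}
```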
Then we could send a patch upstream, adding a function like `FPDFPage_GenerateContent()` but for annotations (`FPDFPage_GenerateAnnotations()`?). Then we won't be relying on implicitly defined behaviour.
What do you think?
Huh, I just did an experiment, adding this to my code above, just before the call to `FPDFPage_CloseAnnot()`, and it works:
b.FPDFAnnot_SetAP(annotation_handle, PdfAppearanceMode::Normal as i32, null());
By 'it works' I mean, when I open the filled PDF in Chrome or Evince, the form fields are filled out. So, thanks to you pointing out `FPDFAnnot_SetAP()`, I might actually meet this work deadline! Thank you so much @ajrcarey!
(I still want to make a PR for this)
Superb work. Did you need to use `FPDFPage_TransformAnnots()` in the end?
To answer your earlier question: yes, mutation of form field values would be a valuable addition to `pdfium-render`, presumably by adding some functionality to the `PdfPageAnnotationCommon` and `PdfPageAnnotationPrivate` traits. I'm assuming it's the call to `FPDFAnnot_SetStringValue_str()` with a key of "V" in your sample that actually sets the form field value?
I'm curious as to whether the `FORM_*()` function calls are actually necessary, or if your successful experiment still works without them. If you wanted to submit a PR to add bindings for the new `FORM_*()` functions, that'd be swell. I am happy to work on the trait implementations, unless you especially wanted to; but given you already have a solution using raw `FPDF_*` functions, I certainly wouldn't expect you to rewrite it at this point.
> Superb work. Did you need to use `FPDFPage_TransformAnnots()` in the end?
Nope! I just needed to add that one line. It's something of a hack, since AFAICT it leads to the annotations having no appearance streams in the resulting PDF. That forces clients (like Chrome and Evince) to generate the appearance streams themselves.
I'm thinking a better implementation would be triggering Pdfium to regenerate the appearance streams.
> To answer your earlier question: yes, mutation of form field values would be a valuable addition to `pdfium-render`, presumably by adding some functionality to the `PdfPageAnnotationCommon` and `PdfPageAnnotationPrivate` traits. I'm assuming it's the call to `FPDFAnnot_SetStringValue_str()` with a key of "V" in your sample that actually sets the form field value?
That's great to hear. Those are the traits I was looking at and taking ideas from, so yes, I'm assuming the functionality should be added to them. `FPDFAnnot_SetStringValue_str()` with a key of "V" is indeed what sets the value of the form fields.
> I'm curious as to whether the `FORM_*()` function calls are actually necessary, or if your successful experiment still works without them. If you wanted to submit a PR to add bindings for the new `FORM_*()` functions, that'd be swell. I am happy to work on the trait implementations, unless you especially wanted to; but given you already have a solution using raw `FPDF_*` functions, I certainly wouldn't expect you to rewrite it at this point.
I suspect the `FORM_*()` calls aren't strictly necessary; I just noticed them in Pdfium's sample code: https://pdfium.googlesource.com/pdfium/+/refs/heads/main/samples/simple_no_v8.c and https://pdfium.googlesource.com/pdfium/+/refs/heads/main/samples/pdfium_test.cc#1633
That said, I'm really not sure what's going on in much of that code.
If you don't mind, I'd like to submit PRs for both bindings and trait updates. The only reason I used the bindings directly was to avoid forking pdfium-render (eventually I failed even at that!), and while working on the problem I got a good feel for how mutation of form field values could be added to the traits.
If I can't get both done by the end of next week, I'll pass it back to you. Does that sound okay? Let me know if you have any particular requirements for the implementation of this (beyond the usual, like code style and whatnot).
Many thanks. By all means, take as much time as you like. I'm not in any hurry and could not realistically start work on this for a couple of weeks anyway.
Well I said 'by the end of next week', but work had other plans. My hope is to work on it this weekend.
BTW: I did meet the deadline, and the form filling is working well in testing, so I'm hoping this will be a good addition. However, there is one major caveat: my itch was text fields, and that's what I've scratched. I'd love to implement filling check boxes and so forth, but that will have to come later (AFAICT it's not that different from filling text fields).
Great that you met your deadline, and text boxes only is perfectly fine; I'm happy to implement the rest based on your template.
Hi @liammcdermott, any updates on this?
@ajrcarey I have made a start, and have blocked off some more time tomorrow to work on this. I'll let you know how I get on!
Made good progress on this today. I should have a PR for you sometime tomorrow @ajrcarey
Merged pull request. Made some small adjustments to doc comments and imports. Updated README. Added new `examples/fill_form_field.rs` example. Began work on applying same basic approach to filling checkbox and radio button form fields.
I know there are usually better solutions than pdfium-render for this (pdftk for example). However, a convoluted set of circumstances has led me to attempt filling PDF forms using pdfium-render.
Specifically, I have a PDF file that includes a form with text fields, and I am attempting to fill those text fields with values programmatically, then save a copy of the PDF with those fields filled.
Here is what I have so far:
Right now this code just fills the value of every widget field with 'TEST', but it does fill the values of the form widgets successfully. However, when opening the PDF, programmatically set values do not appear on the page until the user sets the focus on the form field containing that value. Meaning, if the form field is a text box, the user has to set the focus inside it before the value will appear.
I did some debugging, and found that while this code successfully updates form field values in the `obj` `/Type /Annot`, when I checked the corresponding appearance stream, the text and related text drawing commands were absent.
At that point I realised it's time to stop trying to figure this out myself, and get some help. Here are my questions:
`FORM_OnBeforeClosePage()` for example); so with that in mind, would you be interested in a PR?
Obviously, any code I submitted in a PR would be less messy than the example above! Thank you for your consideration.