gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

AcroForm flatten error when appearance dictionary is a stream #296

Closed elionne closed 4 months ago

elionne commented 5 months ago

I have an AcroForm widget that defines its appearance with only a stream, not a dictionary. When I try to flatten it, I get a strange error.

…/gems/hexapdf-0.39.1/lib/hexapdf/document.rb:734:in `block in write': Validation error for (147,2): Type of field Length is invalid: HexaPDF::Type::Form (HexaPDF::Error)

The AcroForm widget looks like this:

6 0 obj
<< /V /Yes /DA (0 0 0 rg /F7 0 Tf) /DR <</Font 143 0 R >> /DV /Off /F 4 /FT /Btn /MK <</CA (8) >> /P 140 0 R /Rect [186.649 344.74 193.401 351.488 ] /StructParent 6 /Subtype /Widget /Type /Annot /AP <</N 147 2 R >> /AS /N >> 
endobj

147 2 obj
<</Length 0 /Subtype /Form /BBox [0 0 6.752 6.748 ] /Resources <</Font 143 0 R >> >> stream
endstream
endobj

It seems legal to have no dictionary for /N appearance. From my understanding of the spec, in that case it just defines the appearance of the current state.

There is a mismatch between /V /Yes and the /AS /N, but it should not be problematic as the flattened object should be 143 0 R.
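To make the two shapes of /N concrete, here is a minimal plain-Ruby sketch of how a reader might pick the normal appearance. It is only a model and does not use HexaPDF: `AppearanceStream` and `resolve_normal_appearance` are hypothetical names, a stream is modeled as a Struct, and a state dictionary as a Hash. It assumes, as I read the spec, that /AS only matters when /N is a subdictionary of states.

```ruby
# Plain-Ruby model (not the HexaPDF API): a stream is a Struct,
# a state subdictionary is a Hash mapping state names to streams.
AppearanceStream = Struct.new(:data)

# Returns the stream that would be used as the normal appearance,
# or nil if /AS names a state that the subdictionary does not contain.
def resolve_normal_appearance(ap, as)
  normal = ap[:N]
  case normal
  when AppearanceStream then normal      # single appearance, /AS is not consulted
  when Hash             then normal[as]  # one appearance per state, /AS selects
  end
end

# The widget from this issue: /N is a lone stream and /AS is /N.
single = { N: AppearanceStream.new("") }
# What most writers produce for a check box: /N holds /Yes and /Off.
states = { N: { Yes: AppearanceStream.new("q Q"), Off: AppearanceStream.new("") } }

resolve_normal_appearance(single, :N)   # the lone stream
resolve_normal_appearance(states, :Yes) # the /Yes stream
resolve_normal_appearance(states, :N)   # nil, no such state
```

In this model the /AS /N value is harmless for the single-stream case but would select nothing if /N were a state dictionary.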

I tried to fix it by using the /V field as on_form in create_check_box_appearances(). With that change it no longer crashes, but it doesn't create the appearance either.

gettalong commented 5 months ago

I'm not sure whether having no dictionary for the /N appearance is a violation of the spec. I suspect that you are right. However, I personally would have expected no /AS entry in such a case.

Anyways, would you be able to provide the PDF in question? From the output you provided, object 147 should have a /Length of 0. I'm not sure why HexaPDF thinks it has a Form XObject as the value of /Length.

elionne commented 5 months ago

I can reproduce it by generating a PDF with the following script:

require 'hexapdf'

doc = HexaPDF::Document.new
page = doc.pages.add
canvas = page.canvas

canvas.font("Helvetica", size: 36)
canvas.text("Form Example", at: [50, 750])
form = doc.acro_form(create: true)
form.need_appearances!

canvas.font_size(16)
canvas.text("Check boxes", at: [50, 650])

cb = form.create_check_box("Checkbox")
widget = cb.create_widget(page, Rect: [200, 650, 210, 660])

doc.write("check_box.pdf")

gettalong commented 5 months ago

@elionne Thanks for the script and sorry for the late reply!

I can confirm the bug and will investigate.

gettalong commented 5 months ago

So, from what I can see the behaviour of Okular is invalid.

Many viewers render the PDF file modified by Okular incorrectly, although it works in Adobe Reader. All other viewers I tried correctly create an /AP subdictionary for /N containing /Yes and /Off keys.

If you remove the form.need_appearances! line from the script and re-run everything, you will find that everything works, because Okular then doesn't re-create the appearances. Note that using the #need_appearances! method is actually deprecated for PDF 2.0.

I will see how to make HexaPDF more robust in case of such - most probably invalid - PDFs.

gettalong commented 4 months ago

I have changed the appearance generator to check whether the /N key of the appearance dictionary is in a form usable by HexaPDF. If not, as in this case, the /N key is recreated from the available information.
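For readers who want a feel for that kind of check, here is a rough plain-Ruby sketch. It is not the actual HexaPDF change: `FormStream`, `usable_normal_appearance?` and `normalize_appearance!` are hypothetical names, the widget is modeled as a Hash, and the on state is assumed to be /Yes unless given.

```ruby
# Plain-Ruby model (not the HexaPDF implementation).
FormStream = Struct.new(:data)

# Hypothetical check: /N is usable for a check box if it is a state
# dictionary with an /Off entry and at least one other (on) state.
def usable_normal_appearance?(normal)
  normal.is_a?(Hash) && normal.key?(:Off) && normal.keys.any? { |k| k != :Off }
end

# Hypothetical repair: rebuild /N with empty streams for both states and
# derive /AS from the field value /V. A usable /N is left untouched.
def normalize_appearance!(widget, on_state: :Yes)
  ap = (widget[:AP] ||= {})
  return widget if usable_normal_appearance?(ap[:N])

  ap[:N] = { on_state => FormStream.new(""), Off: FormStream.new("") }
  widget[:AS] = widget[:V] == on_state ? on_state : :Off
  widget
end

# The widget from this issue: /N is a lone stream, /AS is /N, /V is /Yes.
widget = { V: :Yes, AS: :N, AP: { N: FormStream.new("") } }
normalize_appearance!(widget)
widget[:AP][:N].keys # the on state and :Off
widget[:AS]          # :Yes, taken from /V
```

A real generator would then draw the check mark into the on-state stream; the sketch leaves both streams empty.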