gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

Color differences when flattening an acro_form #257

Closed brettwgreen closed 1 year ago

brettwgreen commented 1 year ago

Seeing bad color output when flattening an acro_form which I'm sure is mostly due to the construction of the original form which is out of my control.

Just curious why this happens and if there is any way for me to modify the output color after flattening an acro_form.

You helped me with this form a few weeks back... a not particularly well architected govt form: https://www.jag.navy.mil/documents/5776/CLJA_Claims_Form.pdf

I am able to fill most of the fields on this form but, after I flatten it, some of the fields come out with black text while others are light gray. I suspect that's due to how the fields are designed in the first place. Maybe it's because of gray placeholder text added to some of the fields?

Code is pretty simple:

  require 'hexapdf'

  doc = HexaPDF::Document.open(path)
  field = doc.acro_form.field_by_name('Claimant_First_Name')
  field.field_value = 'Johnny'
  doc.acro_form.flatten
  doc.write('flattened.pdf', validate: true)

Note: I'm also looking to connect a commercial license... sent a separate inquiry email with questions.

gettalong commented 1 year ago

Thanks for the issue - I can reproduce the problem.

The Problem

So, looking at the field 'Claimant_First_Name', we find that it is a rich text field. HexaPDF doesn't support rich text fields as such; it treats them as simple text fields. This means that when generating an appearance stream for the field, it uses the values provided for text fields and not the values provided for rich text fields.

The default styling of that field according to its information for text fields is /Helv 8 Tf 0.75 g, i.e. the font Helvetica at 8 point size with a font color of light grey (0 would be black, 1 would be white). Contrast that with its default styling information for rich text fields, style="font-size:8.0pt;text-align:left;color:#000000;font-weight:normal;font-style:normal;font-family:Helvetica,sans-serif;font-stretch:normal", which also says Helvetica at 8 point size but with a color of black.

So there is a mismatch between those two sets of information, and since HexaPDF uses the one for the simple text field, the color comes out grey.
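To make the mismatch concrete, here is a minimal plain-Ruby sketch that extracts the gray level from both styling strings quoted above and compares them. The helpers parse_da_gray and parse_ds_gray are hypothetical names made up for this illustration, not part of HexaPDF, and the whitespace-based parsing is an assumption that only holds for simple strings like these.

```ruby
# Hypothetical helpers for illustration only; they are not HexaPDF API.

# Returns the operand of the last gray fill operator ("g") in a default
# appearance string, or nil if no such operator is present.
def parse_da_gray(da)
  tokens = da.split
  index = tokens.rindex('g')
  return nil if index.nil? || index.zero?
  Float(tokens[index - 1], exception: false)
end

# Returns the CSS "color" property of a rich text style string as a gray
# level (0.0 = black, 1.0 = white), or nil if it is missing or not a
# neutral #rrggbb value.
def parse_ds_gray(ds)
  pairs = ds.split(';').map { |pair| pair.split(':', 2).map(&:strip) }
  color = pairs.select { |pair| pair.size == 2 }.to_h['color']
  match = /\A#(\h{2})(\h{2})(\h{2})\z/.match(color.to_s)
  return nil unless match
  r, g, b = match.captures.map { |hex| hex.to_i(16) }
  return nil unless r == g && g == b
  r / 255.0
end

da = '/Helv 8 Tf 0.75 g'
ds = 'font-size:8.0pt;text-align:left;color:#000000;font-weight:normal;' \
     'font-style:normal;font-family:Helvetica,sans-serif;font-stretch:normal'

da_gray = parse_da_gray(da)
ds_gray = parse_ds_gray(ds)
puts "text field gray: #{da_gray}, rich text gray: #{ds_gray}"
puts 'color mismatch' if da_gray != ds_gray
```

For the first name field the two values differ (light grey versus black), whereas the string /Helv 8 Tf 0 g would yield the same gray level as #000000.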

I also looked at the field 'Claimant_Last_Name', which has the same default styling value for rich text fields but the value /Helv 8 Tf 0 g as its default styling for text fields, which says that the font color should be black.

So while the information for the last name matches up, it doesn't for the first name, which is probably an oversight on the part of the person who created the document.

A Solution

You would need to make a special exception for this PDF document and process it differently by adjusting the default appearance string before filling out the form. This way the appearance generator of HexaPDF uses the modified default appearance string when generating the appearance.

Changing the font color after generating the form field appearances (or after flattening) would also be possible but is harder to accomplish.

Here is your code modified to change the font color to black for the 'Claimant_First_Name' field:

require 'hexapdf'

doc = HexaPDF::Document.open(ARGV[0])
field = doc.acro_form.field_by_name('Claimant_First_Name')
field[:DA] = '/Helv 8 Tf 0 g'   # <---  change the default appearance string to use color black
field.field_value = 'Johnny'
doc.acro_form.flatten
doc.write('/tmp/flattened.pdf', validate: true)
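If more than one field has this problem, the string rewrite itself could be factored into a small helper instead of hard-coding the replacement per field. The following is only a sketch in plain Ruby: force_black_da and COLOR_OPERANDS are hypothetical names, and the naive whitespace tokenizing assumes simple default appearance strings like the ones in this form.

```ruby
# Hypothetical helper for illustration only; it is not HexaPDF API.

# Number of operands taken by each non-stroking color operator that can
# appear in a default appearance string (gray, RGB, CMYK).
COLOR_OPERANDS = { 'g' => 1, 'rg' => 3, 'k' => 4 }.freeze

# Returns a copy of the default appearance string with every fill color
# operator removed and a black gray fill ("0 g") appended.
def force_black_da(da)
  kept = []
  da.split.each do |token|
    if (count = COLOR_OPERANDS[token])
      kept.pop(count) # drop the operands that belong to the color operator
    else
      kept << token
    end
  end
  (kept << '0' << 'g').join(' ')
end

puts force_black_da('/Helv 8 Tf 0.75 g')
```

The result could then be assigned the same way as in the code above, e.g. field[:DA] = force_black_da(field[:DA]), assuming the field carries its own default appearance string rather than inheriting it.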

Also note that the form makes use of JavaScript actions to change the values of certain fields, like the File_Name field at the top of the page. HexaPDF doesn't include a JavaScript execution engine, so fields affected by JavaScript actions won't be updated.

brettwgreen commented 1 year ago

Thank you for the detailed response!

I had noticed the Javascript embedding in there also... you have my gratitude for investing so much time in understanding the hellscape that is PDF 😄