Closed brettwgreen closed 1 year ago
Could you provide the PDF in question so that I can have a look at it?
ChoiceField#option_items should only return an array of the display strings, i.e. ['AL', 'AK', 'AZ', ...]. I'm not sure why it returns an array of arrays. Since this method is also used when setting a field value using #field_value=, it errors out.
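To make the difference between the two shapes concrete, here is a minimal sketch in plain Ruby. It assumes each option entry is either a plain string or a two-element [export value, display text] pair, as described in this thread. The helper names display_strings and export_values are hypothetical and are not part of HexaPDF.

```ruby
# Hypothetical helpers for illustration only; this is not HexaPDF's code.
# Each option entry is assumed to be either a plain string or a
# two-element [export_value, display_text] array.

# Returns only the text shown to the user for each option.
def display_strings(opt)
  opt.map { |item| item.is_a?(Array) ? item[1] : item }
end

# Returns only the value stored in the field for each option.
def export_values(opt)
  opt.map { |item| item.is_a?(Array) ? item[0] : item }
end

opt = [["Alabama", "AL"], ["Alaska", "AK"], " "]
p display_strings(opt) # => ["AL", "AK", " "]
p export_values(opt)   # => ["Alabama", "Alaska", " "]
```

With entries flattened like this, a check such as "is 'OR' one of the display strings" works on a flat array, whereas the same check against an array of pairs would fail.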
Thanks for the file!
I have tried the following script and it works:
require 'hexapdf'
doc = HexaPDF::Document.open(ARGV[0])
field = doc.acro_form.field_by_name('Claimant_State')
p field.option_items
field.field_value = 'OR'
doc.write('/tmp/out.pdf', incremental: true, optimize: true)
The output:
["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY", "CZ-Canal Zone", "GU-Guam", "VI-Virgin Islands", " "]
So I'm not sure what is happening in your case. Could you share some code that produces the problem?
Hmmm... that's odd.
I'm on a Mac. Would that matter? I also have a full Adobe Acrobat install. Does that have an impact, or does the presence of other command-line tools like pdftk or pdfinfo?
I think my code more or less looks exactly like what you have, but I'll take another look tomorrow.
Wrapped your code in a little function and ran it exactly:
require 'hexapdf'

def test_pdf(path)
  doc = HexaPDF::Document.open(path)
  field = doc.acro_form.field_by_name('Claimant_State')
  p field.option_items
  field.field_value = 'OR'
  doc.write('out.pdf', incremental: true, optimize: true)
end
Then, in irb:
2.7.7 :018 > test_pdf(path)
[["Alabama", "AL"], ["Alaska", "AK"], ["Arizona", "AZ"], ["Arkansas", "AR"], ["California", "CA"], ["Colorado", "CO"], ["Connecticut", "CT"], ["Delaware", "DE"], ["District of Columbia", "DC"], ["Florida", "FL"], ["Georgia", "GA"], ["Idaho", "HI"], ["Illinois", "ID"], ["Illinois", "IL"], ["Iowa", "IN"], ["Iowa", "IA"], ["Kansas", "KS"], ["Kentucky", "KY"], ["Louisiana", "LA"], ["Maine", "ME"], ["Maryland", "MD"], ["Massachusetts", "MA"], ["Michigan", "MI"], ["Minnesota", "MN"], ["Mississippi", "MS"], ["Missouri", "MO"], ["Montana", "MT"], ["Nebraska", "NE"], ["Nevada", "NV"], ["New Hampshire", "NH"], ["New Jersey", "NJ"], ["New Mexico", "NM"], ["New York", "NY"], ["North Carolina", "NC"], ["North Dakota", "ND"], ["Ohio", "OH"], ["Oklahoma", "OK"], ["Oregon", "OR"], ["Rhode Island", "PA"], ["Rhode Island", "RI"], ["South Carolina", "SC"], ["South Dakota", "SD"], ["Tennessee", "TN"], ["Texas", "TX"], ["Utah", "UT"], ["Vermont", "VT"], ["Virginia", "VA"], ["Washington", "WA"], ["West Virginia", "WV"], ["Wisconsin", "WI"], ["Wyoming", "WY"], ["Guam", "CZ-Canal Zone"], ["Guam", "GU-Guam"], ["Virgin Islands", "VI-Virgin Islands"], " "]
Traceback (most recent call last):
7: from /Users/brett/.rvm/rubies/ruby-2.7.7/bin/irb:23:in `<main>'
6: from /Users/brett/.rvm/rubies/ruby-2.7.7/bin/irb:23:in `load'
5: from /Users/brett/.rvm/rubies/ruby-2.7.7/lib/ruby/gems/2.7.0/gems/irb-1.2.6/exe/irb:11:in `<top (required)>'
4: from (irb):18
3: from (irb):15:in `test_pdf'
2: from /Users/brett/.rvm/gems/ruby-2.7.7/gems/hexapdf-0.12.3/lib/hexapdf/type/acro_form/choice_field.rb:134:in `field_value='
1: from /Users/brett/.rvm/gems/ruby-2.7.7/gems/hexapdf-0.12.3/lib/hexapdf/configuration.rb:353:in `block in <module:HexaPDF>'
HexaPDF::Error (Invalid value "OR" for combo_box field Claimant_State)
Is this happening in the initial parse of the PDF? Or when I call doc.acro_form? I'm having a hard time navigating those parts of the code to see why it's being parsed differently on my side. I thought I could just patch up option_items to allow for an array of arrays, but the problem seems to be further upstream.
Update: On a lark, tried ruby 3.2... same result.
Alright... turns out my require 'hexapdf' was loading an older version. I ran gem install to get the latest version globally, and now I only see the 'display' items in the array, although now I get a different error when trying to write out the file:
ruby-3.2.2/gems/hexapdf-0.32.2/lib/hexapdf/document.rb:677:in `block in write': Validation error for (14,0): Invalid size for /U, /O, /UE, /OE or /Perms values for revisions 6 (HexaPDF::Error)
That is probably a separate issue entirely, so we can probably close this one.
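The traceback earlier in the thread shows hexapdf-0.12.3 in the load path, while the later error comes from hexapdf-0.32.2. A quick way to catch this kind of mix-up is to ask RubyGems which version was actually activated. The sketch below uses only standard RubyGems calls (Gem.loaded_specs and Gem::Version). The helper names loaded_gem_version and older_than? are hypothetical.

```ruby
require 'rubygems'

# Hypothetical helper: returns the version string of a gem that has been
# activated in the current process, or nil if it has not been activated.
def loaded_gem_version(name)
  spec = Gem.loaded_specs[name]
  spec && spec.version.to_s
end

# Hypothetical helper: true if version string `actual` is older than `minimum`.
# Gem::Version compares segment by segment, so 0.9.0 sorts before 0.12.3.
def older_than?(actual, minimum)
  Gem::Version.new(actual) < Gem::Version.new(minimum)
end

# Run this after `require 'hexapdf'` to see which version was picked up.
p loaded_gem_version('hexapdf')
p older_than?('0.12.3', '0.32.2') # => true
```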
Update: I was able to get around that by using doc.write('out.pdf', validate: false). I was able to set the state value, and it looks good when the PDF is opened in Acrobat. There is some issue with the encryption, but it is almost certainly unrelated.
Update 2: Debugging that encryption error shows that value[:U].length and value[:O].length are both 127 for this file. The encryption validation code seems to have a hard requirement that both are 48.
Thanks for your investigation!
Yes, when using AES 256-bit encryption, the /O and /U entries need to be 48 bytes long (see table 21 in section 7.6.4.2 of the PDF 2.0 spec), while in the file they are longer. I don't know why they are 127 bytes long. However, there is no real information in these superfluous bytes because they are all zero.
So I think it would be possible to do auto-correction by just truncating the /O and /U fields to their correct size iff the invalid bytes are only zeros.
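As a rough illustration of that idea, here is a minimal sketch in plain Ruby. It is not the code that went into HexaPDF. The helper name truncate_zero_padded and the choice to return nil for values that cannot be repaired are assumptions made for this example.

```ruby
# Hypothetical helper; not HexaPDF's actual auto-correction code.
# Required length in bytes of /O and /U for AES 256-bit encryption.
EXPECTED_LENGTH = 48

# Returns the value cut down to `expected` bytes if everything past that
# point is zero bytes. Returns the value unchanged if it already has the
# right length, and nil if it cannot be repaired (too short, or the extra
# bytes contain non-zero data).
def truncate_zero_padded(value, expected = EXPECTED_LENGTH)
  bytes = value.b # binary copy, so lengths are counted in bytes
  return bytes if bytes.bytesize == expected
  return nil if bytes.bytesize < expected

  extra = bytes.byteslice(expected, bytes.bytesize - expected)
  return nil unless extra.each_byte.all?(&:zero?)

  bytes.byteslice(0, expected)
end

padded = ("\x01" * 48) + ("\x00" * 79) # 127 bytes, as in the file above
p truncate_zero_padded(padded).bytesize # => 48
p truncate_zero_padded("\x01" * 127)    # => nil
```

Refusing to truncate when the extra bytes are non-zero keeps the correction safe: data is only dropped when it carries no information.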
Thanks so much for your help. I will close the issue.
@brettwgreen FYI, I have implemented the auto-correction for the /O and /U fields. It will be available with the next version.
I have an AcroForm PDF with a List Box that has values as an array of arrays... for example, state List Box has
No matter how I try and set the value for this, I get a validation error:
This seems like a perfectly valid way of defining an AcroForm field... this is a government document I'm working with. When set up this way, the first value in each inner array is the stored value, while the second entry is the display value.
Even if I try to bypass the validation with field.value[:V] = 'Alabama', it does not seem to be saved when I write the file to disk. It seems that the code around field_value= in ChoiceField cannot handle an array of arrays, unless I'm missing something. I'm happy to do a pull request if you can confirm the problem.