gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

AcroForm fields position breaks after filling with HexaPDF #249

Closed ekzobrain closed 1 year ago

ekzobrain commented 1 year ago

After filling AcroForm fields with HexaPDF, the fields move from their initial positions in every viewer. Reproduction script:

require 'hexapdf'

path = 'test font.pdf'
path_dst = 'test font result 1.pdf'

pdf = HexaPDF::Document.open(path)

date1 = pdf.acro_form.field_by_name('Date1')
# date1 before -> #<HexaPDF::Type::AcroForm::TextField [31, 0] value={:DA=>"/CourierNew 18 Tf 0 g", :F=>4, :FT=>:Tx, :Ff=>29360128, :MK=>{}, :MaxLen=>10, :P=>#<HexaPDF::Reference [17, 0]>, :Rect=>#<HexaPDF::Reference [42, 0]>, :Subtype=>:Widget, :T=>"Date1", :Type=>:Annot}>
date1.field_value = '12.22.2222'
# date1 after -> #<HexaPDF::Type::AcroForm::TextField [31, 0] value={:DA=>"/CourierNew 18 Tf 0 g", :F=>4, :FT=>:Tx, :Ff=>29360128, :MK=>#<HexaPDF::Type::Annotations::Widget::AppearanceCharacteristics [0, 0] value={:R=>0}>, :MaxLen=>10, :P=>#<HexaPDF::Reference [17, 0]>, :Rect=>#<HexaPDF::Rectangle [42, 0] value=[#<HexaPDF::Object [43, 0] value=154.862>, #<HexaPDF::Object [44, 0] value=713.999>, #<HexaPDF::Object [45, 0] value=296.462>, #<HexaPDF::Object [46, 0] value=733.643>]>, :Subtype=>:Widget, :T=>"Date1", :Type=>:Annot, :V=>"22.22.2222", :AS=>:N, :AP=>{:N=>#<HexaPDF::Type::Form [51, 0] value={:Type=>:XObject, :Subtype=>:Form, :BBox=>#<HexaPDF::Rectangle [0, 0] value=[0, 0, 141.6, 19.644000000000005]>, :Matrix=>nil, :Resources=>#<HexaPDF::Type::Resources [0, 0] value={:Encoding=>{:PDFDocEncoding=>#<HexaPDF::Reference [12, 0]>}, :Font=>#<HexaPDF::Dictionary [0, 0] value={:CourierNew=>#<HexaPDF::Type::FontTrueType [33, 0] value={:BaseFont=>:CourierNew, :Encoding=>#<HexaPDF::Reference [34, 0]>, :FirstChar=>0, :FontDescriptor=>#<HexaPDF::Type::FontDescriptor [35, 0] value={:Ascent=>1021, :CapHeight=>571, :Descent=>-680, :Flags=>34, :FontBBox=>[-122, -680, 623, 1021], :FontFamily=>"Courier New", :FontFile2=>#<HexaPDF::Stream [19, 0] value={:Filter=>:FlateDecode, :Length=>437695, :Length1=>684624}>, :FontName=>:CourierNew, :FontStretch=>:Normal, :FontWeight=>400, :ItalicAngle=>0, :StemV=>40, :Type=>:FontDescriptor, :XHeight=>423}>, :LastChar=>255, :Name=>:CourierNew, :Subtype=>:TrueType, :Type=>:Font, :Widths}>, :Helv=>#<HexaPDF::Type::FontType1 [10, 0] value={:BaseFont=>:Helvetica, :Encoding=>#<HexaPDF::Reference [12, 0]>, :Name=>:Helv, :Subtype=>:Type1, :Type=>:Font}>, :ZaDb=>#<HexaPDF::Type::FontType1 [11, 0] value={:BaseFont=>:ZapfDingbats, :Name=>:ZaDb, :Subtype=>:Type1, :Type=>:Font}>, :F4=>#<HexaPDF::Type::FontType0 [50, 0] value={:Type=>:Font, :Subtype=>:Type0, :BaseFont=>:CourierNewPSMT, :Encoding=>:"Identity-H", :DescendantFonts=>[#<HexaPDF::Type::CIDFont 
# [49, 0] value={:Type=>:Font, :Subtype=>:CIDFontType2, :BaseFont=>:CourierNewPSMT, :FontDescriptor=>#<HexaPDF::Type::FontDescriptor [48, 0] value={:Type=>:FontDescriptor, :FontName=>:CourierNewPSMT, :FontWeight=>400, :Flags=>5, :FontBBox=>[-121.58203125, -679.6875, 622.55859375, 1020.99609375], :ItalicAngle=>0.0, :Ascent=>612.79296875, :Descent=>-188.4765625, :StemV=>80, :CapHeight=>571.2890625, :XHeight=>422.8515625}>, :CIDSystemInfo=>{:Registry=>"Adobe", :Ordering=>"Identity", :Supplement=>0}, :CIDToGIDMap=>:Identity}>]}>}>}>, :Filter=>:FlateDecode}>}, :Q=>0}>

pdf.acro_form.field_by_name('Date2').field_value = '12.22.2222'
pdf.acro_form.field_by_name('Date3').field_value = '12.22.2222'

pdf.write(path_dst)

test font.pdf test font result 1.pdf

If we then open test font result 1.pdf in Adobe Acrobat, change the font of field Date1 to another one, switch back to the original font, and save the document, the field position becomes correct again.

gettalong commented 1 year ago

This is what I see:

test.font.pdf: image test.font.result.1.pdf: image

What exactly has moved? Could you provide sample images?

ekzobrain commented 1 year ago

I think I did not express myself correctly. It seems that the text baseline (or some other text/font parameter, if there is such a concept in PDF forms) changes, and the text inside the field moves slightly down and to the left. This image illustrates it:

Screenshot 2023-06-27 at 16 26 55

The left document (test.font.pdf) was filled with Adobe Acrobat and the right one (test.font.result.1.pdf) with HexaPDF.

gettalong commented 1 year ago

That is to be expected since there is no standard for how a text field is rendered. HexaPDF tries to mimic Adobe Reader as closely as possible.

Here is the result from Okular: image

Here from Evince: image

Here from yet another PDF viewer: image

As you can see, all viewers fill out the PDF form, but the alignment etc. is not the same.

You should probably remove the background from those fields, then there wouldn't be doubled points etc. Alternatively, it is possible to create "comb" text fields which might be what you want here.
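
As a side check on the comb suggestion, the :Ff value shown in the inspect output of the reproduction script (29360128) can be decoded against the field flag bit positions defined in the PDF specification. This is a minimal sketch in plain Ruby; the constant and method names are made up for illustration and are not part of HexaPDF:

```ruby
# Field flag bit positions (1-based) as listed in the PDF specification's
# field-flag tables. TEXT_FIELD_FLAGS and decode_text_field_flags are
# illustrative names only, not HexaPDF API.
TEXT_FIELD_FLAGS = {
  ReadOnly: 1, Required: 2, NoExport: 3,
  Multiline: 13, Password: 14, FileSelect: 21,
  DoNotSpellCheck: 23, DoNotScroll: 24, Comb: 25, RichText: 26
}.freeze

# Returns the names of all flags whose bit is set in the integer +ff+.
def decode_text_field_flags(ff)
  TEXT_FIELD_FLAGS.select { |_name, bit| ff[bit - 1] == 1 }.keys
end

p decode_text_field_flags(29_360_128)
# prints [:DoNotSpellCheck, :DoNotScroll, :Comb]
```

If this decoding is right, the Comb flag is already set on Date1 (together with :MaxLen=>10), so the field may already be a comb text field.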

ekzobrain commented 1 year ago

In fact, when filling the form with Adobe Reader it currently looks correct, in contrast to HexaPDF. Try this:

  1. Open test.font.result1.pdf (it was filled with HexaPDF): the text is shifted
  2. Erase the Date2 field and fill it again with the same value: the text returns to the correct (expected) position

So currently the HexaPDF styles actually differ from the styles of the current Adobe Reader (version 2023.003.20215). Could this be fixed? The attached test.font.result.2.pdf was filled with Adobe Reader.

test font result 2.pdf

ekzobrain commented 1 year ago

Or maybe the text position inside the field box could be made configurable in some way? For example, an option to supply a text offset from the top-left / bottom-right / etc. corner of the field bbox... Or options like CSS alignment: top/middle/bottom, left/center/right :)

As far as I can see, my Adobe Reader aligns the text to the top/right corner, while HexaPDF tries to center it both vertically and horizontally.

gettalong commented 1 year ago

As I said before, the HexaPDF form field appearance generator already tries to mimic what Adobe Reader does as closely as possible. If you find a way to make it more accurate, you are welcome to provide a pull request. Alas, as far as I know, there is no written reference for this by Adobe.

Once an application that can create form field appearance streams fills out a form, it also generates the associated appearance streams (HexaPDF supports this). Then all viewers render the form fields the same since they have been pre-rendered by the form-filling application. This is what happens with the file you filled out with Adobe Reader. And also with the one filled out by HexaPDF.

[Sidenote: PDF 2.0 actually specifies that all applications have to generate such appearance streams.]

If you want to customize the rendering of appearance streams, create a subclass of the appearance generator (see https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/type/acro_form/appearance_generator.rb) and then set the configuration option 'acro_form.appearance_generator' (see https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/configuration.rb#L148-L152).
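
A rough sketch of that wiring is shown below. Only the parent class and the 'acro_form.appearance_generator' option come from the links above. The subclass name, the output file name, and the choice of create_text_appearances as the method to override are assumptions, so check them against the appearance generator source of the installed HexaPDF version:

```ruby
require 'hexapdf'

# Hypothetical subclass; the name is made up for illustration.
class CustomAppearanceGenerator < HexaPDF::Type::AcroForm::AppearanceGenerator
  # Assumed to be the method that builds text field appearances.
  def create_text_appearances
    super
    # Customization point: adjust or regenerate the appearance stream here,
    # for example to change where the text is placed inside the field box.
  end
end

pdf = HexaPDF::Document.open('test font.pdf')

# Set the option before filling fields so that the appearances are
# generated by the subclass instead of the default generator.
pdf.config['acro_form.appearance_generator'] = 'CustomAppearanceGenerator'

pdf.acro_form.field_by_name('Date1').field_value = '12.22.2222'
pdf.write('test font result custom.pdf') # hypothetical output name
```

As written, the subclass only calls super, so the output should match the default generator until the customization point is filled in.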

ekzobrain commented 1 year ago

OK, we'll try a custom appearance generator, thanks.