codeforamerica / pdfparser

A command line utility written in Java for working with PDF forms

concat_pdfs puts all form fields on first page #28

Open bengolder opened 8 years ago

bengolder commented 8 years ago

When combining multiple forms, the concat_pdfs command creates a pdf in which all of the form fields from all the pdfs end up on the first page.

This pdf (created before switching to OpenPDF): [image: pdfscan]

Now looks like this when created by HEAD:

[image: screen shot 2016-09-25 at 9 30 58 am]

dtow1 commented 7 years ago

We've traced this to a suspicious method in OpenPDF that maybe shouldn't exist:

PdfCopyFormsImp.mergeFields() overrides PdfCopyFieldsImp.mergeFields() in a way that seems to skip important steps. In particular, PdfCopyFieldsImp.mergeFields() includes a step that calculates page offsets for fields while merging, and PdfCopyFormsImp.mergeFields() does not.

So the solution might be to delete PdfCopyFormsImp.mergeFields() in OpenPDF.
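
For anyone following along, here is a rough sketch of what a page-offset step does in general. This is not OpenPDF's code: `PageOffsetSketch`, `SourceDoc` and `mergedFieldPages` are made-up names for illustration, and the sketch assumes each field simply records the 1-based page it sits on within its own source PDF. It does not reproduce the bug itself, only the idea that each field's page has to be shifted by the number of pages merged before it.

```java
import java.util.ArrayList;
import java.util.List;

public class PageOffsetSketch {

    /** Hypothetical stand-in for one source PDF: its page count and the local (1-based) page of each field. */
    static final class SourceDoc {
        final int pageCount;
        final int[] fieldPages;

        SourceDoc(int pageCount, int... fieldPages) {
            this.pageCount = pageCount;
            this.fieldPages = fieldPages;
        }
    }

    /** Maps every field to its page in the merged document by adding the running page offset. */
    static List<Integer> mergedFieldPages(List<SourceDoc> docs) {
        List<Integer> result = new ArrayList<>();
        int offset = 0; // number of pages contributed by the documents merged so far
        for (SourceDoc doc : docs) {
            for (int localPage : doc.fieldPages) {
                result.add(offset + localPage);
            }
            offset += doc.pageCount;
        }
        return result;
    }

    public static void main(String[] args) {
        List<SourceDoc> docs = new ArrayList<>();
        docs.add(new SourceDoc(2, 1, 2)); // 2 pages, fields on pages 1 and 2
        docs.add(new SourceDoc(1, 1));    // 1 page, field on page 1
        System.out.println(mergedFieldPages(docs)); // prints [1, 2, 3]
    }
}
```

If a merge skips this kind of bookkeeping, fields from later documents have no way to end up on the pages that were appended after the first document.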

bengolder commented 4 years ago

Tried swapping out OpenPDF for PDFBox, using the PDFMergerUtility class, and it resulted in the same bug. So the problem appears with both OpenPDF and PDFBox, but not with iText.

Tried using different pdfs with interactive forms in them, and this confirmed that the bug is consistent and independent of the pdf.