Hopding / pdf-lib

Create and modify PDF documents in any JavaScript environment
https://pdf-lib.js.org
MIT License

Copy Pages results in form fields disappearing #1205

Open danielrubinov97 opened 2 years ago

danielrubinov97 commented 2 years ago

What were you trying to do?

I was trying to get the form fields from a multi-page PDF file that I generated with the copyPages paradigm from the documentation. My code is very similar to this example:

    import { PDFDocument } from 'pdf-lib'

    async function copyPages() {
      const url1 = 'https://pdf-lib.js.org/assets/with_update_sections.pdf'
      const url2 = 'https://pdf-lib.js.org/assets/with_large_page_count.pdf'

      const firstDonorPdfBytes = await fetch(url1).then(res => res.arrayBuffer())
      const secondDonorPdfBytes = await fetch(url2).then(res => res.arrayBuffer())

      const firstDonorPdfDoc = await PDFDocument.load(firstDonorPdfBytes)
      const secondDonorPdfDoc = await PDFDocument.load(secondDonorPdfBytes)

      const pdfDoc = await PDFDocument.create();

      const [firstDonorPage] = await pdfDoc.copyPages(firstDonorPdfDoc, [0])
      const [secondDonorPage] = await pdfDoc.copyPages(secondDonorPdfDoc, [742])

      pdfDoc.addPage(firstDonorPage)
      pdfDoc.insertPage(0, secondDonorPage)

      const pdfBytes = await pdfDoc.save()
    }

I have one PDF that I fill out based on some parameters. If there are many parameters, I generate another instance of that PDF and join the instances together using the above paradigm.

Later in the life cycle of the joined PDF, I try to write to it again by modifying the text in its fields.

How did you attempt to do it?

I attempted to get the form fields from a multi-page PDF using a few methods, but the main one was simply to load the multi-page PDF, get the form, and then get the fields.

After trying a few different tactics, I settled on this method to diagnose the problem:

    let pdfDoc = await PDFDocument.load(pdfBytes)
    const form1 = pdfDoc.getForm()
    const fields1 = form1.getFields()
    console.log("FIELDS 1", fields1)
    let container = [];
    for(let i = 0; i < pdfDoc.getPages().length; i++){
      const subDocument = await PDFDocument.create();
      // copy the page at current index
      const [copiedPage] = await subDocument.copyPages(pdfDoc, [i])
      subDocument.addPage(copiedPage);
      const pdfBytesTemp = await subDocument.save()
      let tempDoc = await PDFDocument.load(pdfBytesTemp)
      let form3 = tempDoc.getForm()
      let fields3 = form3.getFields()
      console.log("FIELDS3", fields3)
      container.push(pdfBytesTemp)
    }
    console.log("CONTAINER LENGTH", container.length)

Here, pdfBytes is a variable containing the buffer of the PDF, pulled from a CDN using Axios.

With this approach, I read the fields both before and after splitting the PDF into single pages. Neither fields1 nor fields3 contained any fields. For comparison, I also tried a single-page PDF, which does yield results for fields1.

What actually happened?

fields3 came back as an empty array. As a result, I concluded that copyPages possibly strips the fields from the original PDF during the copy process.

By extension, this could be why my multi-page PDFs show no fields, since I concatenated them using the copyPages approach.

What did you expect to happen?

I expected to see a list of fields after the copyPages method.

How can we reproduce the issue?

The best way to reproduce this is to take the code from the "How did you attempt to do it?" section and pass it a PDF with form fields generated with Adobe Acrobat. The console logs are placed so that you can see the form fields disappear after the copyPages call.

Attached is an example of such a 1-page form that I would concatenate. AOB.pdf

Version

1.16.0

What environment are you running pdf-lib in?

Node

Checklist

Additional Notes

Thank you for your time.

danielrubinov97 commented 2 years ago

It could be that the main function of the copyPages method is to just copy the PDF and its metadata. So a subsequent question would be, how does one transfer the form as well, during the copyPages process?

happytogether commented 2 years ago

I have the same question. After copying pages from different PDFs and merging them into one single PDF, running getForm() and then getFields() returns an empty array. You can use this fiddle to download the PDF merged by this awesome pdf-lib repo. Then run

const form = pdfDoc.getForm();
const fields = form.getFields();
fields.forEach(field => {
  const type = field.constructor.name;
  const name = field.getName();
  console.log(type, name);
})

to receive an empty array.

glimmbo commented 2 years ago

I don't think it's currently supported. The copyPage method documentation says:

NOTE: This method won't copy all information over to the new document (acroforms, outlines, etc...).

So I don't think copyPages does either. I have a similar situation where I need the forms from multiple PDFs to be merged into one. I think the process will have to be done manually, something like:

(With 2 PDFs)

  1. load both
  2. get all the fields from both, and the widgets (where the fields are rendered)
  3. ensure that there aren't fields that are named the same across the two forms
  4. copy pages
  5. add the combined fields into the new, merged pdf field dictionary
  6. use the widgets information to re-place all the fields on the appropriate pages
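
Step 3 in the list above can be checked before any pages are copied. Here is a minimal sketch, assuming the field names have already been collected from each form (for example with `form.getFields().map(f => f.getName())`). `findFieldNameCollisions` is a hypothetical helper, not part of pdf-lib, and the field names in the example are made up.

```javascript
// Hypothetical helper: returns the names that occur in both lists, sorted.
function findFieldNameCollisions(namesA, namesB) {
  const inA = new Set(namesA);
  const shared = new Set(namesB.filter((name) => inA.has(name)));
  return [...shared].sort();
}

// Example with made-up field names from two forms.
const collisions = findFieldNameCollisions(
  ['patient.name', 'patient.dob', 'signature'],
  ['signature', 'total', 'patient.name'],
);
console.log(collisions); // [ 'patient.name', 'signature' ]
```

Any name in the result would need to be renamed in one of the source forms before the two field sets are combined.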

I'm still working on a code solution for this, but I'm fairly sure this is what you'd have to do to carry over form data with copyPages.

EDIT: Relates to this comment by @Hopding

glimmbo commented 2 years ago

@Hopding am I on the right track with merging form content?

arms1997 commented 2 weeks ago

hey @glimmbo any luck with the implementation you suggested? Facing a similar issue right now