Hopding / pdf-lib

Create and modify PDF documents in any JavaScript environment
https://pdf-lib.js.org
MIT License

PDF Pages Appear Blank After Processing with pdf-lib #1639

Open erezz33 opened 5 months ago

erezz33 commented 5 months ago

What were you trying to do?

I was trying to read an existing PDF file, process it with the pdf-lib package, and save it as a new PDF file. The goal was to retain all the content and structure of the original PDF in the new file. The issue occurs with one specific PDF file only. That file is encrypted, but not in a way that prompts me for a password when I open it (the file properties show Security: Password Encrypted).

How did you attempt to do it?

I attempted to achieve this using the following code snippet:

```js
import fs from 'fs';
import { PDFDocument } from 'pdf-lib';

const src = '/src/src.pdf';
const dest = '/src/output.pdf';

async function a() {
  fs.readFile(src, async (_err, existingPdfBytes) => {
    // Load a PDFDocument from the existing PDF bytes
    const pdfDoc = await PDFDocument.load(existingPdfBytes);
    // Serialize the PDFDocument to bytes (a Uint8Array)
    const pdfBytes = await pdfDoc.save();
    // Write the bytes to a file
    fs.writeFile(dest, pdfBytes, (err) => {
      if (err) {
        console.error('Error writing PDF file:', err);
      } else {
        console.log('PDF file created successfully:');
      }
    });
  });
}

a();
```

What actually happened?

The resulting PDF file has the correct number of pages, but all the pages are blank. The content of the original PDF is not preserved in the new file.

What did you expect to happen?

I expected the new PDF file to be an exact replica of the original PDF, with all the same content and structure. Please note that this code snippet is not my real use case, but it is the basic code that helped me identify where the problem is.

How can we reproduce the issue?

Since I cannot share the specific PDF file I'm working with due to its sensitive content, I can provide the following details to help diagnose the issue:

- The issue occurs with a specific PDF file, and I don't know what is unique about it.
- The PDF contains multiple pages with text and images.
- The PDF might contain complex elements such as embedded fonts, vector graphics, or non-standard annotations.

To reproduce the issue, consider the following steps:

- Use the provided code snippet.
- Test with various PDFs, especially those with complex elements like embedded fonts, vector graphics, or non-standard annotations.
- If possible, create or find a PDF that includes such elements and observe whether the issue can be replicated.

If sharing the specific PDF is necessary for diagnosing the issue, please let me know, and I will arrange to share it privately.

Version

v20.9.0

What environment are you running pdf-lib in?

Node

Additional Notes

No response

KevenArbache commented 2 months ago

I have the same problem. Is there any solution for this?

MiaoQinn commented 1 week ago

I suspect that your original PDF uses permission-based encryption, which requires the owner (admin) password to lift the editing restrictions. Unless you know the owner password, I am not sure you can really edit this PDF. You could try to "print" the PDF (using Acrobat Pro) to a new PDF, which will remove most restrictions, or convert it to a Word document and back to a PDF, and then check whether the encryption is still there.
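
One way to test the encryption theory before involving pdf-lib at all is to look for an `/Encrypt` entry in the raw file, since encrypted PDFs reference their encryption dictionary under that name in the trailer. The following is only a rough sketch: `looksEncrypted` is a hypothetical helper, not part of pdf-lib, and a plain byte scan can give false positives if the same text happens to appear elsewhere in the file.

```javascript
// Hypothetical helper (not a pdf-lib API): heuristic scan of raw PDF bytes
// for an /Encrypt entry. Accepts a Buffer or Uint8Array.
function looksEncrypted(pdfBytes) {
  // latin1 maps every byte to exactly one character, so binary stream
  // data in the file cannot break the decoding step.
  const text = Buffer.from(pdfBytes).toString('latin1');
  // Require whitespace or "<" after the name so that longer names such as
  // /EncryptMetadata are not counted as a match on their own.
  return /\/Encrypt[\s<]/.test(text);
}
```

With the snippet from the original report, this could be called as `looksEncrypted(existingPdfBytes)` inside the `fs.readFile` callback. If it returns `true` for the problematic file and `false` for files that round-trip correctly, that would support the explanation above.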