Hopding / pdf-lib

Create and modify PDF documents in any JavaScript environment
https://pdf-lib.js.org
MIT License

Blank page problem, all pdf pages are merged into one pdf in blank format. #1579

Closed: Ferhatduran55 closed this issue 5 months ago

Ferhatduran55 commented 7 months ago

What were you trying to do?

I am trying to let users download the PDFs linked in a certain section of a page as a single merged PDF. The merged file contains only blank white pages, and I have been investigating why. I think I have tried everything I could. At first I had problems with buffers and async functions, which I fixed.

How did you attempt to do it?

I reviewed the relevant parts of the pdf-lib documentation and learned how to download all the resources together with PreloadJS, but I think PreloadJS conflicts with pdf-lib at this stage.

What actually happened?

I haven't received any warnings or errors. Frankly, I don't even understand what's going on.

What did you expect to happen?

The pages of all the PDFs should be combined and downloaded as a single PDF.

How can we reproduce the issue?

    const handleComplete = async event => {
        console.log(event);
        Loader('title').text("All files downloaded!");

        const items = preload.getItems();

        setTimeout(() => {
            var output = mergeAndDownload(items);
            Loader().fadeOut(200);
        }, 1000);
    }

    const mergeAndDownload = async (items) => {
        try {
            const pdfDoc = await PDFDocument.create();

            await Promise.all(
                items.map(async (item) => {
                    try {
                        const pdfBytes = new Uint8Array(item.result.split('').map(c => c.charCodeAt(0))).buffer;

                        const externalDoc = await PDFDocument.load(pdfBytes);
                        const pages = await pdfDoc.copyPages(externalDoc, externalDoc.getPageIndices());
                        pages.forEach((page) => pdfDoc.addPage(page));
                        console.info(externalDoc);
                    } catch (error) {
                        console.error(`Error processing PDF: ${error.message}`);
                    }
                })
            );

            const mergedPdfBytes = await pdfDoc.save();
            console.log({ pdfDoc, mergedPdfBytes });
            setTimeout(() => {
                downloadPDF(mergedPdfBytes, "merged.pdf");
            }, 1000);
        } catch (error) {
            console.error(`Error creating merged PDF: ${error.message}`);
        }
    }

    const downloadPDF = (pdfBytes, fileName) => {
        try {
            const blob = new Blob([pdfBytes], { type: "application/pdf" });

            const link = document.createElement("a");
            link.href = URL.createObjectURL(blob);
            link.download = fileName;

            document.body.appendChild(link);
            link.click();

            document.body.removeChild(link);
        } catch (error) {
            console.error(`Error downloading PDF: ${error.message}`);
        }
    }
...
    function getAllOfFiles() {
        const list = []
        $(`PRIV`).each((i, e) => {
            const onclickValue = $(e).attr("onclick")

            if (onclickValue) {
                const match = onclickValue.match(/PRIV\('([^']+)'/)

                if (match) {
                    const pdfLink = match[1]
                    list.push({name:e.text.trim(), file:pdfLink})
                }
            }
        });
        manifest = list
        loadAll()
        Loader().show()
    }

Version

1.17.1

What environment are you running pdf-lib in?

Browser

Additional Notes

This code is part of a user script that lets the PDF links on a particular page be downloaded as a single PDF. The user script loads the jQuery, pdf-lib, CreateJS and progressbar libraries.

amaliaywalter commented 7 months ago

Hello from Argentina! I'm experiencing a similar issue while trying to merge PDF files into a single file by copying pages. But the issue seems to occur only when the input files are PDF version 1.4 (Acrobat 5.x). It works fine with PDF version 1.7. I'm trying to find a solution, but haven't found one yet.
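
To check which version an input file declares before merging, the header can be read directly from the bytes. This is a minimal sketch: readPdfHeaderVersion is a hypothetical helper, not part of pdf-lib. It only inspects the "%PDF-x.y" header near the start of the file, so a version overridden later in the document catalog would not be detected.

```javascript
// Hypothetical helper (not a pdf-lib API): returns the version string declared
// in the "%PDF-x.y" header, or null if no header is found.
function readPdfHeaderVersion(bytes) {
  // Accept either an ArrayBuffer or a Uint8Array.
  const view = bytes instanceof Uint8Array ? bytes : new Uint8Array(bytes);

  // The header normally sits at offset 0; scan a small window to tolerate
  // a few leading bytes before it.
  const head = String.fromCharCode(...view.subarray(0, 1024));

  const match = head.match(/%PDF-(\d+\.\d+)/);
  return match ? match[1] : null;
}
```

Logging readPdfHeaderVersion(pdfBytes) for each input just before PDFDocument.load makes it easier to confirm whether the blank pages really correlate with 1.4 files. A null result suggests the bytes are not a PDF at all, for example an HTML error page.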

Ferhatduran55 commented 6 months ago

Thank you for the information. A few days ago I somehow managed to fix this problem, but I then ran into printing errors with some files, for example only 2 of 30 pages being printed.

The solution I found is as follows. The user script downloads the PDFs on the page sequentially using PreloadJS, and they are then passed as an array to the function that merges and downloads them. There, I had been using the rawResult or result values of the files (items) previously loaded with PreloadJS, which did not work. In the end, I kept PreloadJS for the initial download and read the files again with fetch, which can retrieve files already loaded into the cache directly. Since I had little time, I did not consider optimization, efficiency, or data usage; I needed it working as soon as possible. Large files that cannot be cached are downloaded a second time.

There is definitely a solution to this, but for now I'm getting what I need :).

    const mergeAndDownload = async items => {
        try {
            const pdfDoc = await PDFDocument.create();
            for (const cached of items) {
                try {
                    const pdfBytes = await fetch(cached.item.src).then(res => res.arrayBuffer());

                    const externalDoc = await PDFDocument.load(pdfBytes);
                    const pages = await pdfDoc.copyPages(externalDoc, externalDoc.getPageIndices());
                    pages.forEach((page) => pdfDoc.addPage(page));
                } catch (error) {
                    console.error(`Error processing PDF: ${error.message}`);
                }
            }

            const mergedPdfBytes = await pdfDoc.save();
            setTimeout(() => {
                let filename = prompt("Enter merged filename:") ?? "merged"
                downloadPDF(mergedPdfBytes, `${filename}.pdf`)
            }, 1000);
        } catch (error) {
            console.error(`Error creating merged PDF: ${error.message}`);
        }
    }

    const downloadPDF = (pdfBytes, fileName) => {
        try {
            const blob = new Blob([pdfBytes], { type: "application/pdf" });

            const link = document.createElement("a");
            link.href = URL.createObjectURL(blob);
            link.download = fileName;

            document.body.appendChild(link);
            link.click();

            document.body.removeChild(link);
        } catch (error) {
            console.error(`Error downloading PDF: ${error.message}`);
        }
    };

    function handleComplete(event) {
        const items = queue.getItems();

        setTimeout(() => {
            var output = mergeAndDownload(items);
            Loader().fadeOut(200);
        }, 1000);
    }
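
One possible explanation for why the string-based result produced blank pages while arrayBuffer() works: if the loader decoded the response as UTF-8 text before exposing it (an assumption, since the thread does not confirm how PreloadJS delivered the data), every byte sequence that is not valid UTF-8 is replaced during decoding, so the compressed page streams can no longer be recovered with charCodeAt. The sketch below illustrates this with a hypothetical helper, roundTripThroughText, that mimics the conversion used in the original report.

```javascript
// Hypothetical helper for illustration only: decode bytes as UTF-8 text, then
// rebuild a byte array with charCodeAt, as the original report did.
function roundTripThroughText(bytes) {
  // Assumption: the loader decoded the binary response as UTF-8 text.
  const asText = new TextDecoder("utf-8").decode(bytes);
  // Char codes above 255 wrap modulo 256 when stored in a Uint8Array.
  return new Uint8Array(asText.split("").map((c) => c.charCodeAt(0)));
}

// Element-wise comparison of two byte arrays.
function sameBytes(a, b) {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

// "%PDF-" followed by three bytes that are not valid UTF-8 on their own.
const original = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x80, 0xff, 0xe2]);
const roundTripped = roundTripThroughText(original);

console.log(sameBytes(original, roundTripped)); // false: the binary bytes were replaced
console.log(sameBytes(original.subarray(0, 5), roundTripped.subarray(0, 5))); // true: ASCII survives
```

Because the ASCII parts of the file survive, the document structure can still parse without errors while the binary page content is damaged, which would be consistent with blank pages and no warnings. Reading the response with arrayBuffer() avoids the text decoding step entirely.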

If we adjust the cache limit, we can use the cache option of fetch to make it prefer the files that are already in the cache:

    const pdfBytes = await fetch(cached.item.src, { cache: "force-cache" }).then(res => res.arrayBuffer());

amaliaywalter commented 6 months ago

Thanks! I understand that the problem you were dealing with was caused by large input files...