galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

Reading / merging from Google Cloud Storage #342

Open brianmhunt opened 5 years ago

brianmhunt commented 5 years ago

Google Cloud Storage provides its own Node.js library that allows creating read streams and write streams.

I would like to read a number of PDFs from Google Cloud Storage and append them to a new PDF that is then saved back to Google Cloud Storage.

At the moment I read each file into a Buffer like this (using Google's Firebase Admin SDK):

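```javascript
const getRawBody = require('raw-body')
const admin = require('firebase-admin')

// Download a Cloud Storage object and resolve to its full contents as a Buffer.
function asBuffer (filename) {
  const bucket = admin.storage().bucket() // default bucket of the initialized app
  const file = bucket.file(filename)
  // raw-body returns a Promise<Buffer> when called without a callback
  return getRawBody(file.createReadStream())
}
```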
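For reference, what `getRawBody` does here is collect every chunk of the read stream into a single Buffer. A minimal dependency-free sketch of the same idea follows; `streamToBuffer` is a hypothetical name, not part of raw-body or the Firebase SDK, and like the original it keeps the whole file in memory.

```javascript
// Hypothetical helper: collect every chunk of a readable stream into one Buffer.
// Like raw-body, this holds the entire file in memory.
async function streamToBuffer (readable) {
  const chunks = []
  for await (const chunk of readable) {
    // File streams emit Buffers; string chunks are converted defensively.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk))
  }
  return Buffer.concat(chunks)
}
```

Inside `asBuffer`, `return streamToBuffer(file.createReadStream())` would then take the place of the `getRawBody` call.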

This works fine.

However, when I try to write the result to a Cloud Storage file, the function runs out of memory or times out. Here's one of the many variants I've tried:

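```javascript
const hummus = require('hummus')

function writePdf (buffer, writeStream) {
  // PDFStreamForResponse forwards HummusJS output to writeStream.write()
  const pdfWriter = hummus.createWriter(
    new hummus.PDFStreamForResponse(writeStream))
  const rstream = new hummus.PDFRStreamForBuffer(buffer)
  pdfWriter.appendPDFPagesFromPDF(rstream) // 🚨 Memory error / timeout.
  pdfWriter.end()   // finish writing the PDF
  writeStream.end() // close the Cloud Storage upload
}

const bucket = admin.storage().bucket()
const file = bucket.file('new.pdf')
const fileOptions = {}
const writeStream = file.createWriteStream(fileOptions)
// where `buffer` is a `Buffer` containing a PDF from e.g. `asBuffer`:
writePdf(buffer, writeStream)
```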

I could not find anyone who has hit this issue on Stack Overflow or through a Google search.

The same workflow appears to work when the writeStream is an in-memory stream (e.g. from the memorystream package) or a local file stream.

Am I missing something about how these pieces fit together? Should the above work with a generic Writable stream, or are there assumptions that have to be worked around?
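Given that an in-memory stream reportedly works, one possible workaround (not tested in this thread) is to let HummusJS write into memory and upload the finished bytes in one call, e.g. with `file.save(buffer)` from `@google-cloud/storage`. Below is a minimal stdlib-only sketch of such a sink standing in for the memorystream package; `BufferCollector` is a hypothetical name, and it still holds the whole output PDF in memory.

```javascript
const { Writable } = require('stream')

// Hypothetical in-memory sink: accumulates every chunk written to it.
// Could be passed to hummus.PDFStreamForResponse in place of the GCS stream.
class BufferCollector extends Writable {
  constructor (options) {
    super(options)
    this.chunks = []
  }

  _write (chunk, encoding, callback) {
    // With the default decodeStrings: true, chunks arrive here as Buffers.
    this.chunks.push(chunk)
    callback()
  }

  // Return all bytes written so far as a single Buffer.
  toBuffer () {
    return Buffer.concat(this.chunks)
  }
}
```

Usage would be along the lines of `hummus.createWriter(new hummus.PDFStreamForResponse(collector))`, then after `pdfWriter.end()`, something like `collector.end(() => file.save(collector.toBuffer()))`.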

Many thanks!

galkahana commented 5 years ago

A write stream should work fine with PDFStreamForResponse. I would suspect the reading side first, rather than the writing. It might be that reading the buffer is failing; perhaps the body is being read incorrectly? Try using the rstream to repeatedly read from it and write to a file, then see whether that produces a valid PDF file.
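A rough version of that check, as a sketch: instead of draining the rstream (`PDFRStreamForBuffer`), it writes the downloaded Buffer straight to a local file that can be opened in a viewer, and looks for the `%PDF-` header at the start and the `%%EOF` marker near the end. `checkPdfBuffer` and its default output path are hypothetical. Passing this check does not prove the file is a valid PDF, but failing it (for example, an HTML error page where the PDF should be) points at the reading side.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical diagnostic: dump the bytes to a local file for manual inspection
// and run a rough structural check. This is NOT full PDF validation.
function checkPdfBuffer (buffer, outPath = path.join(os.tmpdir(), 'check.pdf')) {
  fs.writeFileSync(outPath, buffer)

  // A PDF file starts with a "%PDF-" header line...
  const hasHeader = buffer.subarray(0, 5).toString('latin1') === '%PDF-'
  // ...and its final bytes contain the "%%EOF" marker.
  const tail = buffer.subarray(Math.max(0, buffer.length - 1024)).toString('latin1')
  const hasEof = tail.includes('%%EOF')

  return { outPath, size: buffer.length, hasHeader, hasEof }
}
```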

brianmhunt commented 5 years ago

Thanks @galkahana. I was fairly confident the Buffer was being read correctly, since I printed its contents to the console, but in any case I've decided to redesign around a browser-only implementation.

(As an aside, we only need fairly simple page manipulation, à la pdfassembler.)

I no longer have the code to test this, so please feel free to close this issue, since nobody is left to test whether it passes or fails. :) I'll reopen or start a new issue if we come back to this.

Thank you @galkahana !

ggaabe commented 5 years ago

Hi, I recently implemented a Firebase Cloud Function that downloads 5 different PDF files and merges them together. These are the steps:

  1. Create a local temp file path for each file.
  2. Download each file to its respective temp destination.
  3. Create a temporary local file path for the final PDF, then create a writer:
    var finalPath = path.join(os.tmpdir(), "final.pdf")
    var pdfWriter = hummus.createWriter(finalPath)
  4. Merge all the PDF files together:
    pdfWriter.appendPDFPagesFromPDF(pdf1TempPath)
    pdfWriter.appendPDFPagesFromPDF(pdf2TempPath)
    pdfWriter.appendPDFPagesFromPDF(pdf3TempPath)
    pdfWriter.end()
  5. Upload the merged PDF, using the final PDF's temporary local file path as the source.
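The five steps above might be combined into one async function like the following. This is a sketch, not ggaabe's actual code: `mergeViaTempFiles` is a hypothetical name, `bucket` is assumed to be a `@google-cloud/storage` bucket (e.g. `admin.storage().bucket()`), and `hummus` is passed in as a parameter rather than required, so the function can be exercised with stand-ins. It also uses a unique temp directory (`fs.mkdtempSync`) and deletes it afterwards, which the steps above do not mention.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Sketch of the temp-file merge workflow. `bucket` is a Cloud Storage bucket,
// `hummus` is the HummusJS module (both injected by the caller).
async function mergeViaTempFiles (bucket, hummus, sourceNames, destinationName) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'merge-'))
  try {
    // 1. One local temp path per source file.
    const tempPaths = sourceNames.map((name, i) => path.join(dir, `source-${i}.pdf`))

    // 2. Download each source PDF to its temp path.
    await Promise.all(sourceNames.map((name, i) =>
      bucket.file(name).download({ destination: tempPaths[i] })))

    // 3. Create a writer for the final PDF at a local temp path.
    const finalPath = path.join(dir, 'final.pdf')
    const pdfWriter = hummus.createWriter(finalPath)

    // 4. Append the pages of every source PDF in order, then finalize.
    for (const p of tempPaths) {
      pdfWriter.appendPDFPagesFromPDF(p)
    }
    pdfWriter.end()

    // 5. Upload the merged file from its local path.
    await bucket.upload(finalPath, { destination: destinationName })
  } finally {
    // Remove all temp files, whether or not the merge succeeded.
    fs.rmSync(dir, { recursive: true, force: true })
  }
}
```

A typical call would be `await mergeViaTempFiles(admin.storage().bucket(), require('hummus'), ['a.pdf', 'b.pdf'], 'merged.pdf')`. The cleanup matters in Cloud Functions, where files written to the temp directory count against the function's memory.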