Hopding / pdf-lib

Create and modify PDF documents in any JavaScript environment
https://pdf-lib.js.org
MIT License

Turn each page into a picture #1279

Open xujingsong1 opened 2 years ago

xujingsong1 commented 2 years ago

What are you working on?

I want to turn every page of a PDF into an image and get the image data in Node. Does anyone know how to do this?

Additional Notes

No response

awied commented 2 years ago

You can't do this with pdf-lib; you need Ghostscript. My PDF-to-image converter runs as an AWS Lambda. I've compiled Ghostscript into an AWS Lambda layer (examples are https://github.com/rpidanny/gm-lambda-layer and https://github.com/shelfio/ghostscript-lambda-layer). In my Lambda (running on Node 14) I trigger a shell command that converts all PDF pages to PNG (one PNG per page) and stores the images in a subdirectory, from where they are processed further. It works fast and reliably for me. If you don't use a Lambda, the approach is no different; you'd just need to link the Ghostscript library differently. I didn't want to use yet another Ghostscript wrapper, hence my approach of using a simple shell command.

Example:

import { execSync } from 'child_process'
import path from 'path'

const filePath = 'folder/subfolder/file.pdf' // full path to the PDF file you want to convert

const inFilename = filePath.split('/').pop()
const outFilename = inFilename.split('.').shift() + '-%d.png' // page 1 --> file-1.png, page 2 --> file-2.png
let dir = filePath.split('/')
dir.pop() // folder in which resulting png files will be created
const filePathOut = path.join(dir.join('/'), outFilename) // full path for our images

// prepare and trigger shell command
const cmd = `gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pngalpha -dUseCropBox -o "${filePathOut}" -r400 -sPageList=1- -dGraphicsAlphaBits=4 "${filePath}"`
execSync(cmd)
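Since the original question also asks for the image data in Node, here is a rough sketch of one way to read the generated PNGs back as Buffers after Ghostscript has run. It is not part of the answer above. The helper names (buildGsArgs, pageNumber, pdfToPngBuffers) are made up for illustration, it assumes a gs binary is on the PATH, and it reuses the same Ghostscript flags as the command above. It passes the arguments as an array to execFileSync instead of building a shell string, which avoids quoting problems with unusual file names. It is written with require so it runs as a plain script; swap in import statements if you use ES modules.

```javascript
const { execFileSync } = require('child_process')
const fs = require('fs')
const path = require('path')

// Hypothetical helper: build the Ghostscript argument list for one PDF.
// The flags are the ones from the shell command above; dpi defaults to 400.
function buildGsArgs(pdfPath, dpi = 400) {
  const { dir, name } = path.parse(pdfPath)
  const outPattern = path.join(dir, `${name}-%d.png`) // file-1.png, file-2.png, ...
  return [
    '-dSAFER', '-dBATCH', '-dNOPAUSE',
    '-sDEVICE=pngalpha', '-dUseCropBox',
    `-r${dpi}`, '-sPageList=1-', '-dGraphicsAlphaBits=4',
    '-o', outPattern,
    pdfPath,
  ]
}

// Hypothetical helper: return the page number encoded in "<baseName>-<n>.png",
// or null if the file name does not follow that pattern.
function pageNumber(fileName, baseName) {
  const prefix = baseName + '-'
  if (!fileName.startsWith(prefix) || !fileName.endsWith('.png')) return null
  const digits = fileName.slice(prefix.length, -'.png'.length)
  return /^\d+$/.test(digits) ? Number(digits) : null
}

// Hypothetical helper: run Ghostscript (assumed to be installed as "gs"),
// then read every generated PNG into a Buffer, ordered by page number.
function pdfToPngBuffers(pdfPath, dpi = 400) {
  execFileSync('gs', buildGsArgs(pdfPath, dpi))
  const { dir, name } = path.parse(pdfPath)
  const folder = dir || '.'
  return fs.readdirSync(folder)
    .map((file) => ({ file, page: pageNumber(file, name) }))
    .filter((entry) => entry.page !== null)
    .sort((a, b) => a.page - b.page) // numeric sort, so page 10 comes after page 9
    .map((entry) => fs.readFileSync(path.join(folder, entry.file)))
}
```

Calling pdfToPngBuffers('folder/subfolder/file.pdf') should then give you one Buffer per page. Note that PNGs left over from an earlier run with the same base name would be picked up as well, so you may want to write into a fresh temporary directory instead.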