Open xujingsong1 opened 2 years ago
You need Ghostscript for this; it can't be done with pdf-lib. My PDF-to-image converter runs as an AWS Lambda. I've compiled Ghostscript into an AWS Lambda layer (examples are https://github.com/rpidanny/gm-lambda-layer and https://github.com/shelfio/ghostscript-lambda-layer). In my Lambda (running on Node 14) I trigger a shell command that converts all PDF pages into PNG (one PNG per page) and stores the images in a subdirectory, from where the PNGs are processed further. It works fast and reliably for me. If you don't use a Lambda the approach is no different; you'd just need to link the Ghostscript library differently. I didn't want to use yet another Ghostscript wrapper, hence my approach of using a simple shell command.
Example:
import { execSync } from 'child_process'
import path from 'path'

const filePath = 'folder/subfolder/file.pdf' // full path to the PDF file you want to convert
const inFilename = filePath.split('/').pop()
const outFilename = inFilename.split('.').shift() + '-%d.png' // page 1 --> file-1.png, page 2 --> file-2.png
const dir = filePath.split('/')
dir.pop() // what remains is the folder in which the resulting png files will be created
const filePathOut = path.join(dir.join('/'), outFilename) // full path for our images
// prepare and trigger shell command (a template literal, so both paths are interpolated)
const cmd = `gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pngalpha -dUseCropBox -o "${filePathOut}" -r400 -sPageList=1- -dGraphicsAlphaBits=4 "${filePath}"`
execSync(cmd)
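If you also want the image data back in Node (rather than just files on disk), here is a rough sketch of one way to do it, not part of my actual Lambda. The helper names `buildGsArgs`, `listPageImages` and `pdfToPngBuffers` are made up for illustration. It assumes `gs` is on the PATH, uses `execFileSync` with an argument array instead of a shell string so paths with spaces need no quoting, and uses `require` so it runs as a plain script (swap in `import` if your project uses ES modules). Note that it takes the base name up to the last dot, which differs slightly from the `split('.')` approach above for names like `a.b.pdf`.

```javascript
const { execFileSync } = require('child_process')
const fs = require('fs')
const path = require('path')

// Build the Ghostscript argument list for "one PNG per page".
// Same flags as the shell command above; dpi is configurable.
function buildGsArgs(pdfPath, outPattern, dpi = 400) {
  return [
    '-dSAFER', '-dBATCH', '-dNOPAUSE',
    '-sDEVICE=pngalpha',
    '-dUseCropBox',
    `-r${dpi}`,
    '-sPageList=1-',
    '-dGraphicsAlphaBits=4',
    '-o', outPattern, // e.g. folder/file-%d.png, %d is replaced by the page number
    pdfPath,
  ]
}

// Find the generated "<baseName>-<n>.png" files in dir, sorted by page number.
// A plain string sort would put file-10.png before file-2.png, hence the numeric sort.
function listPageImages(dir, baseName) {
  const prefix = baseName + '-'
  const pages = []
  for (const name of fs.readdirSync(dir)) {
    if (!name.startsWith(prefix) || !name.endsWith('.png')) continue
    const num = name.slice(prefix.length, -'.png'.length)
    if (/^\d+$/.test(num)) {
      pages.push({ page: Number(num), file: path.join(dir, name) })
    }
  }
  return pages.sort((a, b) => a.page - b.page)
}

// Convert the PDF and return one { page, data } entry per page, data being a Buffer.
function pdfToPngBuffers(pdfPath, dpi = 400) {
  const dir = path.dirname(pdfPath)
  const baseName = path.basename(pdfPath, path.extname(pdfPath))
  const outPattern = path.join(dir, `${baseName}-%d.png`)
  execFileSync('gs', buildGsArgs(pdfPath, outPattern, dpi))
  return listPageImages(dir, baseName).map(({ page, file }) => ({
    page,
    data: fs.readFileSync(file),
  }))
}
```

Calling `pdfToPngBuffers('folder/subfolder/file.pdf')` should then give you an array ordered by page, with each `data` Buffer ready to upload or hand to an image library. At 400 dpi the buffers can get large, so for big PDFs you may prefer to process the files one at a time instead of loading them all into memory.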
What are you working on?
I want to turn every page of a PDF into an image and get the image data in Node. Does anyone know how to do this?