ElectricRCAircraftGuy / PDF2SearchablePDF

`pdf2searchablepdf input.pdf` = voila! "input_searchable.pdf" is created & now has searchable text!
MIT License
127 stars, 14 forks

Make pdftoppm use multiple CPU cores #1

Open: ElectricRCAircraftGuy opened this issue 5 years ago

ElectricRCAircraftGuy commented 5 years ago

`pdftoppm` is a single-threaded program, but running several instances at once lets it use multiple cores and can drastically reduce total processing time:

1. Find out how many CPU cores are available.
2. Find out how many pages are in the PDF.
3. Split the PDF's pages into chunks, one per core.
4. Run one `pdftoppm` process per core, each processing a different chunk of the PDF.
5. Once all processes have finished, continue with the rest of the script.
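The steps above can be sketched in Bash roughly as follows. This is a minimal sketch under stated assumptions, not the script's actual implementation: it assumes poppler's `pdfinfo` is installed next to `pdftoppm` to count pages, and the `-png` format, the `pg` output prefix, and the helper names `page_chunks` and `pdf_to_images_parallel` are hypothetical choices for illustration. Instead of physically splitting the PDF file, each process renders its own page range with `pdftoppm -f FIRST -l LAST`. Because pdftoppm names each output file after its page number, the processes should not overwrite each other's images.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: poppler's `pdfinfo` and `pdftoppm` are on PATH;
# `-png`, the "pg" prefix, and both function names are hypothetical choices.

# Split pages 1..$1 into at most $2 contiguous chunks and print "first last"
# for each chunk. Chunk sizes differ by at most one page.
page_chunks() {
    local pages=$1 chunks=$2 base extra first=1 size i
    (( pages >= 1 && chunks >= 1 )) || return 1
    (( chunks > pages )) && chunks=$pages   # never more chunks than pages
    base=$(( pages / chunks ))
    extra=$(( pages % chunks ))             # first $extra chunks get one more page
    for (( i = 0; i < chunks; i++ )); do
        size=$base
        (( i < extra )) && size=$(( size + 1 ))
        echo "$first $(( first + size - 1 ))"
        first=$(( first + size ))
    done
}

# Start one background pdftoppm process per chunk (one chunk per core),
# then wait for every process. Returns non-zero if any of them failed.
pdf_to_images_parallel() {
    local pdf=$1 prefix=${2:-pg} pages first last pid status=0
    local -a pids=()
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
    [[ -n $pages ]] || return 1
    while read -r first last; do
        pdftoppm -png -f "$first" -l "$last" "$pdf" "$prefix" &
        pids+=("$!")
    done < <(page_chunks "$pages" "$(nproc)")
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

Usage: `pdf_to_images_parallel input.pdf pg`. The loop reads from process substitution rather than a pipe so that the background jobs belong to the current shell and `wait "$pid"` can collect each one's exit status.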

ElectricRCAircraftGuy commented 1 year ago

Note to self: `nproc` on Linux prints the number of available CPU cores (processing units). See `$(nproc)` used in my answer here: https://askubuntu.com/a/1479490/327339
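A small helper for picking the job count could look like this. It is a sketch: `nproc` comes from GNU coreutils and may be missing on non-Linux systems, so the `getconf _NPROCESSORS_ONLN` fallback and the hypothetical `cpu_count` name are assumptions, not something the script does today.

```shell
#!/usr/bin/env bash
# Sketch: print how many parallel jobs to run.
# `nproc` prints the processing units available to this process (GNU coreutils);
# fall back to getconf where nproc is absent, and to 1 if both fail.
cpu_count() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}
```

It would then be used wherever `$(nproc)` appears, e.g. `xargs -P "$(cpu_count)"`.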

I could probably just use `xargs` for this, as I do there to unzip files in parallel:

```bash
# Unzip all files
time find . -maxdepth 1 -type f -iname "*.zip" -print0 | xargs -0 -I{} -n 1 -P $(nproc) unar -f {}
```
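The same `xargs -P` pattern could be applied to `pdftoppm` directly, with one process per page instead of one per chunk. Again, this is only a sketch under assumptions: it uses poppler's `pdfinfo` for the page count, `-png` output, and a hypothetical `rasterize_per_page` helper whose default prefix `pg` is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: one pdftoppm process per page, with xargs keeping up to $(nproc)
# of them running at once. Assumes poppler's `pdfinfo`/`pdftoppm` are on PATH;
# the function name and "pg" default prefix are hypothetical.
rasterize_per_page() {
    local pdf=$1 prefix=${2:-pg} pages
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
    [[ -n $pages ]] || return 1
    # Each input line N from seq becomes `pdftoppm -png -f N -l N ...`,
    # which renders only page N to a file named after that page number.
    seq 1 "$pages" |
        xargs -I{} -P "$(nproc)" pdftoppm -png -f {} -l {} "$pdf" "$prefix"
}
```

Design note: one process per page means every process re-opens and parses the PDF, which may cost more than the chunked approach on large files. On the other hand, xargs starts a new process as soon as one finishes, so a few slow pages do not leave the other cores idle.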

Parallelize all parts of the program where possible.