gkovacs / pdfocr

Adds text to PDF files using the cuneiform OCR software
MIT License
325 stars 49 forks

Parallel execution? #24

Open shuhaowu opened 9 years ago

shuhaowu commented 9 years ago

This seems relatively easy to parallelize; currently everything runs serially.

I am envisioning (with a queue):

  1. Parallel PDF => image extraction
  2. Parallel OCR per image
  3. Parallel (merge-sort-like) merging of the per-page PDFs
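
A rough sketch of how these three stages could fit together, written in Python for illustration only; it is not pdfocr's code. All names here (`parallel_map`, `tree_merge`, `ocr_pipeline`, and the `extract_image`, `ocr_image`, `merge_two` callables) are hypothetical, and a thread pool stands in for the explicit queue, on the assumption that the real per-page work would mostly be waiting on external tools:

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(fn, items, workers=4):
    """Apply fn to every item in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def tree_merge(parts, merge_two, workers=4):
    """Merge adjacent pairs in parallel, round by round (merge-sort style)."""
    parts = list(parts)
    if not parts:
        raise ValueError("nothing to merge")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(parts) > 1:
            # Pair up neighbours so page order is preserved.
            pairs = [(parts[i], parts[i + 1])
                     for i in range(0, len(parts) - 1, 2)]
            merged = list(pool.map(lambda p: merge_two(*p), pairs))
            if len(parts) % 2:
                merged.append(parts[-1])  # odd one out waits for the next round
            parts = merged
    return parts[0]


def ocr_pipeline(pages, extract_image, ocr_image, merge_two, workers=4):
    """Run the three stages: extract images, OCR them, merge the results."""
    images = parallel_map(extract_image, pages, workers)   # step 1
    ocred = parallel_map(ocr_image, images, workers)       # step 2
    return tree_merge(ocred, merge_two, workers)           # step 3
```

In a real implementation the three callables would wrap whatever tools do the page extraction, OCR, and PDF concatenation; here they are left as parameters so the scheduling logic can be tried with simple stand-in functions. With n pages, the merge step needs about log2(n) rounds instead of n - 1 sequential merges.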

From what I see in the code, it shouldn't take much work, and it would be a nice improvement. I can give this a try if I have time.