4lex4 / scantailor-advanced

ScanTailor Advanced is the version that merges the features of the ScanTailor Featured and ScanTailor Enhanced versions and adds new features and fixes.
GNU General Public License v3.0

Automatic creation of searchable PDFs #88

Open atrink opened 4 years ago

atrink commented 4 years ago

First of all, I would like to congratulate you on this masterpiece of software! I have been a professional embedded software developer for decades and have seen many programs with poor usability. But scantailor-advanced really has a very sophisticated GUI, and I like that.

I also bought a CZUR scanner and scanned many books with it. The software that ships with it is the opposite of STA in terms of usability. For example, you can rotate the scanned pages with it, but only in 1-degree steps and only one page at a time. In STA, all pages are rotated into perfect alignment within a few seconds at the press of a button. This saves a lot of time!

But now to my stupid question: why isn't there a final step that creates (searchable) PDFs? Why does STA stop after generating TIFFs?

Currently I use a small script that does the following:

#!/bin/bash

ffn="$(pwd)"
fn="$(basename "$ffn")"
cd out || exit 1
ls -v *.tif > filelist
tesseract -l deu+eng filelist "../$fn" pdf

But it is cumbersome to open a terminal, change to the directory, and start the script. It would be much more convenient if STA provided a GUI for this.
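
Until something like this exists in the GUI, one possible stopgap is to wrap the same steps in functions that take the project directory as an argument, so the script no longer has to be started from inside that directory. This is only a sketch under assumptions: the function names `list_pages` and `make_searchable_pdf` are made up for illustration, the TIFFs are assumed to be in an `out` subdirectory as in the script above, and `ls -v` assumes GNU coreutils.

```shell
#!/bin/bash
# Sketch only: the same tesseract call as above, wrapped in functions.
# list_pages and make_searchable_pdf are hypothetical names, not part of STA.

# Print the TIFF file names in directory $1, one per line, in natural order
# (GNU ls -v sorts page_2.tif before page_10.tif).
list_pages() {
    local dir="$1"
    (cd "$dir" && ls -v -- *.tif)
}

# Build <project>/<project-name>.pdf from the TIFFs in <project>/out.
# $1 is the project directory; it defaults to the current directory.
make_searchable_pdf() {
    local project name
    # Resolve to an absolute path so basename yields a usable file name.
    project="$(cd "${1:-.}" && pwd)" || return 1
    name="$(basename "$project")"
    # Write the page list that tesseract reads as its input.
    list_pages "$project/out" > "$project/out/filelist" || return 1
    # Run inside out/ so the relative names in filelist resolve.
    (cd "$project/out" && tesseract -l deu+eng filelist "../$name" pdf)
}
```

After sourcing the file, `make_searchable_pdf ~/scans/mybook` (an example path) would then do the same work as the original script, from any directory.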

homocomputeris commented 4 years ago

https://github.com/jbarlow83/OCRmyPDF is a great tool:

img2pdf my-images*.jpg | ocrmypdf - myfile.pdf