4lex4 / scantailor-advanced

ScanTailor Advanced is the version that merges the features of the ScanTailor Featured and ScanTailor Enhanced versions, brings new ones and fixes.
GNU General Public License v3.0

Declare page dimensions #185

Open corei8 opened 2 years ago

corei8 commented 2 years ago

Can there be a feature that declares the dimensions of the output PDF? Right now I can view the final dimensions as I play with the margins, but having a pre-defined output size would be more convenient.

artmandc commented 2 years ago

Specifying the pixel dimensions of the output image files (e.g., 2550x3300 for letter size at 300 dpi) would end the workaround, which I struggle with, of adding a blank image of the desired dimensions to the processing queue. Thanks for ScanTailor Advanced!
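
For reference, the letter-size figure above is just inches multiplied by DPI. A minimal sketch of that arithmetic; `page_pixels` is a hypothetical helper name for illustration, not anything ScanTailor provides:

```python
def page_pixels(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page size given in inches."""
    return round(width_in * dpi), round(height_in * dpi)

# US Letter is 8.5 x 11 inches.
print(page_pixels(8.5, 11, 300))  # (2550, 3300)
print(page_pixels(8.5, 11, 600))  # (5100, 6600)
```

The same page at 600 dpi needs twice as many pixels in each direction, which matters when a project mixes scan resolutions.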

Piolie commented 2 years ago

It can be done. In step 4 Select Content you can specify a Page box of Manual size. If you apply the same page box to all pages then the document size is fixed by your page box size.

However, it is very easy to screw up the size when changing the margins. If you set margins which would fall outside the page box, then the document size is overridden. This has happened to me frequently when adjusting the margins with Shift+LMB+drag. Also, the page box can't leave any part of an existing content box out (see #123 for some details). The workaround is to first set the page box for the whole document and then do the content recognition.

This step needs some polish. But the alternative is to do several iterations of ordering pages by height/width and manually adjust the margins of the tallest/widest until the whole document is within spec—doable, but no bueno.

artmandc commented 2 years ago

I found it nearly impossible to enter my desired dimension because of flaky behavior like typing "11" but getting "51." Also, what should be a full page of text ends up reduced to about 1/4 size, centered on a mostly empty page, despite the source and output tif files being the same resolution.

My project, if it matters, involves starting with scanned microfilm images (some 300, some 600 dpi), each with two pages of text. The end product is PDFs created and OCR'ed with Tesseract. I really like the quality of the grayscale-to-black+white conversion in ScanTailor, but the rest of the process is driving me mad. Also, the guides make alignment easier from page-to-page.

In the meantime, though, I may have to find a different solution.

Piolie commented 2 years ago

> I found it nearly impossible to enter my desired dimension with flaky actions like typing "11" but getting "51."

This happens because the desired page box border would intersect the selected content box border, and so ST resizes the page box. I think the software should not impose limits on the page box, but for the moment you can avoid this strange behavior by deleting the content box before changing the page box.
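
To illustrate the constraint, here is a rough sketch of a page box being forced to cover the content box. This is only a model of the behavior described in this thread, not ScanTailor's actual code; the `Box` type, the function names, and the sample coordinates are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (outer.left <= inner.left and outer.top <= inner.top
            and outer.right >= inner.right and outer.bottom >= inner.bottom)


def grow_to_cover(page, content):
    """Assumed behavior: enlarge the requested page box until it covers the content box."""
    return Box(min(page.left, content.left), min(page.top, content.top),
               max(page.right, content.right), max(page.bottom, content.bottom))


content = Box(100, 100, 900, 1200)   # hypothetical detected content box
requested = Box(0, 0, 850, 1100)     # hypothetical page box typed by the user

print(contains(requested, content))       # False: content sticks out
print(grow_to_cover(requested, content))  # Box(left=0, top=0, right=900, bottom=1200)
```

With no content box present there is nothing to cover, so the requested size would be kept as typed, which is consistent with the workaround of deleting the content box first.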

> (some 300, some 600 dpi)

There are also problems if you use input images with different DPIs (see #173) in the same project. The only workaround I found is adhering to a single DPI within one project.
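
One way to stick to a single DPI per project is to sort the scans by resolution before importing them. A small sketch, assuming you already know each file's DPI (reading it from the TIFF would need an image library, which is left out here); the file names are hypothetical:

```python
from collections import defaultdict


def group_by_dpi(scans):
    """Group (filename, dpi) pairs into one file list per DPI value."""
    groups = defaultdict(list)
    for name, dpi in scans:
        groups[dpi].append(name)
    return dict(groups)


# Hypothetical file names and resolutions, for illustration only.
scans = [("reel1_001.tif", 300), ("reel1_002.tif", 600), ("reel1_003.tif", 300)]

for dpi, names in sorted(group_by_dpi(scans).items()):
    print(dpi, names)
```

Each resulting group can then go into its own ScanTailor project.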