4lex4 / scantailor-advanced

ScanTailor Advanced is the version that merges the features of the ScanTailor Featured and ScanTailor Enhanced versions, brings new ones and fixes.
GNU General Public License v3.0

Page content selected improperly when running deskew or select content #203

Open rnmerchant opened 8 months ago

rnmerchant commented 8 months ago

Apologies if this has been covered or addressed previously; I'm a newbie user. I have a simple PDF scan of a journal article, exported with Acrobat to JPEGs. The pages are standard full pages (~8x10") with very consistent content: 3 columns of text and some images. When I run some of the functions (say, deskew or select content), the content is selected properly on the first three of 8 pages, then improperly on the remainder: the sides are cut off (on either side), or, on one page, there is an odd selection of part of one column. I tried setting 'select content' to manual, but this doesn't change the issue.

Screenshot is attached.

Richard

[Screenshot: Screen Shot 12-15-23 at 10 23 AM]

majkaz commented 8 months ago

What I am seeing is the result of an incorrect "Split pages" step. The best way to avoid it is to set the split manually for all pages (Manual + Apply cut).

You cannot simply skip the first stages. Scantailor won't complain, but it will work differently than you expect: it will run these steps automatically and often get them quite wrong, especially if the text is in columns or if there are "column-like" pictures or tables.
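
To give a rough feel for why column layouts can mislead an automatic split, here is a toy sketch in Python. It is only an illustration under loud assumptions: the "emptiest vertical strip" heuristic, the function names `find_split_column` and `make_page`, and the synthetic page are all made up for this example, and none of it is taken from ScanTailor's actual code.

```python
def find_split_column(page, margin=0.2):
    """Return the x index of the emptiest pixel column in the central part of the page.

    `page` is a list of rows, each a list of 0 (blank) or 1 (ink).
    Hypothetical toy heuristic for illustration; NOT ScanTailor's algorithm.
    """
    width = len(page[0])
    # Vertical projection profile: amount of ink in each pixel column.
    ink = [sum(row[x] for row in page) for x in range(width)]
    # Ignore the outer margins, where a page split would make no sense.
    lo, hi = int(width * margin), int(width * (1 - margin))
    return min(range(lo, hi), key=lambda x: ink[x])


def make_page(width, gaps, height=10):
    """Build a synthetic page: ink everywhere except in the given blank x-ranges."""
    blank = {x for start, end in gaps for x in range(start, end)}
    return [[0 if x in blank else 1 for x in range(width)] for _ in range(height)]


# A SINGLE page with three text columns; the gutters are at x=10..11 and x=20..21.
three_columns = make_page(32, [(10, 12), (20, 22)])
split = find_split_column(three_columns)
print(split)  # lands inside a column gutter, although the page needs no split
```

A heuristic like this cannot tell a gutter between text columns from the gap between two facing pages, which is why setting the split by hand on every page is the safer route.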