Closed fusefib closed 2 months ago
Hi, @fusefib .
On Linux, the problem you described cannot be reproduced at either 2x or 4x.
I'll try to prepare a better picture for testing. In any case, it consistently does this for high resolution pictures that are mostly white and have almost no black, when scaling up. However, pages with enough text always turn out fine. Here is the result for me: testpic-scaled.zip
2x: (image) 4x: (image)
@zvezdochiot Reproduced with Otsu, Dots8, EdgePlus (somehow), BlurDiv, EdgeDiv and Robust.
Not reproducible in 0.2023.10.24.
Reproducible in 0.2023.10.30 and later.
It looks like a bug was introduced in 0.2023.10.30 (for all sizes) and partially fixed in 0.2023.11.19 (for size 1.0). Also, if you manually shrink the image to 50% and then upscale 2x in ScanTailor, everything is fine, but when you upscale 4x you get the same result as the original image at 2x (thin black lines). So it's probably related to image size.
Also reproducible on 32-bit Linux. So maybe we should look at long variables (32-bit on Windows and on 32-bit Linux, 64-bit on 64-bit Linux).
I suggest checking https://github.com/ImageProcessing-ElectronicPublications/scantailor-experimental/commit/aafb67f95a0685535c999ede9fb5f8ed5112beed , https://github.com/ImageProcessing-ElectronicPublications/scantailor-experimental/commit/587683bdda23650fa0d237df6ebe85db434f2a13 and https://github.com/ImageProcessing-ElectronicPublications/scantailor-experimental/compare/0.2023.11.10...0.2023.11.19#diff-4ddd4b67d244ac7c7864d278c0179f232b36742520daee44b20ec5b80e4f6324R870 (and maybe other code), and replacing long with int64_t/uint64_t.
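To make the suspected size dependence concrete, here is a minimal sketch of how a 32-bit accumulator behaves when gray levels are summed over a large, mostly white image. The helper names `maxWhitePixels` and `graySum` are hypothetical and are not taken from the ScanTailor sources; whether the real code accumulates in an integer type at this point is exactly what the thread is still trying to establish, so treat this as an illustration of the class of bug and not a diagnosis.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: how many pure-white pixels (gray level 255) can be
// summed before an accumulator of type Acc can no longer hold the total.
template <typename Acc>
constexpr std::uint64_t maxWhitePixels()
{
    static_assert(std::is_unsigned<Acc>::value, "Acc must be an unsigned integer type");
    return static_cast<std::uint64_t>(std::numeric_limits<Acc>::max()) / 255u;
}

// Hypothetical helper: the sum of `pixels` samples that all have gray level
// `level`, computed in type Acc. Unsigned arithmetic wraps modulo 2^N, so a
// too-narrow Acc silently returns a much smaller number.
template <typename Acc>
constexpr Acc graySum(std::uint64_t pixels, unsigned level)
{
    static_assert(std::is_unsigned<Acc>::value, "Acc must be an unsigned integer type");
    return static_cast<Acc>(static_cast<Acc>(pixels) * static_cast<Acc>(level));
}
```

With these definitions, `maxWhitePixels<std::uint32_t>()` is 16843009, so a 32-bit sum is only safe up to roughly 16.8 megapixels of white. For a hypothetical 24-megapixel all-white image, `graySum<std::uint32_t>(24000000, 255)` wraps to 1825032704, while `graySum<std::uint64_t>(24000000, 255)` gives the expected 6120000000. A fixed-width 64-bit type behaves the same on Windows, 32-bit Linux and 64-bit Linux, which is the point of the suggestion above.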
@plzombie, this could be (?!) an overflow in: https://github.com/ImageProcessing-ElectronicPublications/scantailor-experimental/blob/11d943af4b93c44e6b89bf47f90388025009e30f/src/imageproc/Binarize.cpp#L197 but the type is specified independently of the bit depth (!).
PS: All the thresholds you listed use binarizeBiModalValue.
    Tb = (ib > 0) ? (Tb / (double) ib) : 0.0;
    Tw = (iw > 0) ? (Tw / (double) iw) : (double) histsize;

    for (k = 0; k < histsize; k++)
    {
        im = histogram[k];
        if (k < threshold)
        {
            Tb += ((double) im * (double) k);
            ib += im;
        }
        else
        {
            Tw += ((double) im * (double) k);
            iw += im;
        }
    }

    unsigned long int im, histogram[histsize] = {0};
    double iw = 0.0, ib = 0.0;
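For anyone who wants to experiment outside ScanTailor, the excerpts above can be assembled into a self-contained sketch along the following lines. This is not the project's code: `Histogram`, `ClassMeans`, `twoLevelHistogram` and `classMeans` are hypothetical names, the 256-bin histogram assumes 8-bit grayscale input, and the pixel counters are `std::uint64_t` here (an assumption made so that their width does not depend on the platform), with the weighted sums kept in `double` as in the excerpt.

```cpp
#include <cstdint>

constexpr unsigned kHistSize = 256;  // assumption: 8-bit grayscale input

struct Histogram  { std::uint64_t bins[kHistSize]; };
struct ClassMeans { double black; double white; };

// Hypothetical test helper: a histogram with only two populated gray levels.
constexpr Histogram twoLevelHistogram(unsigned darkLevel, std::uint64_t darkCount,
                                      unsigned lightLevel, std::uint64_t lightCount)
{
    Histogram h{};  // all bins start at zero
    h.bins[darkLevel] += darkCount;
    h.bins[lightLevel] += lightCount;
    return h;
}

// Mean gray level of the pixels below `threshold` (black class) and of the
// pixels at or above it (white class), mirroring the loop quoted above.
constexpr ClassMeans classMeans(const Histogram& h, unsigned threshold)
{
    double Tb = 0.0, Tw = 0.0;   // weighted gray-level sums
    std::uint64_t ib = 0, iw = 0;  // pixel counts, 64-bit on every platform
    for (unsigned k = 0; k < kHistSize; k++)
    {
        const std::uint64_t im = h.bins[k];
        if (k < threshold)
        {
            Tb += static_cast<double>(im) * static_cast<double>(k);
            ib += im;
        }
        else
        {
            Tw += static_cast<double>(im) * static_cast<double>(k);
            iw += im;
        }
    }
    // Same fallbacks as the excerpt when one class is empty.
    return ClassMeans{
        (ib > 0) ? Tb / static_cast<double>(ib) : 0.0,
        (iw > 0) ? Tw / static_cast<double>(iw) : static_cast<double>(kHistSize)
    };
}
```

As a quick check, `classMeans(twoLevelHistogram(10, 1000, 255, 34799360), 128)` models a mostly white page with about 34.8 million white pixels and a little dark text; the white mean stays at 255 and the black mean at 10 regardless of whether `long` is 32 or 64 bits wide on the build target.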
@fusefib Could you try this build? staging.zip scantailor-experimental-0.2024.05.05-X86-64-Qt6-install.zip
Confirmed it works properly now with both zips. Thanks to both of you for the fix!
Description: I am experiencing difficulties with the normal binarizing feature in ScanTailor-Experimental. When attempting to process an image containing predominantly white background with little black text, the resulting binarized image displays faint or erased text when scaling up the resolution to 2x or higher. Adjusting the Curve/Sqr color settings from 0.5/0 to 1/1 provides some improvement but does not fully restore the text clarity. Changing the Threshold method from Otsu does, but keeping Otsu is preferred. Setting the thickness to the max doesn't help.
Expected Behavior: The binarization process should maintain the clarity and legibility of the black text, even when scaling up the resolution.
Actual Behavior: The black text becomes faint or disappears entirely in the binarized image when the resolution is increased.
Environment:
Steps to reproduce:
Output.
Resolution Enhancement box and click 2x.
Notes: I've just started using this variant of ScanTailor (awesome work!) and I haven't done extensive testing to determine if this is a regression and, if so, when it occurred. Would be happy to test if needed.