DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official github repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.

pixOrientCorrect/pixOrientDetect issue with PBM #748

Open panterlo opened 1 month ago

panterlo commented 1 month ago

I have tested pixOrientDetect/pixOrientCorrect on thousands of pages of annual reports with similar layouts. For some reason that I am trying to understand, I get a 90-degree clockwise rotation for pages that are already upright, but ONLY when they are in PBM format; if I first convert the PBM files to JPEG, I don't get the rotation.

Attaching the PBM in a ZIP file since GitHub doesn't allow me to upload PBM files: Leptonica-casuses-wrong-rotation.zip

DanBloomberg commented 1 month ago

What does the file format have to do with the operation on a raster image?

pixOrientDetect() works on a 1 bpp image. To write this as a jpeg, you need to first convert from 1 bpp to 8 bpp. Then after writing to file, you read the jpeg back into a pix, convert it from 8 bpp to 1 bpp, and then run the orientation detector. The processed image may differ slightly from the original, because of the lossy nature of the jpeg compression, but it is hard to understand how this could make any difference in the result of the orientation detector. But read on, because something very subtle is happening here.

First of all, I do get the bogus results that you get from pixOrientDetect() with your image. So it has nothing to do with the file format of the image, but as we will see, it does have something to do with the history of the image, which previously was compressed with jpeg.

This is very strange. Consider this program running on your image:

  PIX       *pix1, *pix2;
  l_float32  upconf, leftconf;

  pix1 = pixRead("/tmp/test.pbm");
  #if 0
    pix2 = pixScaleBySampling(pix1, 1.001, 1.001);  /* scale up by 0.1% */
  #else
    pix2 = pixScaleBySampling(pix1, 0.999, 0.999);  /* scale down by 0.1% */
  #endif
  pixOrientDetect(pix2, &upconf, &leftconf, 100, 1);  /* mincount = 100, debug on */
  lept_stderr("upconf = %f, leftconf = %f\n", upconf, leftconf);

If the scale factor is 1.0 (i.e., your input image), the bad results are obtained. However, if we scale the image by just one part in a thousand (either bigger or smaller), the results are sensible. The scaled and unscaled images look identical to the eye, but to the algorithm they are completely different.

Suppose we compare them with an XOR:

    l_float32  fract;  /* fraction of pixels that differ */
    PIX       *pix3;   /* XOR difference image */

    pixCompareBinary(pix1, pix2, L_COMPARE_XOR, &fract, &pix3);
    lept_stderr("fract diff = %f\n", fract);
    pixWrite("/tmp/diff.png", pix3, IFF_PNG);

We see the outlines of the characters but with a very noisy result. The noise appears to be from the image previously having been compressed with jpeg, before being binarized, causing this "squirrel noise" near character edges.
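To make concrete what the XOR comparison reports, here is a minimal sketch in plain C. It assumes two binary images of identical size stored one byte per pixel, which is not how Leptonica packs 1 bpp data, and the helper name xor_fraction is hypothetical rather than part of the library. It only illustrates the idea of a "fraction of differing pixels":

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Leptonica function): returns the fraction of
 * pixels that differ between two binary images of the same size.
 * Assumption: images are stored one byte per pixel, 0 = OFF, nonzero = ON. */
double xor_fraction(const unsigned char *a, const unsigned char *b, size_t npix)
{
    size_t i, diff = 0;

    assert(a != NULL && b != NULL && npix > 0);
    for (i = 0; i < npix; i++) {
        /* XOR of the two pixel states: 1 where they disagree, 0 otherwise */
        diff += (size_t)((a[i] != 0) ^ (b[i] != 0));
    }
    return (double)diff / (double)npix;
}
```

A small fraction concentrated along character edges is consistent with the noisy outlines described above.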

And somehow all this noise near the edges is causing the algorithm to fail. Yet, the algorithm succeeds if the raster image is scaled by 1 part in a thousand, which doesn't get rid of the squirrel noise -- it just moves it around a little bit.

So, I can only tell you that the jpeg artifacts are causing the problem, and you can more easily fix the problem by doing a tiny rescaling operation. But I cannot visualize the underlying mechanism behind these results. It is a great puzzle!!! Do you have any insight into this?

DanBloomberg commented 1 month ago

Here's a much more efficient work-around. After you read the full resolution image, downscale using a scalefactor = 0.5, before testing for orientation. I suggest doing this with either pixReduceBinary2(pix, NULL) or pixReduceRankBinary2(pix, 1, NULL).
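For readers unfamiliar with rank reduction, the following is a minimal sketch in plain C of what a 2x rank reduction computes. It assumes an image stored one byte per pixel and uses a hypothetical helper, reduce_rank_2x, that is not part of Leptonica; the library itself works on packed 1 bpp data and is far faster. The assumption here is that a rank level of 1 turns a destination pixel ON when at least one pixel in the corresponding 2x2 source block is ON:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a Leptonica function): 2x rank reduction of a
 * binary image stored one byte per pixel (0 = OFF, 1 = ON), row-major.
 * A destination pixel is ON when at least `level` (1..4) of the four
 * pixels in its 2x2 source block are ON. An odd trailing row or column
 * is dropped. Returns a malloc'd (w/2) x (h/2) image, or NULL if the
 * allocation fails; the caller frees it. */
unsigned char *reduce_rank_2x(const unsigned char *src, int w, int h,
                              int level, int *pwd, int *phd)
{
    int i, j, wd, hd;
    unsigned char *dst;

    assert(src != NULL && pwd != NULL && phd != NULL);
    assert(w >= 2 && h >= 2 && level >= 1 && level <= 4);

    wd = w / 2;
    hd = h / 2;
    dst = malloc((size_t)wd * (size_t)hd);
    if (dst == NULL)
        return NULL;

    for (i = 0; i < hd; i++) {
        for (j = 0; j < wd; j++) {
            /* Upper-left corner of the 2x2 source block */
            const unsigned char *p = src + (size_t)(2 * i) * w + 2 * j;
            int count = p[0] + p[1] + p[w] + p[w + 1];
            dst[(size_t)i * wd + j] = (unsigned char)(count >= level);
        }
    }
    *pwd = wd;
    *phd = hd;
    return dst;
}
```

In the work-around itself, the reduced image returned by pixReduceRankBinary2(pix, 1, NULL) would be passed to pixOrientDetect() in place of the full-resolution image.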

panterlo commented 3 weeks ago

Thanks for sharing. I noticed that in some cases downscaling by 1/1000 alone does not work, but trying both upscaling and downscaling does.

DanBloomberg commented 3 weeks ago

I did the experiments with downscaling by a tiny fraction to determine how sensitive the anomalous images are when modified.

As I indicated, this is by far the strangest situation I've seen with this algorithm. The use of special 2x reduction before testing for orientation will give you the correct answer as fast as possible, because those 2x reductions are blazingly fast, and running the orientation function on a half-sized image will be 4 times faster than at full res.