DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official github repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.

How big of a skew angle can Leptonica detect? #644

Closed by Andrija-Markovic 10 months ago

Andrija-Markovic commented 1 year ago

Good day,

I am looking for a non-AI solution that can detect the skew angle of a scanned PDF text document. I have found some solutions that can detect a skew angle from -45 to 45 degrees, but they don't work reliably.

So, I am curious how big of a skew angle can Leptonica detect?

DanBloomberg commented 1 year ago

45 degrees is quite extreme -- not expected in most applications. However, anything can be done -- the larger the angle allowed, the more computation is required.

If you look at skew.c, you'll see DefaultSweepRange = 7.0 (degrees).

Skew detection sweeps over a range from -sweeprange to +sweeprange, then refines the result with a binary search at higher resolution. The default values can all be found at the top of skew.c. They are the defaults for pixDeskew().
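The coarse sweep followed by a finer search can be pictured with a small, library-free sketch. This is only an illustration of the general idea, not the code in skew.c: `score` is a hypothetical stand-in for whatever alignment measure is computed at each trial angle, and the names `findPeak`, `sweepDelta` and `minDelta` are assumptions made for this example.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative coarse-to-fine peak search; not Leptonica's actual implementation. */
final class SweepSearchSketch {

    /**
     * Returns the trial angle (degrees) with the highest score.
     * score is a hypothetical alignment measure evaluated at a trial angle.
     */
    static double findPeak(DoubleUnaryOperator score, double sweepRange,
                           double sweepDelta, double minDelta) {
        // Coarse sweep: evaluate evenly spaced angles from -sweepRange to +sweepRange.
        double best = -sweepRange;
        double bestScore = score.applyAsDouble(best);
        int steps = (int) Math.floor(2 * sweepRange / sweepDelta + 1e-9);
        for (int i = 1; i <= steps; i++) {
            double angle = -sweepRange + i * sweepDelta;
            double s = score.applyAsDouble(angle);
            if (s > bestScore) {
                bestScore = s;
                best = angle;
            }
        }
        // Refinement: probe both sides of the current best, halving the step each round.
        for (double delta = sweepDelta / 2; delta >= minDelta; delta /= 2) {
            double left = best - delta;
            double right = best + delta;
            double leftScore = score.applyAsDouble(left);
            double rightScore = score.applyAsDouble(right);
            if (leftScore > bestScore) {
                bestScore = leftScore;
                best = left;
            }
            if (rightScore > bestScore) {
                bestScore = rightScore;
                best = right;
            }
        }
        return best;
    }
}
```

A wider sweep range means more coarse evaluations, each of which stands for one pass over the image, which is why larger allowed angles cost more computation.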

If you don't want to use default values, use pixDeskewGeneral(), and try 45 for sweep range. To test, first rotate an image by, say, 40 degrees, and see if it deskews properly.

Andrija-Markovic commented 1 year ago

Hi @DanBloomberg! Thank you for your detailed response! I used lept4j to work with Leptonica and I did what you said--set the sweep range to 45 and rotated an image by 40 degrees. Leptonica deskewed it successfully--which is great!

However, I kept testing the limits of the library since you said that anything can be done. I set the sweep range to 180 and tried to deskew an image rotated by 190 degrees. The result of deskewing is an image rotated by 90 degrees (not 90 degrees from the original rotation, which would be 280). So I thought I could call pixFindSkew() after pixDeskewGeneral() and check the remaining skew. If the skew was nonzero, I would call pixDeskewGeneral() again. Here's the code:

private static int defaultRedSweep = 4;
private static float sweepRange = 180;
private static float defaultSweepDelta = 1;
private static int defaultRedSearch = 2;
private static int defaultThresh = 130;

private static BufferedImage deskew(BufferedImage bufferedImage) throws IOException {
  Pix pixOriginal = LeptUtils.convertImageToPix(bufferedImage);

  // Out-parameters: Leptonica writes its results into these buffers.
  // Passing null discards the values.
  FloatBuffer pAngle = FloatBuffer.allocate(1);
  FloatBuffer pConf = FloatBuffer.allocate(1);

  Pix pixDeSkewed = Leptonica1.pixDeskewGeneral(
          pixOriginal,
          defaultRedSweep,
          sweepRange,
          defaultSweepDelta,
          defaultRedSearch,
          defaultThresh,
          pAngle,
          pConf
  );

  // pixFindSkew() expects a 1 bpp image and returns a status code (0 on success),
  // not the angle; the angle in degrees comes back through pAngle.
  Pix pixBinary = Leptonica1.pixConvertTo1(pixDeSkewed, defaultThresh);
  int status = Leptonica1.pixFindSkew(pixBinary, pAngle, pConf);
  float skewAngle = pAngle.get(0);

  if (status == 0 && skewAngle != 0) {
      System.out.println("Skew angle is " + skewAngle);

      pixDeSkewed = Leptonica1.pixDeskewGeneral(
              pixDeSkewed,
              defaultRedSweep,
              Math.abs(skewAngle) + 15,
              defaultSweepDelta,
              defaultRedSearch,
              defaultThresh,
              pAngle,
              pConf
      );
  }

  return LeptUtils.convertPixToImage(pixDeSkewed);
}

Unfortunately, this did not do anything. The 190-degree-rotated image still gets rotated to 90 degrees.

Is it possible to handle this extreme of a rotation? Am I using values like redsweep, redsearch, etc. incorrectly? Any advice would be greatly appreciated!

DanBloomberg commented 1 year ago

By "anything can be done", I meant that with sufficient work it's possible to solve for most situations. There will always be a few images that defy even a sensible algorithm. For example, an image with many long straight lines at 10 degrees will be able to fool the algorithm into finding a 10 degree skew, even if there is also a considerable amount of properly aligned text.

But I did not mean that you could do anything with a single function call!

If you want to go up to +- 90 degrees (and there's no reason to go further because this algorithm does not distinguish upside-down from rightside-up), then you can do a +- 50 degree range twice, first on the input image and then on the input image rotated by 90 degrees. Take the best result, based on the returned value of confidence.
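The two-pass idea can be sketched without the native library. Everything named here is an assumption made for illustration: `SkewFinder` is a hypothetical adapter that returns `{angle, confidence}` for a search within ±sweepRange, and `rotate90` is a hypothetical adapter for a 90 degree rotation. With lept4j these might wrap functions such as pixFindSkewSweepAndSearch() and pixRotate90(), but that wiring is not shown or tested here.

```java
import java.util.function.UnaryOperator;

/** Illustrative selection between two skew estimates by confidence. */
final class TwoPassSkewSketch {

    /** One pass: measured angle, confidence, and whether the input was rotated first. */
    static final class Estimate {
        public final float angleDegrees;
        public final float confidence;
        public final boolean preRotated90;

        Estimate(float angleDegrees, float confidence, boolean preRotated90) {
            this.angleDegrees = angleDegrees;
            this.confidence = confidence;
            this.preRotated90 = preRotated90;
        }
    }

    /** Hypothetical adapter around a skew finder; returns {angle, confidence}. */
    interface SkewFinder<I> {
        float[] find(I image, float sweepRangeDegrees);
    }

    /** Runs the finder on the image and on its 90 degree rotation; keeps the more confident result. */
    static <I> Estimate bestOfTwoPasses(I image, SkewFinder<I> finder,
                                        UnaryOperator<I> rotate90, float sweepRangeDegrees) {
        float[] direct = finder.find(image, sweepRangeDegrees);
        float[] turned = finder.find(rotate90.apply(image), sweepRangeDegrees);
        Estimate a = new Estimate(direct[0], direct[1], false);
        Estimate b = new Estimate(turned[0], turned[1], true);
        return b.confidence > a.confidence ? b : a;
    }
}
```

The sketch deliberately returns the winning angle together with the `preRotated90` flag rather than a single combined angle, because how to add the 90 degree offset depends on the rotation direction and sign convention of the functions actually used.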

To see what went wrong with range angles near 90 degrees, look at the algorithm, and in particular, the method for simulating large angles. pixFindSkewSweep() calls pixVShearCorner(), which calls pixVShear(), which warns you not to choose angles close to pi/2.
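A rough way to see why angles near pi/2 are a problem: a shear displaces pixels by an amount proportional to tan(angle), which grows without bound as the angle approaches 90 degrees. The helper below is a hypothetical illustration of that growth, not a call into Leptonica.

```java
/** Illustrates how shear displacement grows as the angle approaches 90 degrees. */
final class ShearPoleSketch {

    /** Vertical shift, in pixels, of a column at horizontal distance xPixels from the shear origin. */
    static double verticalShift(double xPixels, double angleDegrees) {
        return xPixels * Math.tan(Math.toRadians(angleDegrees));
    }

    public static void main(String[] args) {
        // Print the shift at x = 1000 px for increasingly steep angles.
        for (double angle : new double[] {5, 45, 80, 89}) {
            System.out.printf("angle %5.1f deg -> shift %.1f px%n",
                    angle, verticalShift(1000, angle));
        }
    }
}
```

At 45 degrees the shift already equals the horizontal distance, and close to 90 degrees it is many times the image size, so simulating such rotations with shears stops being meaningful.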

Andrija-Markovic commented 1 year ago

I see. Thank you for the explanation! All makes sense but I just did not follow which method call returns the value of confidence that you are suggesting I compare.

DanBloomberg commented 1 year ago

I believe all the skew-determining functions except for pixFindSkewSweep() return a confidence value. And nobody calls that function, anyway.
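In the C API these values come back through pointer arguments (pangle, pconf) while the function's return value is only a status code; the lept4j code earlier in this thread passes FloatBuffer objects in those positions. A minimal sketch of reading such out-parameters follows. `findSkewStandIn` is a hypothetical stand-in with made-up numbers so the example runs without the native library.

```java
import java.nio.FloatBuffer;

/** Illustrates C-style out-parameters read through FloatBuffer. */
final class OutParamSketch {

    /**
     * Hypothetical stand-in for a native skew finder: returns 0 on success, 1 on error,
     * and writes the angle and confidence into the supplied buffers.
     */
    static int findSkewStandIn(FloatBuffer pangle, FloatBuffer pconf) {
        if (pangle == null) {
            return 1; // nowhere to store the angle
        }
        pangle.put(0, -3.5f);   // made-up angle in degrees
        if (pconf != null) {
            pconf.put(0, 4.2f); // made-up confidence
        }
        return 0;
    }

    public static void main(String[] args) {
        // Allocate one float per out-parameter before the call.
        FloatBuffer pangle = FloatBuffer.allocate(1);
        FloatBuffer pconf = FloatBuffer.allocate(1);
        int status = findSkewStandIn(pangle, pconf);
        if (status == 0) {
            System.out.println("angle = " + pangle.get(0) + ", confidence = " + pconf.get(0));
        }
    }
}
```

The confidence to compare between the two passes is the value read from the pconf buffer after each call, not the integer the call returns.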

I should have pointed you at another function like pixFindSkewSweepAndSearchScorePivot() which uses shears to represent the effect of rotations, eventually calling pixVShear() with the warning about staying away from 90 degrees.