DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official GitHub repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.

pixOrientDetect with .bmp file #733

Open saikumarkavali1 opened 9 months ago

saikumarkavali1 commented 9 months ago

I'm using the following Leptonica API (version 1.81.1) to detect the orientation and rotate the file according to the text:

    pixOrientDetect(new HandleRef(pix, pixconv), out pupconf, out pleftconf, 0, 0);

For the particular BMP file shared below, it returns wrong outputs, so the resulting rotation is not the expected one.

Input_File.zip

saikumarkavali1 commented 9 months ago

To get the pixconv value I used the following API, with 130 as an arbitrarily chosen threshold:

    var pixconv = Tesseract.Interop.LeptonicaApi.Native.pixConvertTo1(pix.Handle, 130);

With a threshold of 255 my issue was resolved, but I am unsure about the threshold value. How can I determine the right threshold for any given file?

DanBloomberg commented 9 months ago

This is working as expected. There is a note in pixUpDownDetect() that the image should have a resolution between 150 and 300 ppi. This note is easy to miss and should have been put in a more obvious place, and I will do that.

Your image was made at a high resolution of 600 ppi, so I scaled it down by a factor of 0.35, binarized it with a threshold of 128 (a reasonable value for a clean scan such as yours), and ran pixOrientDetect() on it:

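    PIX        *pix1, *pix2, *pix3;
    l_float32   upconf, leftconf;

    pix1 = pixRead("/tmp/input.bmp");    // resolution 600 ppi
    pix2 = pixScale(pix1, 0.35, 0.35);   // reduces resolution to about 210 ppi
    pix3 = pixConvertTo1(pix2, 128);     // binarize to 1 bpp with threshold 128
    pixOrientDetect(pix3, &upconf, &leftconf, 0, 0);
    lept_stderr("upconf = %f, leftconf = %f\n", upconf, leftconf);
    pixDestroy(&pix1); pixDestroy(&pix2); pixDestroy(&pix3);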

with the result:

    upconf = 14.350754, leftconf = 1.302541

which says there is a very high confidence that it is right-side up.

saikumarkavali1 commented 9 months ago

Thank you DanBloomberg for sorting this out.

Your input helped me solve my issue. Is there an alternative approach for finding a file's threshold value, rather than picking a reasonable value like 128?

DanBloomberg commented 9 months ago

There are many functions that adapt the local threshold based on a measurement of the background value. This is done by first normalizing the background to a constant value. For the full set of normalizing functions, see adaptmap.c.

Adaptive binarization is done in two steps: (1) background normalization by some method, and (2) global thresholding with a value appropriate to the normalization.

There are several high-level functions in Leptonica for adaptive binarization of grayscale and color images, such as:

    pixAdaptThresholdToBinary(pix, NULL, 1.0)   (in grayquant.c)
    pixConvertTo1Adaptive(pix)                  (in pixconv.c)
    pixCleanImage(pix, 1, 0, 1, 0)              (in pageseg.c)

saikumarkavali1 commented 9 months ago

Thank you @DanBloomberg for the information.

I have a question about the following APIs:

    pixOrientDetect(pix, &upconf, &leftconf, 0, 0);
    pixConvertTo1(pix, 128);

  1. I applied a red stamp and used the above APIs for rotation. Result: the stamp color changed to blue.
  2. I then applied another red stamp and used the same APIs. Result: stamp 1 changed to red and stamp 2 changed to blue.

File: Input_File.zip

DanBloomberg commented 9 months ago

I don't know what you are doing. Before you can use the orientation detector, you must convert to 1 bpp at a resolution between 150 and 300 ppi, as I described previously. This gets rid of the color.

So if you do this:

    pix1 = pixRead("input.bmp");   // 600 ppi, 8 bpp, 256 colors
    pix2 = pixScale(pix1, 0.35, 0.35);    // down to about 210 ppi, 32 bpp rgb  (colormap is removed)
    pix3 = pixConvertTo1(pix2, 128);     // 1 bpp
    pixOrientDetect(pix3, &upconf, &leftconf, 0, 0);   // only works on 1 bpp images
    lept_stderr("upconf = %f, leftconf = %f\n", upconf, leftconf);

the result is:

    upconf = 14.350754, leftconf = 1.302541

which says that the orientation is correct as it is. This is finding a global angle for rotation; it is essentially ignoring the stamp, which the algorithm sees as a blob of pixels without any small text orientation.