DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official github repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.
Other
1.81k stars 393 forks source link

Logic error in pixcmapIsValid() #734

Closed blehel closed 9 months ago

blehel commented 9 months ago

I tried to run the jbcorrelation sample prog, which failed with this message: Error in pixcmapIsValid: (max index = 3) >= (num colors = 3)

This can be traced back to the debug call to jbDataRender: https://github.com/DanBloomberg/leptonica/blob/e09c1f283aa5a896facc297b617775373519d450/prog/jbcorrelation.c#L191-L192

What this does is that it sets up the pix as 2 bpp so that it can use a 3rd color to draw the outlines. So we have a colormap with color index between 0 and 3, and 3 colors (black, white & red) instead of the maximum possible of 4. However this makes a check in pixcmapIsValid fail here: https://github.com/DanBloomberg/leptonica/blob/e09c1f283aa5a896facc297b617775373519d450/src/colormap.c#L370-L380

Actually the comment describes the intent correctly, but the code is wrong. I did the following change to make it work correctly:

diff --git a/src/colormap.c b/src/colormap.c
index 7d5b567..ac4350a 100644
--- a/src/colormap.c
+++ b/src/colormap.c
@@ -373,9 +373,9 @@ l_int32  d, depth, nalloc, maxindex, maxcolors;
          * max indexing into the colormap array. */
     if (pix) {
         pixGetMaxColorIndex(pix, &maxindex);
-        if (maxindex >= cmap->n) {
-            L_ERROR("(max index = %d) >= (num colors = %d)\n", __func__,
-                    maxindex, cmap->n);
+        if ((cmap->n - 1) > maxindex) {
+            L_ERROR("(num colors - 1 = %d) > (max index = %d)\n", __func__,
+                    cmap->n - 1, maxindex);
             return 1;
         }
     }
DanBloomberg commented 9 months ago

The comment was a bit vague, but the code is correct. A better comment would be:

/* Where the colormap or the pix may have been corrupted, and
         * in particular when reading or writing image files, it should
         * be verified that the largest colormap index value in the image
         * is less than the number of entries in the colormap array.  */

and a better error message would be

            L_ERROR("(max index in image = %d) >= "
                    "(number entries in colormap = %d)\n", __func__,
                    maxindex, cmap->n);

The actual problem is that jbDataRender() is making an invalid pix when used in debug mode. Here's the histogram of pixel values that it generates for the input image feyn.tif:

  [0] = 7132936.000000
  [1] = 887050.000000
  [2] = 321548.000000
  [3] = 866.000000

I don't know where the 866 pixels with colormap index = 3 come from, but they shouldn't be there, because the colormap has only 3 entries!

blehel commented 9 months ago

Thanks for clarifying, I misunderstood what the maxindex variable is about.

I added code to print out the histogram to a few places, and the 4th color appears after the call to pixRasterop: https://github.com/DanBloomberg/leptonica/blob/e09c1f283aa5a896facc297b617775373519d450/src/jbclass.c#L2145 It starts appearing not right away, but after some of the iterations. Maybe it happens when the character to be drawn overlaps one of the previously drawn red boxes?

DanBloomberg commented 9 months ago

That is exactly the problem. The character images have colormap indices 0 and 1, so just the low order bit of the 2 bit pixel. But the red boxes have index 2, which is 10, so doing a bit OR between red and a black pixel in the image gives a value 11, which is the spurious index 3.

This bug has always been in the library and the debug programs jbcorrelation and jbrankhaus, but the failure only was apparent after I started checking for valid colormaps.

I have modified the calculation to avoid this issue, and will push it RSN.

blehel commented 9 months ago

I've done a build with the latest master, everything seems to be OK now. The issue can be closed.

DanBloomberg commented 9 months ago

Thank you for reporting and investigating the issue, Lehel.

Dan