DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official github repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.

introducing pixScaleBySamplingTopLeft, because sometimes we need (to visualize) pixel expansion from top/left rather than centre/centre. #677

Closed GerHobbelt closed 1 year ago

GerHobbelt commented 1 year ago

See the 'Dancing Troupe' comparative screenshots below for a use case.

Context / Background info

OK, same custom leptonica + tesseract + others rig as in the previous pullreq.

Here, tesseract is augmented to produce an HTML report, including leptonica-style PIX images, all kept in a PIXA list. While producing the report, I inspect each PIX image collected in that list and blend it with the 'original input image' in such a way that the new PIX is the top layer, while the 'original input image' is used as the bottom layer (think Photoshop layers), where that bottom layer is tinted a subdued red. The blend is a custom one (because I was unable to produce the same effect using one or only a few standard leptonica API calls; I blame my n00bness re leptonica usage). The visual end result is that wherever the top layer is WHITE (or rather: "pretty bright"), the red-tinted original image "shines through", so you can easily observe processing artifacts which you might not want or expect. (Red was chosen because it is similar to what you see when working in Photoshop, using Quick Masks, etc.)

For this to work with my custom hand-written blender code, both layers must have the same dimensions, hence the top layer is scaled up to match the bottom layer when necessary. So far, so good. There is a catch, however:

Good: the new situation (using the new API pixScaleBySamplingTopLeft)

What we are looking at in the next screenshot is an extract of that custom tesseract diagnostic report, with three images (note their reported widths/heights in pixels!).

The top image is the greyscaled "input" to a thresholding routine.

The second image is the thresholding mask produced by that routine: it's quite a bit smaller than the top input image, but I scale it up to the same size as the original before rendering it as PNG, as part of the HTML output and for the reason described above. For this one, nothing is "shining through", but that's okay. What matters here, and WHY pixScaleBySamplingTopLeft is used and useful, is this: I want (the user) to see how we got from the top image plus the middle one (threshold mask) to the bottom one (third PIX) in one easy flow.

When you look at the bottom (third) image in the screenshot, you see "artifacts": the middle-bottom part is red-ish because it got thresholded to white by way of those mask pixels, but that is probably undesirable, as the original was pretty dark there (hence the red-ish tint): the "locality" of the thresholding is failing us here. No problem (not for this pullreq, anyway), but exactly the kind of thing I want to see: top + middle, mixed, produces bottom. 👍

We see the thresholding algorithm at work:

(screenshot: msedge_good_crop)

Bad: the previous situation (using the existing API pixScaleBySampling)

This requires a bit of an explanation, but TL;DR: let me show you what that pixScaleBySampling API call delivered, all else being exactly the same:

Here the scaled mask image (the middle one) looks "sampled" (good), but it appears oddly shifted by half a pixel left/up. Pay particular attention to the reported pixel dimensions, and keep in mind: the top image mixed with this (sampled by the thresholding algorithm) middle image produces the bottom image. That does not look right! 🤔 ❓ It makes this visual a headscratcher. Which is "bad":

(screenshot: msedge_bad_crop)

Bad (v0.0.1), or how we got to use pixScaleBySampling in the first place: pixScale

Please remember: I'm a leptonica n00b, so I do my doc read, I do some RTFC (because the docs don't always link up properly with my brain), I do some grep, and hope not to embarrass myself too much. 😉

This is what I got when I first had "something working": I had decided pixScale was probably the initially-sanest answer to my quest of scaling up small images to "original image size". Note those three reported images' width/height pixel dimensions in the screenshot again; nothing has changed, except this time I call pixScale as I did initially:

Please remember the goal here is to visualize how top mixed with middle (threshold level mask) somehow produces the bottom result. Thanks to pixScale, the middle image gets scaled in a rather smooth fashion, and to my dotard brain, as a user of those images, it is incomprehensible how the heck top + middle mixed makes the bottom one: 😕

(screenshot: msedge_bad-old_crop)

Which is the reason why I dug into leptonica some more after this first attempt and dug up pixScaleBySampling: I was looking for an upscaling that specifically DID NOT interpolate, smooth, or otherwise mix adjacent source pixels while scaling up. The artifacts I was wondering about in the bottom image needed some explanation, and my guess was that the crazy stuff I saw in the bottom ones (this "Dancing Troupe" is only one example, and certainly not the "weirdest" of the bunch!) was possibly due to the sampling method used by the alleged thresholding algorithm.

Hence pixScaleBySampling popping up as the prime candidate. 👍

Which got me another WTF, as I was now looking at some oddly over-large bottom pixel row in the mask, and the next thing was: why is this middle one suddenly shifted when I call pixScaleBySampling? What am I doing wrong?!

Which took some time, but landed me at: "I don't know, but when I do this (pixScaleBySamplingTopLeft), my expectations match application output reality." ... 🤔 And maybe file a pullreq, if a second RTFC doesn't tell me I've redone something that's already present anyway.

Sorry, I couldn't find what I hoped to find, so here is pixScaleBySampling, duplicated and then corrected for my use case by dropping the two "+ 0.5" pixel-position calculus expressions: that is the entire difference between pixScaleBySamplingTopLeft and pixScaleBySampling.
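The dropped "+ 0.5" is easiest to picture as the destination-to-source index mapping. A minimal sketch of the two mappings (illustrative only; the function names are mine and this is not leptonica's literal code), with ratio = src_size / dst_size:

```cpp
#include <algorithm>
#include <cassert>

// Old pixScaleBySampling behaviour: round the source position via "+ 0.5",
// clamped to the last valid source index.
int srcIndexCentered(int d, float ratio, int srcMax) {
    return std::min(static_cast<int>(ratio * d + 0.5f), srcMax);
}

// Proposed pixScaleBySamplingTopLeft behaviour: plain truncation,
// so destination pixel 0 always comes from source pixel 0.
int srcIndexTopLeft(int d, float ratio, int srcMax) {
    return std::min(static_cast<int>(ratio * d), srcMax);
}
```

For example, upscaling a 3-pixel row by 4x (ratio = 0.25): the centered mapping picks source pixels 2/4/6 times respectively (the first pixel loses half its width, the last one gains it), while the top-left mapping picks each source pixel exactly 4 times.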


The End

I hope I didn't embarrass myself by completely overlooking leptonica API XYZ. 😅

PS: and that "Otsu thresholding algorithm"?! Code or it didn't happen!

std::tuple<bool, Image, Image, Image> ImageThresholder::Threshold(
                                                      ThresholdMethod method) {
  Image pix_binary = nullptr;
  Image pix_thresholds = nullptr;

  // ### useless/irrelevant code: *snip*

  auto pix_grey = GetPixRectGrey();

  int r = 0;
  l_int32 threshold_val = 0;

  l_int32 pix_w, pix_h;
  pixGetDimensions(pix_ /* pix_grey */, &pix_w, &pix_h, nullptr);

  if (tesseract_->thresholding_debug) {
    tprintf("\nimage width: {}  height: {}  ppi: {}\n", pix_w, pix_h, yres_);
  }

  if (method == ThresholdMethod::Sauvola) {
  // ### useless/irrelevant code: *snip*
  } else if (method == ThresholdMethod::LeptonicaOtsu) {
    int tile_size;
    double tile_size_factor = tesseract_->thresholding_tile_size;
    tile_size = tile_size_factor * yres_;
    tile_size = std::max(16, tile_size);

    int smooth_size;
    double smooth_size_factor = tesseract_->thresholding_smooth_kernel_size;
    smooth_size_factor = std::max(0.0, smooth_size_factor);
    smooth_size = smooth_size_factor * yres_;
    int half_smooth_size = smooth_size / 2;

    double score_fraction = tesseract_->thresholding_score_fraction;

    if (tesseract_->thresholding_debug) {
      tprintf("tile size: {}  smooth_size: {}  score_fraction: {}\n", tile_size, smooth_size, score_fraction);
    }

    // ### TADA! It wasn't me!   ;-)
    r = pixOtsuAdaptiveThreshold(pix_grey, tile_size, tile_size,
                                 half_smooth_size, half_smooth_size,
                                 score_fraction,
                                 (PIX **)pix_thresholds,
                                 (PIX **)pix_binary);
  } else if (method == ThresholdMethod::Nlbin) {
  // ### useless/irrelevant code: *snip*
  } else {
    // Unsupported threshold method.
    r = 1;
  }

  bool ok = (r == 0) && pix_binary;
  return std::make_tuple(ok, pix_grey, pix_binary, pix_thresholds);
}

... and then, through magic, that tuple lands in a PIXA, which I take to make an HTML, and we get the story above.

PPS: 🤔 hm, that half-pixel-to-the-top-left SHIFT is everywhere?

Now that I write this pullreq, only now do I notice that that half-pixel SHIFT is also already present in the pixScale scaled-up output image. If you know, with 20:20 hindsight, what to look for: see the image dimensions, where the height is 3 pixels, and notice again that the 'linearly smoothed' pixScale-produced image also has a 'thinner' top row vs. an 'over-thick' bottom row, exactly like pixScaleBySampling, where it was so obviously visible. Last screenshot repeated here for convenience; check the middle image (pixScale output):

(screenshot: msedge_bad-old_crop)

Is this what we (you?) want, by design? Or is this an artifact that nobody has noticed up to now? Or... (fill in the blanks; the n00b may be completely off his rocker.) ❓ 🤔


Thanks for a very nice library; all misunderstandings/incomprehensions are mine. Compare this to RTFC-ing OpenCV, for example, and I know why I've been, äh, "ambivalent" about working with & on that one, despite the lure of some desirable magic tech in there. At least I can grok this leptonica code and get the results I want within a couple of weeks, while still gaining (coding) speed. 👍

DanBloomberg commented 1 year ago

A few comments.

(1) I don't see a visual difference in the binarized #06 images for the different cases. And I wouldn't expect it if you're just changing the sampling location within the input pixel array.

(2) pixScaleBySampling() is very fast and crude, with serious aliasing problems when downscaling. Use of low-pass filtered downscaling and interpolated upscaling functions is recommended for many situations.

(3) We wouldn't want to add functions to the library for small changes like yours.

(4) Otsu is a global thresholding function. Adaptive thresholding requires more computation but is often much better, and I use it whenever the output binarization quality is important.

(5) I appreciate that you're reading the code carefully and trying to make sense of it. Leptonica is a big library and it can be quite hard to know how best to use it. Some people have found this to be useful at the very highest level: http://www.leptonica.org/highlevel.html

GerHobbelt commented 1 year ago

AFK, apologies.

re your point 1: what I didn't explain properly in the context section is this:

(I hope this makes sense)

What I saw, and why I need that new API, is that apparently the main Otsu process code (quoted in the pullreq message) does apply that tiny bitmap this way. Because I am producing an "augmented" diagnostic image sequence as an aside, I need a scale method like that so I can show the state of the components used in the main process "pixel perfect", so a human can easily look at it and "replay" or reason about what they see, in their head.

Thanks for your other notes; I'll have to chew on them later. The purpose of this whole exercise is trying to make sense of existing code in tesseract (mainline and dev branches/forks) while I'm trying to answer the question "why the hell is tesseract going ape on me?", and I'm finding the current diagnostic assists (some rough & ready image dumping code) inadequate to answer that question.

And ignore this next bit if you don't like it, but that last part I wrote is where I notice, after readying the PR, that regular old-skool pixScale also does this "illogical" scale-up where top-left source pixels effectively weigh half and bottom-right edge pixels end up weighing 1.5 in area-of-influence. That is what I call "shifted": the scaling code uses a "the center of gravity of a source pixel is at its center" mathematician-mindset scaling, which works, theoretically, but is one of those dreaded off-by-one traps. Upscaling pixels ABC by, say, 4 times is (pixScale style): AABBBBCCCCCC (current output, smoothed of course), where it should be AABBBBCC, i.e. output size is (source size minus 1) times 4 (i.e. 8). The former is what I see pixScale doing (output size = source size times 4), while, if centre/centre is what you want, it should be the latter. Or one has to abandon the centre/centre mindset and "move half a pixel".

Which is what any "naive, fast" upsampling code out there does, and what that Otsu call is observed to do. So I needed a scale call matching that behaviour, so I can have augmented diagnostics as a side channel.

'The center is at the top left of the pixel' thinking turns ABC into (scaled up by 4) AAAABBBBCCCC. That is what I see happening, looking at those Otsu inputs and outputs (mask + result).
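The two mindsets on the ABC example can be sketched as a toy string "upscaler" (my own illustration, using nearest sampling; the helper names are invented):

```cpp
#include <cassert>
#include <string>

// 'Center at top left' mindset: plain truncation; every source pixel
// is repeated exactly `factor` times.
std::string upscaleTopLeft(const std::string& src, int factor) {
    std::string out;
    for (int d = 0; d < (int)src.size() * factor; ++d)
        out += src[d / factor];
    return out;
}

// 'Center of gravity at pixel center' mindset with "+ 0.5" rounding,
// clamped to the last source pixel: first pixel is under-represented,
// last pixel is over-represented.
std::string upscaleCentered(const std::string& src, int factor) {
    std::string out;
    int last = (int)src.size() - 1;
    for (int d = 0; d < (int)src.size() * factor; ++d) {
        int s = (int)((float)d / factor + 0.5f);
        out += src[s > last ? last : s];
    }
    return out;
}
```

Running both on "ABC" with factor 4 reproduces the two patterns described above: AAAABBBBCCCC (top-left) vs. AABBBBCCCCCC (centered, clamped).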

OK, sorry for my lack of proper jargon; this should probably be written up separately, because it addresses something that I suspect is present in all(?) scaling, as I see you call internals that do the real scaling work.

... Wondering now how I can make my point clearer and easier to read/grok (without spending a day on it; I try to be lazy. ;-) )


DanBloomberg commented 1 year ago

Your point about the mistake in adding 0.5 for rounding in the scaling functions is valid. I should not have done that. The effect is typically minor, but in your case, where the source image is very small and the scaling factor is large, it can cause problems.

I have fixed the problem by modifying some of the sampling functions in scale1.c. You will need to call pixScaleBySamplingWithShift() or pixScaleBinaryWithShift(). Please download from head and see if this fixes your issue.
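One way to picture the new shift arguments (this is my reading of the behaviour, not a quote of the scale1.c internals): the shift simply slides the sampling point inside the source pixel, with ratio = src_size / dst_size.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of sampled scaling with an explicit shift knob:
// shift = 0.5f reproduces the old "+ 0.5" rounding behaviour,
// shift = 0.0f samples from the top-left, as requested in this PR.
int srcIndexWithShift(int d, float ratio, float shift, int srcMax) {
    return std::min(static_cast<int>(ratio * d + shift), srcMax);
}
```

Under this model, pixScaleBySamplingWithShift(pix, sx, sy, 0.0f, 0.0f) would give the top-left sampling this PR was after, making the separate pixScaleBySamplingTopLeft unnecessary.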

GerHobbelt commented 1 year ago

I just noticed your commit SHA-1: f068b48d81580cef1f2d7bafd59c482f2895ab43 when I did a quick pull & check as I got home late.

I think this is what I need, but I'd like to check my outputs to make sure. Can do that later tomorrow (it's late and I'm in the Amsterdam timezone: 02:30 right now); will report back tomorrow, or Saturday at the latest. My intuition right now is that this change removes the need for this pullreq completely, but I am not 100% sure (late, tired).

Thank you for the quick work; I'll try to report back as soon as I can.

GerHobbelt commented 1 year ago

Tested your latest code: PASS! 👍

Code snippet from my tesseract diagnostics output rendering code:

    else {
      int ow, oh, od;
      pixGetDimensions(original_image, &ow, &oh, &od);

      Image toplayer = pixConvertTo32(pix);
      Image botlayer = pixConvertTo32(original_image);

      if (w != ow || h != oh)
      {
        // smaller images are generally masks, etc. and we DO NOT want to be confused by the smoothness
        // introduced by regular scaling, so we apply brutal sampled scale then:
        if (w < ow && h < oh) {
          toplayer = pixScaleBySamplingWithShift(toplayer, ow * 1.0f / w, oh * 1.0f / h, 0.0f, 0.0f);
        }
        else if (w > ow && h > oh) {
          // the new image has been either scaled up vs. the original OR a border was added (TODO)
          //
          // for now, we simply apply regular smooth scaling
          toplayer = pixScale(toplayer, ow * 1.0f / w, oh * 1.0f / h);
        }
        else {
          // non-uniform scaling...
          ASSERT0(!"Should never get here! Non-uniform scaling of images collected in DebugPixa!");
          toplayer = pixScale(toplayer, ow * 1.0f / w, oh * 1.0f / h);
        }
      }

      auto datas = pixGetData(toplayer);
      auto datad = pixGetData(botlayer);
      auto wpls = pixGetWpl(toplayer);
      auto wpld = pixGetWpl(botlayer);
      int i, j;
      for (i = 0; i < oh; i++) {
        auto lines = (datas + i * wpls);
        auto lined = (datad + i * wpld);
        for (j = 0; j < ow; j++) {
          // if top(SRC) is black, use that.
          // if top(SRC) is white, and bot(DST) isn't, color bot(DST) red and use that.
          // if top(SRC) is white, and bot(DST) is white, use white.

And a screenshot of the output HTML (which is the result of the above pixScaleBySamplingWithShift code, executed for the Otsu mask PIX in the diagnostics PIXA, plus some scaffolding):

(screenshot: msedge_q6EpBNMvi9)

👍

GerHobbelt commented 1 year ago

Which closes/obsoletes this pullreq AFAIAC.

By the way: thanks for the documentation link! And the quick response + resolution, of course 😄

DanBloomberg commented 1 year ago

Great. Thanks for bringing up the issue in detail.

Dan