brandonmpetty / Doxa

A Local Adaptive Thresholding framework for image binarization written in C++, with JS and Python bindings. Implementing: Otsu, Bernsen, Niblack, Sauvola, Wolf, Gatos, NICK, Su, T.R. Singh, WAN, ISauvola, Bataineh, Chan and Shafait.
https://brandonmpetty.github.io/Doxa/WebAssembly
Creative Commons Zero v1.0 Universal

pseudo F-measure, single metrics in Doxapy, and `eps` for numerical stability? #32

Closed. DiTo97 closed this issue 1 year ago.

DiTo97 commented 1 year ago

Hi @brandonmpetty,

Thanks for the awesome package. I have been experimenting with different neural network architectures for document image binarization (DIBCO), and after trying multiple implementations of the DIBCO metrics, yours are much faster, accelerating the training loop (using the Hugging Face Transformers Trainer API) by a few orders of magnitude.
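For context, I wire the metrics into the evaluation loop roughly like this (a sketch, assuming a doxapy.calculate_performance(ground_truth, prediction) entry point that takes 2D uint8 arrays and returns a dict of metric values; the shapes and conversions are placeholders for my setup):

import numpy as np
import doxapy

def compute_metrics(eval_pred):
    # Hugging Face Trainer hook: score each (ground truth, prediction) pair
    # with Doxa, then average each metric over the eval batch
    preds, labels = eval_pred.predictions, eval_pred.label_ids
    scores = [
        doxapy.calculate_performance(gt.astype(np.uint8), pred.astype(np.uint8))
        for gt, pred in zip(labels, preds)
    ]
    return {name: float(np.mean([s[name] for s in scores])) for name in scores[0]}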

Below I will leave a few open points or requests, but let me know if you want me to open separate issues. To start, here is the pseudo F-measure implementation I have been using:

import numpy as np
import numpy.typing as np_typing

import scipy.ndimage  # used for the neighborhood correlation in bwmorph_thin

Bitmap = np_typing.NDArray[np.bool_]

# Thinning lookup tables for the two subiterations; each is indexed by the
# 8-neighborhood code produced by the kernel in bwmorph_thin below
G123_LUT = np.array([
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
    0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 
    1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0
], dtype=bool)

G123P_LUT = np.array([
    0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
], dtype=bool)

def bwmorph_thin(bitmap: Bitmap, num_iters: int = -1) -> Bitmap:
    """The bwmorph thinning algorithm 

    For more information on the algorithm, see 
    https://it.mathworks.com/help/images/ref/bwmorph.html#bui7chk-1
    """
    if bitmap.ndim not in [2, 3]: 
        raise ValueError("The bitmap must be a 2D array or a batched 3D tensor")

    if not np.all(np.isin(bitmap, (0, 1))):
        raise ValueError("The bitmap contains values other than 0 and 1")

    if num_iters <= 0 and num_iters != -1:
        raise ValueError("num_iters must be > 0 or equal to -1")

    bitmap = np.array(bitmap).astype(np.uint8)

    batched3d = bitmap.ndim == 3

    if not batched3d:
        bitmap = np.expand_dims(bitmap, 0)

    # Each neighbor contributes a distinct power of two, so correlating with
    # this kernel encodes a pixel's 8-neighborhood as an index in [0, 255]
    kernel = np.array([
        [ 8,  4,   2],
        [16,  0,   1],
        [32, 64, 128]
    ], dtype=np.uint8)

    finished = np.zeros(bitmap.shape[0], dtype=bool)

    batch_size = bitmap.shape[0]
    num_pixels_before = np.sum(bitmap, axis=(1, 2))

    while num_iters != 0:
        # The two subiterations
        for lut in [G123_LUT, G123P_LUT]:
            for idx in range(batch_size):  # It is faster than the batched operation
                if finished[idx]:
                    continue

                # Encode each pixel's 8-neighborhood, then look up which
                # pixels the current subiteration deletes
                N = scipy.ndimage.correlate(bitmap[idx], kernel, mode="constant")
                D = np.take(lut, N)

                bitmap[idx][D] = 0

        num_pixels = np.sum(bitmap, axis=(1, 2))

        finished = num_pixels == num_pixels_before

        if np.all(finished):
            break

        num_pixels_before = num_pixels

        num_iters -= 1

    if not batched3d:
        bitmap = np.squeeze(bitmap, axis=0)

    return bitmap.astype(bool)

def pseudo_fmeasure(references: Bitmap, preds: Bitmap, eps: float = 1e-6, **kwargs) -> np_typing.NDArray[np.float64]:
    """The pseudo F-measure metric (the H-DIBCO 2010 formulation)"""
    # Invert so that foreground (ink) pixels become 1
    neg_references = 1 - references
    neg_preds = 1 - preds

    # Pseudo-recall is computed against the skeleton of the ground-truth foreground
    skeletons = bwmorph_thin(neg_references, **kwargs).astype(np.uint8)

    tpositives = neg_preds * neg_references
    fpositives = neg_preds * references

    num_tpositives = np.sum(tpositives, axis=(1, 2))
    num_fpositives = np.sum(fpositives, axis=(1, 2))

    precision = num_tpositives / (num_fpositives + num_tpositives + eps)

    pseudo_tpositives = neg_preds * skeletons
    pseudo_fnegatives = preds * skeletons

    num_pseudo_tpositives = np.sum(pseudo_tpositives, axis=(1, 2))
    num_pseudo_fnegatives = np.sum(pseudo_fnegatives, axis=(1, 2))

    pseudo_recall = num_pseudo_tpositives / (num_pseudo_fnegatives + num_pseudo_tpositives + eps)

    pseudo_nume = 2 * (precision * pseudo_recall)
    pseudo_deno = precision + pseudo_recall + eps

    pseudo_score = pseudo_nume / pseudo_deno
    return pseudo_score
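For reference, here is a quick usage sketch of the above on toy data (my convention: 1 = background, 0 = ink, with a leading batch axis, which is what the axis=(1, 2) sums assume):

# Toy usage sketch (hypothetical data): two 32x32 "pages"
references = np.ones((2, 32, 32), dtype=bool)
references[:, 8:24, 8:24] = False  # a square ink region on each page

preds = references.copy()
preds[0, 8:12, 8:24] = True  # one prediction misses part of the ink

scores = pseudo_fmeasure(references, preds)
print(scores)  # one pseudo F-measure score per page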
brandonmpetty commented 1 year ago

@DiTo97, thank you for the feedback.

I am really glad you found the library useful. Pseudo F-Measure is the biggest gap this library has for metrics, as you pointed out.

Your code seems pretty simple, basically counting skeleton pixels... but when I read "Performance Evaluation Methodology for Historical Document Image Binarization", it seemed far more complex than that (as these papers usually are). I will definitely look more into this.

As for the divide-by-zero issues... I think you are right; those do exist in the metrics code. I will leave this issue open and try to address them.

As for breaking up the metrics in Python, that is also possible. To be honest with you, I have two other projects that are taking a lot of my attention. One is a new Capture The Flag cyber security game I have been working on for about a year, which should be released at the end of this year. The second is adding AdOtsu to this library, plus maybe a slight refactor that I think will make it even faster and easier to use for metric analysis.

Let me look more closely into Pseudo F-Measure... because if I can knock that out, it would be amazing. I am not really aware of any open source utility that can handle all of the DIBCO metrics. In fact, I would be curious what you found out there... who is even trying? The DRD metric kills most people (I was the only one with a real open source implementation when I published it). I try to implement everything based directly on the papers. Lol, if you want a laugh ask ChatGPT how to implement pFM.

DiTo97 commented 1 year ago

Hi @brandonmpetty,

Thanks for the prompt response. I will reply point by point:

Your code seems pretty simple, basically counting skeleton pixels... but when I read "Performance Evaluation Methodology for Historical Document Image Binarization", it seemed far more complex than that (as these papers usually are). I will definitely look more into this.

As for the pseudo F-measure implementation, it's not really mine: I took it from the metrics.py module implemented by SauvolaNet's authors, which, in turn, was inspired by a few open-source gists with DIBCO metrics implementations (I will share the link as soon as I find it again). After reading the paper and comparing what the metric was measuring as opposed to the standard F-measure, I was convinced that it was doing a somewhat reasonable thing. Again, though, I did not put too much thought into it, as I was more interested in a direct comparison of my model with SauvolaNet, regardless of whether their metrics were implemented the right way. I just polished them a bit, added eps for numerical stability, and ensured they could take a batch of samples rather than single pairs, turning matrix ops into tensor ops where possible.

As for the divide-by-zero issues... I think you are right; those do exist in the metrics code. I will leave this issue open and try to address them.

Yep, and it should be quite simple to add to the C++ codebase. In fact, the only "issues" I had using your metrics were the random NaNs from the F-measure when the model was still noisy at the beginning of training, or a few unlucky NaNs from the PSNR toward the end. I could even try to add the guards myself with a very small PR, but I am not very experienced with Pybind.
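For now, my band-aid on the consumer side looks roughly like this (a sketch; metric_fn is a hypothetical stand-in for whatever unguarded metric call produces the NaNs):

import numpy as np

def nan_safe(metric_fn, *args, default: float = 0.0, **kwargs):
    # Replace NaN/inf values produced by unguarded divisions with a default,
    # so a single degenerate pair does not poison the batch average
    value = np.asarray(metric_fn(*args, **kwargs), dtype=np.float64)
    return np.nan_to_num(value, nan=default, posinf=default, neginf=default)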

As for breaking up the metrics in Python, that is also possible. To be honest with you, I have two other projects that are taking a lot of my attention. One is a new Capture The Flag cyber security game I have been working on for about a year, which should be released at the end of this year. The second is adding AdOtsu to this library, plus maybe a slight refactor that I think will make it even faster and easier to use for metric analysis.

That seems like a great project! I took part in quite a few CTF challenges during my uni days. As for refactoring, I am also a refactor-first developer, and if you see room for potential improvement in the library (speed being the key), you should definitely go for it. Nonetheless, I think that, unless we are talking orders of magnitude, the quality-of-life updates I suggested may really make the library easier to use from a broader perspective. As you noted in the last point, there aren't many libraries out there that implement the whole suite of DIBCO metrics, besides the repositories of the SoTA binarization models, which is a shame. Your library therefore already provides great benefits to the community. SauvolaNet's authors' code takes almost a full minute (around 50 secs) for a batch eval of 16 pairs of samples, mostly due to DRD and the pseudo F-measure, which made for a huge bottleneck in my training loop (most of that code cannot even be JIT-compiled with, say, Numba or the like, because it uses some NumPy functions that have not been ported yet). For comparison, yours takes only a few hundred millisecs, a far greater gain than the loss of the pseudo F-measure.

if you want a laugh ask ChatGPT how to implement pFM.

I will; I would never want our dear ChatGPT to feel a bit too lonely.

brandonmpetty commented 1 year ago

Ok, I did some more research into this. This is typical of Gatos et al.: he comes out with something in one paper, then perfects it in another paper years later and calls it the same thing. There are actually two pseudo F-measure algorithms. The one you have coded up is the one referenced in H-DIBCO 2010, which in turn references another paper he did back in 2008. Then there is the newer pseudo F-measure algorithm, which takes the 2008 algorithm and really doubles down on the complexity. He did this with his binarization algorithm too: the new algorithm was basically the old one with a ton of post-processing added. There are two different versions of AdOtsu as well, just to make the point that this is fairly common.

All that to say: when I come out with this first version, it will still not be on par with modern DIBCO metrics. And when v2 comes out, the added complexity will be killer on performance. I think that is partially why they deliver pre-generated pFM weights.
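To make the difference concrete, my reading is that the newer formulation reduces to weighted precision and recall once you have those weights. A rough sketch (the pre-generated per-pixel weight maps are inputs here by assumption; deriving them is the expensive part and is not shown):

import numpy as np

def weighted_pseudo_fmeasure(gt_ink, pred_ink, recall_weights, precision_weights, eps=1e-6):
    # gt_ink / pred_ink: 2D arrays with 1 = ink, 0 = background.
    # recall_weights / precision_weights: pre-generated weight maps from the
    # newer paper (an assumption of this sketch, not computed here)
    tp = gt_ink * pred_ink

    pseudo_recall = np.sum(tp * recall_weights) / (np.sum(gt_ink * recall_weights) + eps)
    pseudo_precision = np.sum(tp * precision_weights) / (np.sum(pred_ink * precision_weights) + eps)

    return 2 * pseudo_precision * pseudo_recall / (pseudo_precision + pseudo_recall + eps)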

DiTo97 commented 1 year ago

Thank you for the effort and the findings Brandon!

Of course, I was oblivious to this, but it is a very interesting pattern. The question is whether the pseudo F-measure is so much more insightful than the other metrics that it justifies the additional complexity. Thus far, my answer is no: it still correlates decently with the plain F-measure, and I would much rather train the models in a quarter of the time while prototyping.

brandonmpetty commented 1 year ago

Ok, I have a fix almost ready that will resolve the divide-by-zero issues in the performance metrics. I will look into adding an option for calling each metric. It may not be a new call for each one, but more likely a single method that takes a metrics mask, letting you specify which ones you want.

I am also completely revamping this project to use a new VS2022 solution set and Google Test, so that the library can be tested across multiple platforms. All tests are passing; I just have to do a little more validation work and I'll push a PR for you to review.

DiTo97 commented 1 year ago

Awesome job. Thank you!

Let me know when the PR is ready for review!

brandonmpetty commented 1 year ago

@DiTo97, I have a PR that includes the changes you requested (https://github.com/brandonmpetty/Doxa/pull/33). If you do a local build on this branch, you should see a new function: `calculate_performance_ex`. It takes optional named parameters like `drdm=True`. I have a notebook that shows how to use it.
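Roughly, a call looks like this (a sketch: other than drdm=True, the array conventions and parameter names here are placeholders; the notebook has the real details):

import doxapy

# gt and binarized: hypothetical 2D uint8 numpy arrays holding the
# ground-truth image and the binarized result
metrics = doxapy.calculate_performance_ex(gt, binarized, drdm=True)
print(metrics)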

I also fixed the divide-by-zero issues. You might check out: https://github.com/brandonmpetty/Doxa/pull/33/files#diff-c8f168da181d429f3a83fa14c1ef1efabb403432a8c2c3cd59697bff66110d3c

That link points specifically to the changes you are looking for. Sorry for grouping them in with a bunch of other changes, but hopefully it helps.

DiTo97 commented 1 year ago

Thanks a lot @brandonmpetty!

I am on vacation right now; I will test the PR on a local build when I am back next week.

brandonmpetty commented 1 year ago

Merged it. If you see anything that looks out of place, just let me know.