AFM-SPM / TopoStats

An AFM image analysis program to batch process data and obtain statistics from images
https://afm-spm.github.io/TopoStats/
GNU Lesser General Public License v3.0
55 stars 10 forks source link

tests(utils/pruning): Tests for utils and some of the pruning classes #835

Closed ns-rse closed 1 month ago

ns-rse commented 1 month ago
MaxGamill-Sheffield commented 1 month ago

I noticed that this branch didn't run topostats (not due to these changes). I've made a fix back on better-tracing if you want to pull those changes into here / rebase.

Other than that. I've started on rm_nibs first noticing that the initial test array has corners and thus aren't skeletons - this is what's causing breaks in the "circular with single 1-pixel branch internally" result. Other than that, it seems that handling of T-nibs rather that X-nibs are not handled so I'm looking into this.

MaxGamill-Sheffield commented 1 month ago

Just double checked and T-nibs also don't follow the skeletonisation criteria. See below:

Using scikit-image's skeletonisation function, this can be skeletonised further:

Screenshot 2024-05-08 at 16 41 03

Whereas this can't, but the current code works

Screenshot 2024-05-08 at 16 42 35

Will replace the T-arrays with X arrays which fit the skeletonisation criteria. Let me know if you think of others which do fit the skeletonisation criteria that aren't just X's

ns-rse commented 1 month ago

The purpose of this...

Screenshot 2024-05-08 at 16 41 03

...was to construct an image where skeletonisation had left that sort of "nib" and test if it was removed by the process as I was expecting the pruning.rm_nibs() function. The docstring states it should...

Remove single pixel branches (nibs) not identified by nearest neighbour algorithms as there may be >2 neighbours.

...which is why I constructed that as a test case, the "nib" has three neighbours and pruning.rm_nibs() doesn't remove it. Thus either the algorithm doesn't do as is purported or the docstring needs adjusting to clarify what the function does actually do.

I would have to investigate whether any of the skimage methods zhang/lee/medial_axis (which from memory made some skeletons that we would not want and I think I removed at some stage)/thin or topostats ever leave such "nib" artifacts and that would take considerable time and effort and I'd have to write a lot of tests, but its pretty straight-forward to construct an artificial example (single-pixel width skeleton with point with three neighbours) and see if rm_nibs() does what it purports to do.

But this is kind of the point of having small atomic functions that do one thing and do it well and are robustly tested against possible edge cases as they are then not relying on other parts of code to not do something. If you build your software like that then removing/changing one component can have a cascading effect and you have to change code in lots of other areas making development and maintenance more of a headache.

It would be good if rm_nibs() removed points such as these rather than assuming they will never be seen as we don't know what sort of weird shapes/structures might be thrown up at some point, even if we haven't seen them yet.

MaxGamill-Sheffield commented 1 month ago

was to construct an image where skeletonisation had left that sort of "nib"

By definition, skeletonisation algorithms cannot leave these kinds of "nibs", else it's not a skeletonisation algorithm. I have confirmed using the skimage morphology functions that this "nib" is removed by all types of skeletonisation methods we support. (the topostats method has a final step of using zhang skeletonisation exactly for this):

Screenshot 2024-05-09 at 09 34 25

I will update the docstring to say "Remove single pixel branches (nibs) from skeletons not identified by nearest neighbour algorithms as there may be >2 neighbours.", but feel free to adapt this further to how you see fit.

If what's provided to pruning is not a skeleton and more a mask, it'll never work. Even without pruning, the tracing will never work as it always assumes a single pixel trace, and T-nibs and the corner points in your tests also makes these more akin to masks - which is why we've skeletonised in the first place. The solution to this would be to enforce skeletonisation in the pruning methods, and then we don't have atomically small functions.

ns-rse commented 1 month ago

Yep, I checked that the four methods in getSkeleton would remove the nib and corner, but as I wrote, I based the tests on the docstrings which were that a point could have > 2 neighbours.

Updating the docstring to clarify what the function does is fine and important. I'll check it tomorrow, busy today with other work, but I noted in the reason for tests failing that skeletonisation also appeared to be applied during this step as corners were removed from the ring so we need to be clear about everything that is done in this step.

We should probably implement a way of checking that a skeleton has been passed and exit if that is not the case.

MaxGamill-Sheffield commented 1 month ago

So in my commits I've:

We should probably implement a way of checking that a skeleton has been passed and exit if that is not the case.

Off the top of my head the only way I can think to do this is to skeletonise it with one of the scikit-image methods and check for a difference. But again I don't think this is worth the compute or the time to write in, just make it clear that things should be skeletons which I think we've done- and we've even defined what skeletons are in the docstrings > "single pixel traces" and used "skeletons" as parameter names opposed to "mask" or "grain" showing it's a different binary structure.

MaxGamill-Sheffield commented 1 month ago

Just added the numpydoc-validation failure fixes to 800-better-tracing branch and rebased this so now the pre-commit test should pass.

Next steps for this PR are for @ns-rse to look over the test_local_area_sum tests then it can be merged.

ns-rse commented 1 month ago

Which way has the rebase been done @MaxGamill-Sheffield ?

The ns-rse/818-tests is meant to be merged into maxgamill-sheffield/800-better-pruning (that is the purpose of this PR). It doesn't make sense to fix things on maxgamill-sheffield/800-better-pruning and rebase ns-rse/818-tests-pruning onto it when the tests are going to be merged in by this PR. Also it doesn't make sense to rebase your branch onto this as that is what the pull request is for.

ns-rse commented 1 month ago

Next steps for this PR are for @ns-rse to look over the test_local_area_sum tests then it can be merged.

Gone through this, the tests I've written are as I've intended.

To explain further the original local_area_sum() function looked like...

def local_area_sum(binary_map: npt.NDArray, point: list | tuple | npt.NDArray) -> tuple:
    """
    Evaluate the local area around a point in a binary map.
    Parameters
    ----------
    binary_map : npt.NDArray
        Binary array of image.
    point : list | tuple | npt.NDArray
        Coordinates of a point within the binary_map.
    Returns
    -------
    tuple
        Tuple consisting of an array values of the local coordinates around the point and the number of neighbours
        around the point.
    """
    x, y = point
    local_pixels = binary_map[x - 1 : x + 2, y - 1 : y + 2].flatten()
    local_pixels[4] = 0  # ensure centre is 0
    return (local_pixels, local_pixels.sum())

This was prone to problems if the point selected was on the edge of array such as the following...

0 1 0    0 0 0    0 0 1    0 0 0    0 0 0
0 0 0    1 0 0    0 0 0    0 0 0    0 0 0
0 0 0    0 0 0    0 0 0    0 1 0    0 0 1

...as taking slices of and flattening them (binary_map[x -1 : x +2, y - 1: y+2].flatten()) could result in arrays that were < 9 in length and the sum of these would always be less than the maximum possible of 8.

Thus I introduced try: ... except: to capture firstly if a point is on the left or top of an array since you can't take negative indexes. But this didn't capture when points are on the right or bottom of an array and so there is a check made to ensure that the shape of the flattened array has a length of 9, if not and IndexError() is raised.

Finally another check is made to ensure the sum of the surrounding points is always <= 8 if not then the supplied array is not binary and so a ValueError is raised.

Background done now an explanation of the tests.

test_local_area_sum()

Tests the function works as expected, it includes a number of parameters where the above errors are expected to be raised and so these carry marks=pytest.mark.xfail() which allows them to fail.

But we want to test why they have failed and so there then more parameterised tests that do this.

test_local_area_sum_value_error()

Tests that ValueError is raised if the point in question or any of its neighbours are not 0 or 1.

test_local_area_sum_index_error()

Tests that IndexError is raised if the point in question is on the edge.

Conclusion

None of these need changing, they are testing what I intended them to.