tests(pruning): Tests TopoStatsPrune and convPrune

AFM-SPM / TopoStats

An AFM image analysis program to batch process data and obtain statistics from images

https://afm-spm.github.io/TopoStats/

GNU Lesser General Public License v3.0

55 stars 10 forks source link

tests(pruning): Tests TopoStatsPrune and convPrune #848

Open ns-rse opened 1 month ago

ns-rse commented 1 month ago

Adds a smaller (30x30) test skeleton for testing pruning, this makes it easier to work with, however, this does raise a question as the resulting skeleton, generated by skimage.morphology.skeletonize() does have a T-shaped junction which it has been suggested never arise (more on this below).
Tests for TopoStatsPrune and convPrune attempt to demonstrate the effect of varying parameters has on pruning to which end the following parameters are varied...
- max_length
- height_threshold
- method_values
- method_outliers

There are some issues identified so far...

`convPrune`

As noted in the reason for the first paramterised test failing which arises when max_height = None (since int can not be compared to None) I think there is a problem with the default height being used here.

Line 454 tries to set the default max_branch_length to be 15% of the total number of elements in the skeleton array. self.skeleton.size counts the number of elements in an array (and because we are working with 2D images this will always be the product of the number of rows and columns). But the length of this will always be 1 since the total number of elements in the skeleton will always be an integer. Thus if max_length == -1 then the max_branch_length will always be 0 since we would always only ever get the number of elements in the array. That said in this dummy example we don't even get that, instead we get an error raised

max_length = -1
simple_array = np.asarray([[0, 0], [0, 0], [0, 0]])
total_points = simple_array.size
max_branch_length = max_length if max_length != -1 else int(len(total_points) * 0.15)

TypeError: object of type 'int' has no len()

Its also unclear why the convPrune._prune_by_length() method takes the argument max_length when its an attribute of the class which could be used instead.

This doesn't happen in the topostatsPrune class because the length of the molecule is based on the co-ordinates of the skeleton rather than the size of the array.

Excessive Complexity/Code duplication

The splitting of functionality here into multiple classes means there is some code duplication. Both topostatsPrune and convPrune have methods to _prune_by_length() conditionally based on the value of self.max_length not being None but they do so in different manners. Both methods then go on to called heightPruning() in the same manner.

The two classes could be combined into one with an option of whether to use the original length based pruning or the convolutional approach and call the appropriate method (renaming the method from convPrune to prune_length_by_conv() and the one from topostatsPrune to prune_length_by_neighbours(). In turn this would mean that there is no need to have the pruneSkeleton() factory class (and having that as a class in itself seems like overkill when a series of functions would suffice and address the complaints from Pylint on too-few-public-methods too).

Handling multiple grains

The refactoring done previously in #600 removed the loops from every method/function in the tracing module so that it only handled a single grain. The looping over of multiple grains is handled by code in process_scan.py, this means the prune_all_skeletons() methods can be removed too, further simplifying the code base.

These will be simple to remove in due course once we have robust tests in place.

T-shaped junctions

Previous work in #835 originally tested whether pruning.rm_nibs() "Remove single pixel branches (nibs) not identified by nearest neighbour algorithsm as there may be >2 neighbours" and the current tests show that these are still left by the pruning methods being tested, even though the last step is to re-run the plain skeletonisation method from skimage (Zhang's).

The following is a reproducible example that shows such a nib remains after pruning


import numpy as np
import matplotlib.pyplot as plt
from skimage import draw, filters
from skimage.morphology import skeletonize

def _generate_heights(skeleton: npt.NDArray, scale: float = 100, sigma: float = 5.0, cval: float = 20.0) -> npt.NDArray:
    """Generate heights from skeletons by scaling image and applying Gaussian blurring.

    Uses scikit-image 'skimage.filters.gaussian()' to generate heights from skeletons.

    Parameters
    ----------
    skeleton : npt.NDArray
        Binary array of skeleton.
    scale : float
        Factor to scale heights by. Boolean arrays are 0/1 and so the factor will be the height of the skeleton ridge.
    sigma : float
        Standard deviation for Gaussian kernel passed to `skimage.filters.gaussian()'.
    cval : float
        Value to fill past edges of input, passed to `skimage.filters.gaussian()'.

    Returns
    -------
    npt.NDArray
        Array with heights of image based on skeleton which will be the backbone and target.
    """
    return filters.gaussian(skeleton * scale, sigma=sigma, cval=cval)

def _generate_random_skeleton(**extra_kwargs):
    """Generate random skeletons and heights using skimage.draw's random_shapes()."""
    kwargs = {
        "image_shape": (128, 128),
        "max_shapes": 20,
        "channel_axis": None,
        "shape": None,
        "allow_overlap": True,
    }
    # kwargs.update
    heights = {"scale": 100, "sigma": 5.0, "cval": 20.0}
    kwargs = {**kwargs, **extra_kwargs}
    random_image, _ = draw.random_shapes(**kwargs)
    mask = random_image != 255
    skeleton = skeletonize(mask)
    return {"original": mask, "img": _generate_heights(skeleton, **heights), "skeleton": skeleton}

def pruned_plot(gen_shape: dict) -> None:
    """Plot the original skeleton, its derived height and the pruned skeleton."""
    img_skeleton = gen_shape
    pruned = topostatsPrune(
        img_skeleton["img"],
        img_skeleton["skeleton"],
        max_length=-1,
        height_threshold=90,
        method_values="min",
        method_outlier="abs",
    )
    pruned_skeleton = pruned._prune_by_length(pruned.skeleton, pruned.max_length)
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
    ax1.imshow(img_skeleton["original"])
    ax1.set_title("Original mask")
    ax2.imshow(img_skeleton["skeleton"])
    ax2.set_title("Skeleton")
    ax3.imshow(img_skeleton["img"])
    ax3.set_title("Gaussian Blurring")
    ax4.imshow(pruned_skeleton)
    ax4.set_title("Pruned Skeleton")
    plt.show()

def pruning_skeleton() -> dict:
    """Smaller skeleton for testing parameters of prune_all_skeletons()."""
    return _generate_random_skeleton(rng=69432138, min_size=15, image_shape=(30, 30))

pruned_skeleton(pruning_skeleton())

The paramterised test with id="Length pruning enabled (10), removes two ends leaving nibs on both." includes the resulting array after pruning and there is a two-pronged fork of nibs when using convPrune().

I think we need to ensure that remove_nibs() can handle such T-junctions as well as this dummy example demonstrates they are not correctly removed/pruned.

ns-rse commented 3 weeks ago

From @MaxGamill-Sheffield

Essentially convPrune was a way to use convolutions to make the pruning more intuitive and smaller code-wise compared to topostats prune - a side-by-side refactor if you will but I can’t remember if I ever finished it, or if it’s useful / upto date any more

In light of this I will undertake the following...

Remove convPrune class, it likely isn't used and as noted above possibly has an error in the _prune_by_length() method where the length of the molecule is calculated because the size of the whole 2D binary skeleton array is taken (in contrast to topostatsPrune._prune_by_length() which uses len(coordinates) of the skeleton).
Remove pruneSkeleton class and simplify to a simple set of functions implementing a "factory pattern" with a single factory method which calls topostatsPrune(), this leaves it in a state where extension will be possible if alternative pruning methods are to be implemented.
Remove tests that have been written for convPrune (under tests/tracing/test_pruning.py::TestConvPrune).

ns-rse commented 3 weeks ago

@MaxGamill-Sheffield Its not just renaming of variables that is causing the current test suite to fail, after doing some digging and scratching my head a lot, working out that we now need to pass all of the new options to dnaTrace() class in and doing so from the default_config() I found that the pruning step undertaken by get_disordered_trace() doesn't return the same pruned objects as previously.

I've therefore marked a bunch of tests in topostats/tracing/test_tracing_single_grain.py to be skipped until we can work out what parameter values from the existing code base (where most are hard coded) need to be set in the current configuration. There are probably other tests that fail but I'm taking one step at a time.

Its compounded by the hassle of having to take a list of co-ordinates and revert them to a 2D array to plot (I've left a note in 119b0d0 with some hacky code of how to do this as much for my own reference when I return to this later next week).

If you have any time/capacity to start investigating this it would be really useful. I was under the mistaken impression that you might have been using the existing test suite as you were refactoring and ensured that these passed or were updated as required (that is one of the major benefits of tests) but that isn't the case so we'll have to work through this now.

ns-rse commented 2 weeks ago

I've been looking at why many of the current set of tests in tests/tracing/test_tracing_single_grain.py fail and its because the call to get_disordered_trace() does return the same shape skeleton.

Passing two sets of heights

We've already identified and corrected a problem with getSkeleton() which was mistakenly being passed the original image and instead of the mask the image after it had been passed through Gaussian filter (see 84d0896). The gaussian_filter() method is applied to the heights (self.image line 295) and this smoothed image self.smoothed_grain was being passed into getSkeleton() along with the original self.image (call starts on line 465).

I've found the use of grain img and mask to be somewhat inconsistent and a possible source of confusion and as well as correcting the above problem so that an img and mask are passed have sought to use mask wherever a binary array is expected and img wherever heights are expected and remove grain from the equation.

Pruning not as expected

The get_disordered_trace() still doesn't return the pruned skeletons we expect based on the existing tests in tests/tracing/test_tracing_single_grain.py::test_get_disordered_trace. This only asserts that the size of the arrays are as expected and that the start and end co-ordinates are the same. Tests fail on checking the length as longer skeletons are now returned by the get_disordered_trace() method.

This function in the current refactored code calls in order...

    def get_disordered_trace(self):
        """
        Derive the disordered trace coordinates from the binary mask and image via skeletonisation and pruning.
        """
        self.skeleton = getSkeleton(
            self.smoothed_grain,
            self.mask,
            method=self.skeletonisation_params["method"],
            height_bias=self.skeletonisation_params["height_bias"],
        ).get_skeleton()
        self.pruned_skeleton = prune_skeleton(self.smoothed_grain, self.skeleton, **self.pruning_params.copy())
        self.pruned_skeleton = self.remove_touching_edge(self.pruned_skeleton)
        self.disordered_trace = np.argwhere(self.pruned_skeleton == 1)

There are eight parameterised tests as there are four skeletonisation methods (topostats, zhang, lee and thin) and two grains to be tested (circular and linear).

To investigate I've plotted the returned skeleton after pruning from the existing code on main and under this branch (maxgamill-sheffield/800-better-tracing / ns-rse/818-tests-pruning-topostats-conv).

NB - In light of the artifacts in the images on this branch not being clearly binary (yellow) plots I've double checked that the skeletons returned on this branch are in fact binary arrays and they are.

Method	Molecule	Branch
`topostats`	`linear`	`main`
`topostats`	`linear`	`maxgamill-sheffield/800-better-tracing`
`topostats`	`circular`	`main`
`topostats`	`circular`	`maxgamill-sheffield/800-better-tracing`
`zhang`	`linear`	`main`
`zhang`	`linear`	`maxgamill-sheffield/800-better-tracing`
`zhang`	`circular`	`main`
`zhang`	`circular`	`maxgamill-sheffield/800-better-tracing`
`lee`	`linear`	`main`
`lee`	`linear`	`maxgamill-sheffield/800-better-tracing`
`lee`	`circular`	`main`
`lee`	`circular`	`maxgamill-sheffield/800-better-tracing`
`thin`	`linear`	`main`
`thin`	`linear`	`maxgamill-sheffield/800-better-tracing`
`thin`	`circular`	`main`
`thin`	`circular`	`maxgamill-sheffield/800-better-tracing`

The newly introduced pruning_params are passed in to the prune_skeleton() call and I've double checked that in my refactoring to remove the excessive complexity that the **kwargs are correctly passed through the factory method to the topostatsPrune class (and they are).

I suspect some of these parameters though are different from the hard-coded method that is used on main.

The branches on the skeletons from main when using zhang / lee / thin methods are to be expected since pruning was not applied (only the topostats method prunes as it is called from within the getSkeleton class. This is performed by pruneSkeleton which does so by setting max_branch_length to be 15% of the length_of_trace (hard coded). Refactoring makes this configurable but default value of -1 used in configuration triggers this default to be set.

Repeated Pruning

Looking at the getSkeleton() class on main the self.doSkeletonising() method is called.

This method (see here) repeatedly calls the pruneSkeleton() method until while self.pruning: which may be a source of differences under the topostats method.

In the current refactored code this repeated pruning doesn't appear to occur. Skeletonisation and pruning have been separated out and within get_disordered_trace() (post correction mentioned above so that self.mask is passed to getSkeleton) the steps are...

getSkeleton(...).get_skeleton()
A single call to prune_skeleton()
remove_touching_edges()
get skeleton co-ordinates via np.argwhere().

Perhaps the reason we're not seeing the same skeletons is because only a single round of pruning is being undertaken? :thinking:

ns-rse commented 1 week ago

Looked at this more this morning and have found that in the current refactored when get_disordered_trace() the repeated pruning of branches is still in place within the pruning.pruneSkeleton._prune_by_length() method (see line 251 where the while pruning: loop starts

However, within this loop the max_branch_length is recalculated on each iteration (see line 259). This means that as branches are pruned the max_branch_length itself is reduced.

In contrast the main branch in the topostats.tracing.getSkeleton.pruneSkeleton() method max_branch_length is calculated once before and outside of the (see line 780 and remains constant throughout the subsequent loops as its not recalculated.

I think this might be the reason some of the branches are not pruned when running the test suite on the refactored code (branch maxgamill-sheffield/800-better-tracing from which ns-rse/818-tests-pruning-topostats-conv was made).

I shall be investigating further today.

ns-rse commented 1 week ago

Currently on maxgamill-sheffield/800-better-tracing and this branch ns-rse/818-tests-pruning-topostats-conv the max_branch_length is recalculated on each round of pruning.

In tests/tracing/test_dnatracing_single_grain.py::test_get_disordered_test this means that on the first set of parameters the maximum allowed branch length (in terms of number of coordinates) reduces on each iteration and a total of three iterations are run...

INFO     topostats:pruning.py:282 [pruning] : Pruning by length.
INFO     topostats:pruning.py:341 [pruning] : Maximum branch length : 44
INFO     topostats:pruning.py:341 [pruning] : Maximum branch length : 29
INFO     topostats:pruning.py:341 [pruning] : Maximum branch length : 26

This isn't the behaviour in the current main branch (see above comment) and if we therefore calculate the max_branch_length once outside of the while pruning: loop we don't observe any reduction in the branch length but we still only see three iterations...

INFO     topostats:pruning.py:341 [pruning] : Maximum branch length : 45
INFO     topostats:pruning.py:341 [pruning] : Maximum branch length : 45
INFO     topostats:pruning.py:341 [pruning] : Maximum branch length : 45

Not sure why the length is one coordinate longer to start with :face_with_diagonal_mouth:

I can understand why the length might be reassessed after pruning so am unsure if this change is deliberate @MaxGamill-Sheffield

However, we still have not pruned all of the branches that the test expected to be pruned (perhaps as we only have three iterations still). The logic for controlling

ns-rse commented 1 week ago

Looking deeper into this I've found that the initial skeletonisation using the TopoStats method is perhaps the underlying cause of problems we are seeing with pruning.

NB is a branch off of `maxgamill-sheffield/800-better-tracing`	Molecule	Pruning Method	`main`	`ns-rse/818-tests-pruning-topostats-conv`
`linear`	`topostats`
`circular`	`topostats`
`linear`	`zhang`
`circular`	`zhang`
`linear`	`lee`
`circular`	`lee`
`linear`	`thin`
`circular`	`thin`

Okay this is strange, at the very least I would expect the Zhang/Lee/Thin methods to return the exact same initial skeletons, perhaps the masks that are being passed in are different.

Molecule	Method	`main`	`ns-rse/818-tests-pruning-topostats-conv`
`linear`	`topostats`
`circular`	`topostats`
`linear`	`zhang`
`circular`	`zhang`
`linear`	`lee`
`circular`	`lee`
`linear`	`thin`
`circular`	`thin`

This shows that the masks that the refactored code are passing in to each of the skeletonisation methods are different and would go some way to explaining why we are seeing different disordered traces which are the result of Skeletonisation and Pruning^[1].

Now it is worth noting the discovery above that found that a Gaussian smoothed image of heights was being passed into getSkeleton() as the second positional argument which should be a mask (see line 465).

Currently this has been switched to self.mask but in light of the above and comparing to the main branch its clear that the Gaussian filtered image (of heights self.smoothed_grain) should be converted to a smoothed mask (self.smoothed_mask ?) for passing to getSkeleton.

On the main branch the step that does this can be found on line 272 where...

        smoothed_grain = ndimage.binary_dilation(self.grain, iterations=1).astype(self.grain.dtype)

...and smoothed_grain is passed as the second argument to getSkeleton() on line 282 (note the very_smoothed_grain that is generated does not appear to be used anywhere).

Thus we need to apply ndimage.binary_dilation(self.mask, iterations=1).astype(self.mask.dtype) to the mask that is passed into getSkeleton() in the refactored branch.

Somewhat confusingly there is the dnaTrace.smooth_grains() method which is new and not currently tested in the existing test suite (it is used in the trace_dna() method to get the smoothed image but it assigns a binary mask to the self.smoothed_image which is conflating images and masks again. The existing tests that are under investigation do not use this method, rather they use the dnaTrace.gaussian_filter() method...

    dnatrace.skeletonisation_params["method"] = skeletonisation_method
    dnatrace.gaussian_filter()
    dnatrace.get_disordered_trace()

This method is applied to the self.image i.e. heights, but again confusingly this smoothed image is saved to self.smoothed_grain which is ambiguous so I have renamed it to self.smoothed_image and will introduce a self.smoothed_mask which will hopefully remove the ambiguity around what a grain is (as it looks like its sometimes image heights and sometimes binary masks).

Another and perhaps simpler approach might be to have getSkeleton() take only a single image, that of heights, and have it always convert this to a binary mask prior to skeletonisation. This would be simpler and reduce the complexity of passing lots of things around that need to be kept in sync.

^[1] : On the main branch when the method is topostats skeletonisation and pruning are combined since pruning is called directly after skeletonisation. On the main branch the zhang / lee / thin methods do not call pruning methods. In contrast the refactored branches pruning has been separated out of the getSkeleton class and as such skeletonisation is under taken and pruning is then performed on the resulting skeleton, regardless of method (i.e. topostats / zhang / lee / thin skeletonisation are all followed by pruning).

ns-rse commented 1 week ago

Having now tested the above and ensured that ndimage.binary_dilation() is applied to the mask and that this dilated mask is passed in the refactored code now passes the same mask into the skeletonisation methods.

Linear

`main`

main_topostats_linear_mask

`ns-rse/818-tests-pruning-topostats-conv`

refactored_linear_using_dilated_mask

Circular

`main`

main_zhang_circular_mask

`ns-rse/818-tests-pruning-topostats-conv`

refactored_circular_using_dilated_mask

This is a step closer, the tests fail as there are differences in the number of points in the disordered trace but for the zhang / lee / thin mtehods this is to be expected because pruning is now applied to the skeletons returned by each of these methods and as we can crudely see in the outuput below based on the failed tests the number of points has reduced for these methods. What we need to investigate now is why the topostats method also reduces the number of points.

====================================================== short test summary info ======================================================
FAILED tests/tracing/test_dnatracing_single_grain.py::test_get_disordered_trace[linear molecule, skeletonise topostats] - assert 125 == 120
FAILED tests/tracing/test_dnatracing_single_grain.py::test_get_disordered_trace[circular molecule, skeletonise topostats] - assert 143 == 150
FAILED tests/tracing/test_dnatracing_single_grain.py::test_get_disordered_trace[linear molecule, skeletonise zhang] - assert 122 == 170
FAILED tests/tracing/test_dnatracing_single_grain.py::test_get_disordered_trace[circular molecule, skeletonise zhang] - assert 149 == 184
FAILED tests/tracing/test_dnatracing_single_grain.py::test_get_disordered_trace[circular molecule, skeletonise lee] - assert 151 == 177
FAILED tests/tracing/test_dnatracing_single_grain.py::test_get_disordered_trace[linear molecule, skeletonise thin] - assert 118 == 187
FAILED tests/tracing/test_dnatracing_single_grain.py::test_get_disordered_trace[circular molecule, skeletonise thin] - assert 175 == 190
============================================== 7 failed, 1 passed in 224.82s (0:03:44) ==============================================

ns-rse commented 1 week ago

Looking in detail at the final skeletons that are produced by the refactored code now that the skeletonisation method is passed a binary dilated mask and comparing these to the skeletons returned on the main branch.

The diff column shows the differences between the two branches (apologies for these not being to scale, its a quick hack to look at differences)

Molecules	Method	`main`	`ns-rse/818-tests-pruning-topostats-conv`	`diff`
`linear`	`topostats`
`circular`	`topostats`
`linear`	`zhang`
`circular`	`zhang`
`linear`	`lee`
`circular`	`lee`
`linear`	`thin`
`circular`	`thin`

We can see that for the zhang / lee / thin methods there is as suggested above an improvement as the branches which were not pruned on main since pruning was a method of the getSkeleton() class are now removed on branch ns-rse/818-tests-pruning-toposats-conv as @MaxGamill-Sheffield has refactored pruning into its own class and it is applied after skeletonisation.

This is an improvement and I would be happy to update the tests in light of these changes as from the diff plots show the underlying skeletons have simply been pruned.

For the topostats method though there are subtle changes in the final skeleton. The linear one has a slightly larger loop in the bottom right hand corner and a kink on the long arm near the top left. Differences in the circular molecule are harder to detect.

Would be useful to hear your thoughts on what might underpin these changes @MaxGamill-Sheffield and whether what is currently returned is acceptable before I dig deeper. I do appreciate you undertook this work some time ago.

I'm going to go through my current branch and remove reference to grain and ensure that the terms image and mask are instead consistently used to remove any and all ambiguity about what is being passed around/used at different steps.

MaxGamill-Sheffield commented 5 days ago

Sorry for the late reply here are my comments:

Yes the input masks to the skeletonisation algorithm are different - this is because we found that topological features such as holes would be removed by the iterative binary dilation. As a result, this line in main was replaced with the smooth_grains() func which takes a grain (binary), dilates or blurs it and returns a binary mask.
The variable smoothed_grains is in fact a mask, not an image. Or at least it should be. None of the variables used in the smooth_grains() func are images.
The reason for the non-smoothing of the grains might be the terribly low sigma value. Previously, the grains were not passed in 1-at-a-time so the gaussian_sigma value found using the hardcoded max(grain.shape) / 256 which is expected to be > 1 but on reflection this might not be the case causing the image not to be smoothed, and skeletonisation failing.
At least in my head "grain" was always used as an internal name and interchangeably for masks - I agree though consistency is key here so just using masks would be better.

ns-rse commented 4 days ago

Thanks for having a look @MaxGamill-Sheffield

The existing test suite which is where a bunch of tests all fail and I'm currently investigating break down the processing steps and test individually and sequentially (so we can identify at what point breakages occur when undertaking refactoring)...

dnatrace.gaussian_filter()
dnatrace.get_disordered_trace()
<optional additional steps if required>

...as that was how the workflow ran when the tests were written and because gaussian_filter() smooths an image and not a binary mask its why the tests weren't working.

It's fine to have introduced dnatrace.smooth_grains() method as an alternative but the tests need updating when such refactoring is undertaken so that they a) actually use the alternative method (i.e. smooth_grains() instead of gaussian_filter() ; b) account for the changes that these may have on the test results (its ok for test results to change when we know we have modified the underlying methods).

The variable smoothed_grains is in fact a mask, not an image. Or at least it should be. None of the variables used in the smooth_grains() func are images.

That might be true in your refactoring of dnaTrace.trace_dna() (lines 170-190 of 84d08966) but in that same commit there was, as we discussed, a renaming of self.gauss > self.smoothed_grain within the gaussian_filter() function when a Gaussian blur was applied to self.image (i.e. the original height images and not a mask).

The unit-tests are failing at tests/tracing/test_dnatracing_single_grain.py::test_get_disordered_trace and this is because dnatrace.gaussian_filter() is still being used which blurred the original image and didn't dilate the mask as you say smooth_grain() now does. In turn the call to dnatrace.get_disordered_trace() receives image = self.image and mask = self.smoothed_grain where self.smoothed_grain is image heights and not a binary mask. Thus even though in dnaTrace.trace_dna() the call to .get_disordered_trace() is preceeded by smooth_grain() rather than gaussian_filter() the tests failed as they hadn't been adapted to account for this change.

The reason for the non-smoothing of the grains might be the terribly low sigma value. Previously, the grains were not passed in 1-at-a-time so the gaussian_sigma value found using the hardcoded max(grain.shape) / 256 which is expected to be > 1 but on reflection this might not be the case causing the image not to be smoothed, and skeletonisation failing.

I've not got to checking the tests for this yet but looking at the above mentioned PR the gaussian_sigma value is initially hard coded during class instantiation as...

self.sigma = 0.7 / (self.pixel_to_nm_scaling * 1e9)

...(both on main and maxgamill-sheffield/800-better-tracing) and it looks, as far as I can tell, that this value would have been used on main. In maxgamill-sheffield/800-better-tracing though this value isn't passed as an argument to the call to self.smooth_mask() (as its coming from the **kwargs of the parameters defined in topostats/default_config.yaml) and it is therefore set to max(mask.shape) / 256 or the user supplied value (default null and so default is to use the maximum mask dimension.

We can likely therefore do away with the line...

self.sigma = 0.7 / (self.pixel_to_nm_scaling * 1e9)

...as it is only used in the gaussian_filter() method which by the sounds of it is redundant as it has been replaced by smooth_grains().

I'll work on tidying this up and removing redundant code and getting the tests back on track.

MaxGamill-Sheffield commented 3 days ago

X-post from slack: Hey @ns-rse , these functions (gaussian_filter() and smooth_grains()) aren’t the same thing, they work on different inputs and also solve different problems: • Gaussian blurring of the image in main was also hugely beneficial when we had the “fitted trace” step which attempted to shift the unordered coordinates onto the molecule backbone, a step which is now done using the height-biased skeletonisation method. • Smoothing of the masks in 800-better-tracing incorporates the grain smoothing that is done in main’s get_disordered_trace() to combat the production of spurious branches when skeletonising. We now do both a dilation and blur and see which one has affected the least amount of pixels and choose that as the smoothed mask. The “very_smoothed_grain” in main which blurs the smoothed grain is not actually used.

My guess for why the new smooth_mask() function no longer operates as expected and gives different results is because the hardcoded sigma value (! not deffine at init but in smooth_grains()!) is too small due to the new input of a grain with a shape smaller than the older input in the cats branch which was an image, and thus nothing is smoothed.

ns-rse commented 3 days ago

Hi @MaxGamill-Sheffield

Commits reverted so gaussian_filter() method and get_ordered_trace() are reinstated (although tests fail).