AFM-SPM / TopoStats

An AFM image analysis program to batch process data and obtain statistics from images
https://afm-spm.github.io/TopoStats/
GNU Lesser General Public License v3.0
57 stars 10 forks source link

tests: Adapts get_skeleton() tests to the getSkeleton() class #822

Closed ns-rse closed 5 months ago

ns-rse commented 5 months ago

First step in adding tests to the maxgamill-Sheffield/800-better-tracing.

Tests

IMPORTANT

I noticed whilst developing the tests that the tests on a circular molecule with height_bias = 0.6 results in a strange skeleton. The ring of the circle is correct but there is a branch in the bottom right (search tests/tracing/test_skeletonize.py for TopoStats, circular, height_bias 0.6 and the array that is produced is above this).

This probably isn't what is expected but is worth giving some consideration to now because if such artifacts are produced on simple test cases its quite possible that we will get unwanted artifacts in more complex situations.

Additional Linting

Because I modified topostats/tracing/skeletonize.py it was linted as part of the pre-commit checks which include ruff, pylint and now numpydoc-validation. I have therefore linted this file and...

Some of the disabled checks (line numbers are rough but should get you close)

Ruff

topostats/tracing/skeletonize.py:597:34: B006 Do not use mutable data structures for argument defaults
topostats/tracing/skeletonize.py:775:9: C901 `_prune_by_length` is too complex (11 > 10)

Pylint

topostats/tracing/skeletonize.py:1:0: C0302: Too many lines in module (1399/1000) (too-many-lines)
topostats/tracing/skeletonize.py:17:0: R0903: Too few public methods (1/2) (too-few-public-methods)
topostats/tracing/skeletonize.py:187:0: R0902: Too many instance attributes (13/7) (too-many-instance-attributes)
topostats/tracing/skeletonize.py:595:4: W0102: Dangerous default value {} as argument (dangerous-default-value)
topostats/tracing/skeletonize.py:567:0: R0903: Too few public methods (1/2) (too-few-public-methods)
topostats/tracing/skeletonize.py:768:36: E1121: Too many positional arguments for method call (too-many-function-args)
topostats/tracing/skeletonize.py:773:4: R0914: Too many local variables (18/15) (too-many-locals)
topostats/tracing/skeletonize.py:685:0: R0903: Too few public methods (1/2) (too-few-public-methods)
topostats/tracing/skeletonize.py:958:36: E1121: Too many positional arguments for method call (too-many-function-args)
topostats/tracing/skeletonize.py:875:0: R0903: Too few public methods (1/2) (too-few-public-methods)
topostats/tracing/skeletonize.py:1179:4: R0914: Too many local variables (16/15) (too-many-locals)
topostats/tracing/skeletonize.py:1335:0: R0914: Too many local variables (19/15) (too-many-locals)

Some hopefully useful comments on how to resolve some of these...

More importantly its is considerably easier to address these during development/writing of code and not at the end in a big "clean-up". We have pre-commit hooks in place and should use them even if it is tempting to bypass them.

ns-rse commented 5 months ago

Adding comment from commit here for greater prominence so I don't forget what has been disabled.

See log

Also threw a bunch of other things which need addressing, some have been disabled for now others addressed.

MaxGamill-Sheffield commented 5 months ago

In regards to the unusual skeleton test, it looks a bit weird but technically it is not a problem for our work.

Due to many pixels at the same low value (1), not all of these are removed on the first pass of the skeletonisation algorithm. Then on the second pass there is still enough 1's to fully remove them using the height bias. As when using np.argsort the order of groups of 1's doesn't change, meaning the "last" 1-pixel will only be removed after N passes. Sometimes this means that pixel can’t be removed due to not fitting the skeletonisation removal conditions.

See the following images which isolates the bottom right corner pixel and the associated output statement of: sub-iteration, # total points to remove, # hight bias points to remove, height values and counts. 0-mask 1-a 1-b Screenshot 2024-04-04 at 23 09 50

A fix then is to shuffle the order of each unique value in the array to remove so that the lowest hight pixel may not always lie outside the bias bounds. This gives the following 4 results. 0-mask (1) 1-a (1) 1-b (1) Screenshot 2024-04-04 at 23 14 57

Although this is not an issue for our data (pixels unlikely to have the same value) do you recommend adding the function? and if so how shall I add it? to my working branch and you rebase from it?

@ns-rse Reply:

Is there a better example of heights that can be used to test this then?

I think the test is good and the added shuffle function should help

It seems like a regression/step-backwards compared to the Zhang method if its leaving a branch and why just on that corner as the shape is symmetrical.

Branch is not ideal and is an artifact of the algorithm. It's only on that side because that index in the array is the largest for the 1's at [18, 18].

I'd be wary of shuffling order from scanning through the code (not gone it in detail yet as I've focused just on the new class to do the skeletonisation, things are ordered in various places. I'd want to investigate the wider impact of doing so (hence why we need tests!).

I think the test has captured the need for this shuffle function which is ace and I don't see it's affect creeping up much elsewhere other than possibly producing different structures based on the shuffle - maybe we set a seed for this?

ns-rse commented 5 months ago

@MaxGamill-Sheffield the problem was that I hadn't updated the calls in dnatracing.py and nodestats.py to work with the revised getSkeleton() which in this PR removed the use of a dictionary as an argument (see above comments about these raising linting errors). I'd also made a minor mistake in addressing another linting issue and removed a command added a space.

Process minicircle.spm ok locally.

Noticed also that the default_config.yaml and been altered to use atst/ directory for input and output will have to remove that before things go into main.

ns-rse commented 5 months ago

This PR puts tests in place (and lints some of the touched files), its not meant to be a branch for development of features or fixes for errors that the tests highlight.

I'm not convinced that the topostats method isn't a regression given the way it produces s spurious branch and have concerns that this may cause problems further down the line but further unit testing of these methods rather than the broad testing of the class which is implemented here might help resolve this and @MaxGamill-Sheffield has thoughts on how to address this anyway (which if implemented on maxgamill-sheffield/800-better-tracing branch after merging can also update the relevant test).

ns-rse commented 5 months ago

Is it ok to merge this @MaxGamill-Sheffield ?

You can then add the shuffle function and update the test that has been put in place and I can get on with working through putting tests in place for more classes/methods/function.

MaxGamill-Sheffield commented 5 months ago

Sorry for the lateness but all looks good! Will merge and add the shuffle test now :)