tests: Adapts get_skeleton() tests to the getSkeleton() class

First step in adding tests to the maxgamill-Sheffield/800-better-tracing.

Tests

Adapts tests for get_skeleton() function to work with getSkeleton().
Moves fixture definitions for these tests from tests/conftest.py to tests/tracing/conftest.py.
Parametrises the tests.
Includes "smoke" tests for using the "topostats" method with different values of height_bias ("smoke" means its testing whether a bunch of things pass or fail, in essence a regression tests and not the fine-grained unit tests that should be the basis of test suites).
Removed the dictionary argument to the getSkeleton class/method, these are flagged as w0102 dangerous-default-value by Pylint and B006 mutable-argument-default by Ruff.

IMPORTANT

I noticed whilst developing the tests that the tests on a circular molecule with height_bias = 0.6 results in a strange skeleton. The ring of the circle is correct but there is a branch in the bottom right (search tests/tracing/test_skeletonize.py for TopoStats, circular, height_bias 0.6 and the array that is produced is above this).

This probably isn't what is expected but is worth giving some consideration to now because if such artifacts are produced on simple test cases its quite possible that we will get unwanted artifacts in more complex situations.

Additional Linting

Because I modified topostats/tracing/skeletonize.py it was linted as part of the pre-commit checks which include ruff, pylint and now numpydoc-validation. I have therefore linted this file and...

Corrected all docstrings to pass validation. There were some that took ages to find and rectify. In the end I had to rewrite the docstrings from scratch for about 20 classes/methods as there was a hidden character that caused things to fail, quite possibly a hangover from when files might have been edited under Windows.
I've left notes where I couldn't understand what functions/methods were doing, these can be found by searching for ???.
Disabled a number of checks for now as most were in the other classes I've not addressed yet (i.e. topostatsSkeletonize / pruneSkeleton / topostatsPrune / convPrune / heightPruning)

Some of the disabled checks (line numbers are rough but should get you close)

Ruff

topostats/tracing/skeletonize.py:597:34: B006 Do not use mutable data structures for argument defaults
topostats/tracing/skeletonize.py:775:9: C901 `_prune_by_length` is too complex (11 > 10)

Pylint

topostats/tracing/skeletonize.py:1:0: C0302: Too many lines in module (1399/1000) (too-many-lines)
topostats/tracing/skeletonize.py:17:0: R0903: Too few public methods (1/2) (too-few-public-methods)
topostats/tracing/skeletonize.py:187:0: R0902: Too many instance attributes (13/7) (too-many-instance-attributes)
topostats/tracing/skeletonize.py:595:4: W0102: Dangerous default value {} as argument (dangerous-default-value)
topostats/tracing/skeletonize.py:567:0: R0903: Too few public methods (1/2) (too-few-public-methods)
topostats/tracing/skeletonize.py:768:36: E1121: Too many positional arguments for method call (too-many-function-args)
topostats/tracing/skeletonize.py:773:4: R0914: Too many local variables (18/15) (too-many-locals)
topostats/tracing/skeletonize.py:685:0: R0903: Too few public methods (1/2) (too-few-public-methods)
topostats/tracing/skeletonize.py:958:36: E1121: Too many positional arguments for method call (too-many-function-args)
topostats/tracing/skeletonize.py:875:0: R0903: Too few public methods (1/2) (too-few-public-methods)
topostats/tracing/skeletonize.py:1179:4: R0914: Too many local variables (16/15) (too-many-locals)
topostats/tracing/skeletonize.py:1335:0: R0914: Too many local variables (19/15) (too-many-locals)

Some hopefully useful comments on how to resolve some of these...

topostatsSkeletonize class has too-many-instance-attributes this could be solved by converting self.p# to be a dictionary called self.points which would then have keys/values for each of the points.
Shouldn't pass dictionaries (dangerous-default-value) instead should use kwargs. From memory these can carry through functions without being unpacked, so the function/method that is called has `kwargsdefined and that is passed on as**kwargs` to the setting function right down to the final method that will use the values.
When PyLint reports too-many-locals it generally indicates that functionality should be broken out into smaller functions.

More importantly its is considerably easier to address these during development/writing of code and not at the end in a big "clean-up". We have pre-commit hooks in place and should use them even if it is tempting to bypass them.

Adding comment from commit here for greater prominence so I don't forget what has been disabled.

See log

Correct commenting of unused regions.
Removal of unnecessary list comprehension.

Also threw a bunch of other things which need addressing, some have been disabled for now others addressed.

[x] Remove atst directory of output.
[x] Add .svg to .gitignore.
[x] topostats/plottingfuncs.py - Numpydoc validation.
[x] topostats/plottingfuncs.py - Typehints np.[nd]array > npt.NDArray
[x] topostats/processing.py - Numpydoc validation.
[x] topostats/utils.py - restore missing padding argument to checks() calls.
[ ] topostats/tracing/dnatracing.py - exclude from codespell and numpydoc-validation check for now.
[ ] topostats/tracing/nodestats.py - exclude from codespell and numpydoc-validation check for now.
[ ] topostats/theme.py - exclude from numpydoc-validation check for now.

In regards to the unusual skeleton test, it looks a bit weird but technically it is not a problem for our work.

Due to many pixels at the same low value (1), not all of these are removed on the first pass of the skeletonisation algorithm. Then on the second pass there is still enough 1's to fully remove them using the height bias. As when using np.argsort the order of groups of 1's doesn't change, meaning the "last" 1-pixel will only be removed after N passes. Sometimes this means that pixel can’t be removed due to not fitting the skeletonisation removal conditions.

See the following images which isolates the bottom right corner pixel and the associated output statement of: sub-iteration, # total points to remove, # hight bias points to remove, height values and counts. 0-mask 1-a 1-b Screenshot 2024-04-04 at 23 09 50

A fix then is to shuffle the order of each unique value in the array to remove so that the lowest hight pixel may not always lie outside the bias bounds. This gives the following 4 results. 0-mask (1) 1-a (1) 1-b (1) Screenshot 2024-04-04 at 23 14 57

Although this is not an issue for our data (pixels unlikely to have the same value) do you recommend adding the function? and if so how shall I add it? to my working branch and you rebase from it?

@ns-rse Reply:

Is there a better example of heights that can be used to test this then?

I think the test is good and the added shuffle function should help

It seems like a regression/step-backwards compared to the Zhang method if its leaving a branch and why just on that corner as the shape is symmetrical.

Branch is not ideal and is an artifact of the algorithm. It's only on that side because that index in the array is the largest for the 1's at [18, 18].

I'd be wary of shuffling order from scanning through the code (not gone it in detail yet as I've focused just on the new class to do the skeletonisation, things are ordered in various places. I'd want to investigate the wider impact of doing so (hence why we need tests!).

I think the test has captured the need for this shuffle function which is ace and I don't see it's affect creeping up much elsewhere other than possibly producing different structures based on the shuffle - maybe we set a seed for this?

@MaxGamill-Sheffield the problem was that I hadn't updated the calls in dnatracing.py and nodestats.py to work with the revised getSkeleton() which in this PR removed the use of a dictionary as an argument (see above comments about these raising linting errors). I'd also made a minor mistake in addressing another linting issue and removed a command added a space.

Process minicircle.spm ok locally.

Noticed also that the default_config.yaml and been altered to use atst/ directory for input and output will have to remove that before things go into main.

This PR puts tests in place (and lints some of the touched files), its not meant to be a branch for development of features or fixes for errors that the tests highlight.

I'm not convinced that the topostats method isn't a regression given the way it produces s spurious branch and have concerns that this may cause problems further down the line but further unit testing of these methods rather than the broad testing of the class which is implemented here might help resolve this and @MaxGamill-Sheffield has thoughts on how to address this anyway (which if implemented on maxgamill-sheffield/800-better-tracing branch after merging can also update the relevant test).

Is it ok to merge this @MaxGamill-Sheffield ?

You can then add the shuffle function and update the test that has been put in place and I can get on with working through putting tests in place for more classes/methods/function.

Sorry for the lateness but all looks good! Will merge and add the shuffle test now :)

AFM-SPM / TopoStats