Closed ns-rse closed 5 months ago
Adding comment from commit here for greater prominence so I don't forget what has been disabled.
See log
It also threw up a bunch of other things which need addressing; some have been disabled for now, others addressed.
`atst` directory of output..svg to `.gitignore`.

`topostats/plottingfuncs.py`
- Numpydoc validation.

`topostats/plottingfuncs.py`
- Typehints `np.[nd]array` > `npt.NDArray`

`topostats/processing.py`
- Numpydoc validation.

`topostats/utils.py`
- restore missing `padding` argument to `checks()` calls.

`topostats/tracing/dnatracing.py`
- exclude from `codespell` and `numpydoc-validation` check for now.

`topostats/tracing/nodestats.py`
- exclude from `codespell` and `numpydoc-validation` check for now.

`topostats/theme.py`
- exclude from `numpydoc-validation` check for now.

In regards to the unusual skeleton test, it looks a bit weird but technically it is not a problem for our work.
Due to many pixels at the same low value (1), not all of these are removed on the first pass of the skeletonisation algorithm. Then on the second pass there are still enough 1's to fully remove them using the height bias. Because the order within groups of 1's doesn't change when using `np.argsort`, the "last" 1-pixel will only be removed after N passes. Sometimes this means that pixel can’t be removed because it does not fit the skeletonisation removal conditions.
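To make the ordering point concrete, here is a minimal sketch of how tied values behave under `np.argsort`. The `heights` array and the `height_bias` cut-off are made-up illustrations, not the values or the logic from the TopoStats code, and `kind="stable"` is an assumption used here so that the tie order is guaranteed.

```python
import numpy as np

# Hypothetical flattened heights of candidate pixels; four of them share the value 1.
heights = np.array([3.0, 1.0, 2.0, 1.0, 1.0, 1.0])

# A stable sort keeps tied values in their original index order.
order = np.argsort(heights, kind="stable")
print(order)  # [1 3 4 5 2 0]

# Hypothetical height bias: only the lowest fraction of candidates is removed per pass.
height_bias = 0.5
n_remove = int(len(heights) * height_bias)
removed = order[:n_remove]
print(removed)  # [1 3 4]
```

Index 5 is the "last" 1-pixel: it always sorts behind the other 1's, so it is the one left over on every pass where the cut-off falls inside the group of ties.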
See the following images, which isolate the bottom right corner pixel, and the associated output statement of: sub-iteration, # total points to remove, # height bias points to remove, height values and counts.
A fix then is to shuffle the order of each unique value in the array to remove so that the lowest height pixel may not always lie outside the bias bounds. This gives the following 4 results.
Although this is not an issue for our data (pixels are unlikely to have the same value), do you recommend adding the function? If so, how shall I add it: to my working branch, and you rebase from it?
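For discussion, a rough sketch of what such a shuffle could look like. The function name `argsort_shuffled_ties` is hypothetical and this is not the implementation on the working branch. It assumes a 1D array of heights, and takes a seeded NumPy generator, which is one possible way of keeping the result reproducible.

```python
import numpy as np


def argsort_shuffled_ties(heights, rng):
    """Return indices that sort `heights` ascending, with tied values in random order."""
    heights = np.asarray(heights)
    # Randomly permute first, then sort stably: ties come out in the permuted order.
    perm = rng.permutation(heights.size)
    return perm[np.argsort(heights[perm], kind="stable")]


heights = np.array([3.0, 1.0, 2.0, 1.0, 1.0, 1.0])
rng = np.random.default_rng(seed=0)  # fixed seed so repeated runs give the same order
order = argsort_shuffled_ties(heights, rng)

print(heights[order])  # [1. 1. 1. 1. 2. 3.]
print(order[:4])       # indices 1, 3, 4 and 5 in an order that depends on the seed
```

The values are still sorted, but which 1-pixel ends up last in its group now varies, so no single pixel is always the one pushed outside the bias bounds.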
@ns-rse Reply:

> Is there a better example of heights that can be used to test this then?

I think the test is good and the added shuffle function should help.

> It seems like a regression/step-backwards compared to the Zhang method if it's leaving a branch, and why just on that corner as the shape is symmetrical.

The branch is not ideal and is an artifact of the algorithm. It's only on that side because that index in the array is the largest for the 1's at [18, 18].

> I'd be wary of shuffling order. From scanning through the code (I've not gone through it in detail yet as I've focused just on the new class to do the skeletonisation), things are ordered in various places. I'd want to investigate the wider impact of doing so (hence why we need tests!).

I think the test has captured the need for this shuffle function, which is ace, and I don't see its effect creeping up much elsewhere other than possibly producing different structures based on the shuffle. Maybe we set a seed for this?
@MaxGamill-Sheffield the problem was that I hadn't updated the calls in `dnatracing.py` and `nodestats.py` to work with the revised `getSkeleton()`, which in this PR removed the use of a dictionary as an argument (see above comments about these raising linting errors). I'd also made a minor mistake in addressing another linting issue and removed a comma and added a space.
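For anyone wondering why the linters object to a dictionary as a default argument, a small sketch. The functions `set_method_bad` and `set_method_good` are hypothetical illustrations and are not taken from `getSkeleton()`.

```python
def set_method_bad(method, params={}):  # W0102 / B006: one dict shared by every call
    params[method] = True
    return params


def set_method_good(method, params=None):
    # Create a fresh dictionary on each call unless the caller supplies one.
    params = {} if params is None else params
    params[method] = True
    return params


print(set_method_bad("zhang"))       # {'zhang': True}
print(set_method_bad("topostats"))   # {'zhang': True, 'topostats': True}
print(set_method_good("zhang"))      # {'zhang': True}
print(set_method_good("topostats"))  # {'topostats': True}
```

The default dictionary is created once when the function is defined, so state from the first call leaks into the second.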
Processing `minicircle.spm` is ok locally.
Noticed also that the `default_config.yaml` had been altered to use the `atst/` directory for input and output; will have to remove that before things go into `main`.
This PR puts tests in place (and lints some of the touched files); it's not meant to be a branch for development of features or fixes for errors that the tests highlight.
I'm not convinced that the `topostats` method isn't a regression given the way it produces a spurious branch, and have concerns that this may cause problems further down the line. Further unit testing of these methods, rather than the broad testing of the class which is implemented here, might help resolve this, and @MaxGamill-Sheffield has thoughts on how to address this anyway (which, if implemented on the `maxgamill-sheffield/800-better-tracing` branch after merging, can also update the relevant test).
Is it ok to merge this @MaxGamill-Sheffield?
You can then add the shuffle function and update the test that has been put in place, and I can get on with working through putting tests in place for more classes/methods/functions.
Sorry for the lateness but all looks good! Will merge and add the shuffle test now :)
First step in adding tests to the `maxgamill-Sheffield/800-better-tracing` branch.

Tests

- `get_skeleton()` function to work with `getSkeleton()`.
- `tests/conftest.py` to `tests/tracing/conftest.py`.
- `"topostats"` method with different values of `height_bias` ("smoke" means it's testing whether a bunch of things pass or fail, in essence a regression test and not the fine-grained unit tests that should be the basis of test suites).
- `getSkeleton` class/method, these are flagged as `W0102` `dangerous-default-value` by Pylint and `B006` `mutable-argument-default` by Ruff.

IMPORTANT
I noticed whilst developing the tests that the test on a circular molecule with `height_bias = 0.6` results in a strange skeleton. The ring of the circle is correct but there is a branch in the bottom right (search `tests/tracing/test_skeletonize.py` for `TopoStats, circular, height_bias 0.6` and the array that is produced is above this).

This probably isn't what is expected but is worth giving some consideration to now, because if such artifacts are produced on simple test cases it's quite possible that we will get unwanted artifacts in more complex situations.
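One way a finer-grained test could flag this kind of artifact is to count endpoints: a closed ring has none, whereas a spurious branch adds a pixel with a single neighbour. The sketch below is only an illustration using NumPy. `count_endpoints` is a hypothetical helper, not something in TopoStats, and the small arrays are made up rather than taken from the test file.

```python
import numpy as np


def count_endpoints(skeleton):
    """Count skeleton pixels that have exactly one 8-connected neighbour."""
    skel = np.asarray(skeleton, dtype=bool)
    padded = np.pad(skel, 1).astype(int)  # zero border so edge pixels are handled
    rows, cols = skel.shape
    neighbours = np.zeros(skel.shape, dtype=int)
    # Sum the eight shifted copies of the image to get each pixel's neighbour count.
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbours += padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
    return int(np.sum(skel & (neighbours == 1)))


# A small closed ring, and the same ring with one extra pixel off the bottom right corner.
ring = np.zeros((6, 6), dtype=int)
ring[1:4, 1:4] = 1
ring[2, 2] = 0
spur = ring.copy()
spur[4, 4] = 1

print(count_endpoints(ring))  # 0
print(count_endpoints(spur))  # 1
```

Asserting that the endpoint count is zero for a circular molecule would catch the branch without hard-coding the whole expected array.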
Additional Linting
Because I modified `topostats/tracing/skeletonize.py` it was linted as part of the `pre-commit` checks which include `ruff`, `pylint` and now `numpydoc-validation`. I have therefore linted this file and...???

`topostatsSkeletonize` / `pruneSkeleton` / `topostatsPrune` / `convPrune` / `heightPruning`)

Some of the disabled checks (line numbers are rough but should get you close)
Ruff
Pylint
Some hopefully useful comments on how to resolve some of these...
- `topostatsSkeletonize` class has `too-many-instance-attributes`; this could be solved by converting `self.p#` to be a dictionary called `self.points` which would then have keys/values for each of the points.
- `dangerous-default-value`) instead should use kwargs. From memory these can carry through functions without being unpacked, so the function/method that is called has `kwargs` defined and that is passed on as `**kwargs` to the setting function right down to the final method that will use the values.
- `too-many-locals`: it generally indicates that functionality should be broken out into smaller functions.

More importantly, it is considerably easier to address these during development/writing of code and not at the end in a big "clean-up". We have `pre-commit` hooks in place and should use them even if it is tempting to bypass them.
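As a rough illustration of the kwargs suggestion above, here is a sketch of options being carried through several layers without being unpacked. All of the function and parameter names (`trace`, `get_skeleton`, `prune`, `height_threshold`, `max_length`) are hypothetical and are not the TopoStats API.

```python
def prune(skeleton, height_threshold=None, max_length=None):
    """Innermost function: the only place the options are actually read."""
    return {"height_threshold": height_threshold, "max_length": max_length}


def get_skeleton(image, method="topostats", **kwargs):
    """Middle layer: forwards whatever options it was given, untouched."""
    return prune(image, **kwargs)


def trace(image, **kwargs):
    """Outer layer: no dictionary default needed, options arrive as keywords."""
    return get_skeleton(image, **kwargs)


print(trace("img", height_threshold=0.5))
# {'height_threshold': 0.5, 'max_length': None}
```

Only `prune()` needs to know the option names, and a misspelt keyword raises a `TypeError` there rather than being silently ignored.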