Refactoring and tests of dnatracing.py

ns-rse commented 2 years ago

topostats.tracing.dnatracing is the module that handles processing after "topotracing" has processed an image via Filters(), Grains() and GrainStats(). It contains two classes, dnaTrace() and traceStats(), and some work has been done on these to align them with the new workflow (see #166).

But there are no tests in place, and the code can be refactored to be cleaner, which in turn will allow finer-grained tests to be written. Furthermore, docstrings need writing for all modules, classes and methods; some information is already there, but not as fully qualified numpy docstrings.

Many of the methods contain for loops. The body of each of these loops should be abstracted out into its own private _[method], which is then called from within a loop in the associated [method]. This makes it easier to write tests on a single case.
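A minimal sketch of the pattern, using a hypothetical GrainProcessor class and a deliberately trivial per-grain step (counting mask pixels) rather than anything taken from dnatracing itself. The single-grain logic lives in the private _count_pixels(), while the public count_pixels() only loops; both carry numpy-style docstrings of the kind described above.

```python
from __future__ import annotations

import numpy as np


class GrainProcessor:
    """Hypothetical class illustrating the loop-extraction pattern.

    Parameters
    ----------
    grains : dict[int, np.ndarray]
        Binary masks keyed by grain number.
    """

    def __init__(self, grains: dict[int, np.ndarray]) -> None:
        self.grains = grains

    @staticmethod
    def _count_pixels(grain_mask: np.ndarray) -> int:
        """Count the non-zero pixels of a single grain.

        Parameters
        ----------
        grain_mask : np.ndarray
            Binary mask of one grain.

        Returns
        -------
        int
            Number of non-zero pixels.
        """
        return int(np.count_nonzero(grain_mask))

    def count_pixels(self) -> dict[int, int]:
        """Count pixels for every grain by calling _count_pixels() in a loop.

        Returns
        -------
        dict[int, int]
            Pixel counts keyed by grain number.
        """
        counts = {}
        for grain_number, grain_mask in self.grains.items():
            counts[grain_number] = self._count_pixels(grain_mask)
        return counts
```

A test can then target a single grain directly, e.g. GrainProcessor._count_pixels(np.eye(3)) returns 3, without having to build the whole collection of grains.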

There is a lot of old code left in various comments or documentation blocks that will need cleaning out.

[x] #27
[x] #289
[ ] #290
[x] #294
[ ] #295 (blocks #294 though so needs completing first)
[x] #402
[x] #403
[x] #595

ns-rse commented 2 years ago

One thing to consider is replacing the skeletonisation routine that dnatracing calls with the equivalent functionality from scikit-image's skeletonize (https://scikit-image.org/docs/stable/auto_examples/edges/plot_skeleton.html).
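For reference, a minimal sketch of what calling scikit-image directly might look like. skimage.morphology.skeletonize is the real scikit-image function; the wrapper name skeletonise_grain is hypothetical, and the input is assumed to be a 2D binary grain mask.

```python
import numpy as np
from skimage.morphology import skeletonize


def skeletonise_grain(mask: np.ndarray) -> np.ndarray:
    """Reduce a 2D binary grain mask to a one-pixel-wide skeleton (hypothetical wrapper)."""
    # skeletonize expects a binary image; casting lets 0/1 integer masks work too
    return skeletonize(np.asarray(mask, dtype=bool))


# A 3-pixel-thick bar should reduce to a thin line lying inside the original bar
bar = np.zeros((9, 20), dtype=bool)
bar[3:6, 2:18] = True
skeleton = skeletonise_grain(bar)
```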

alicepyne commented 2 years ago

Definitely agree with this. Makes it more flexible too. Unless @joe thinks differently? It’s his initial code!


Dr Alice Pyne, UKRI Future Leaders Fellow & Senior Lecturer, Department of Materials Science and Engineering, University of Sheffield. Website: www.pyne-lab.uk

ns-rse commented 2 years ago

This is quite a big refactor, so I'm going to use this issue as an "Epic", break the work down into smaller tasks and list the associated issues below.

[x] #27
[x] #289
[ ] #290
[ ] #294
[ ] #295 (blocks #294 though so needs completing first)
[ ] #403

ns-rse commented 1 year ago

I've started work on this and am aiming to simplify a lot of the code by making the class process a single grain at a time. This removes the complexity, currently scattered throughout, of repeating each step on multiple grains.

Processing multiple grains is then simplified to iterating over a dictionary of grains, where each entry may itself hold several different sets of information, such as the grain in its processed form (post Filters), the mask of the grain (as skeletonisation currently works on binary arrays) and the original coordinates, should there be a desire to, for example, reconstruct the whole image with skeletons.
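A minimal sketch of building such a dictionary, assuming the filtered image and the whole-image grain mask are NumPy arrays of the same shape. It uses scipy.ndimage.label and find_objects to crop each grain to its bounding box; the function name and the "image", "mask" and "origin" keys are hypothetical, with "origin" holding the top-left corner so a grain (or its skeleton) can be placed back into the full image.

```python
from __future__ import annotations

import numpy as np
from scipy.ndimage import find_objects, label


def extract_grains(image: np.ndarray, mask: np.ndarray) -> dict[int, dict]:
    """Split an image into per-grain crops keyed by grain number (hypothetical helper)."""
    labelled, _ = label(mask)
    grains = {}
    # find_objects returns one (row_slice, col_slice) bounding box per label, starting at label 1
    for grain_number, bbox in enumerate(find_objects(labelled), start=1):
        if bbox is None:
            continue
        grains[grain_number] = {
            "image": image[bbox],  # processed heights (post Filters)
            "mask": labelled[bbox] == grain_number,  # binary mask, excluding neighbouring grains
            "origin": (bbox[0].start, bbox[1].start),  # offset into the full image
        }
    return grains
```

Each grain can then be handled independently with a plain loop such as `for grain_number, grain in extract_grains(image, mask).items(): ...`, keeping the per-grain processing free of any multi-grain bookkeeping.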

ns-rse commented 1 year ago

Belatedly adding notes from meeting 2023-01-17

Make binary dilation optional. It sometimes results in grains touching the border, and when this happens skeletonisation doesn't always work correctly (it is unclear which method is affected, whether scikit-image's or "Joe's"). A possible solution might be to np.pad() the dilated arrays (masked or otherwise) with zeros so that dilated images don't touch the edges.
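A minimal sketch of this suggestion, assuming boolean masks: dilation becomes an opt-in flag (here via scipy.ndimage.binary_dilation) and the result is padded with np.pad so a border of zeros always surrounds the grain. The function name and its defaults are hypothetical.

```python
import numpy as np
from scipy.ndimage import binary_dilation


def prepare_mask(
    mask: np.ndarray, dilate: bool = False, iterations: int = 1, pad_width: int = 1
) -> np.ndarray:
    """Optionally dilate a binary mask, then pad it with zeros (hypothetical helper)."""
    mask = np.asarray(mask, dtype=bool)
    if dilate:
        # Dilation is opt-in; by default the mask is left unchanged
        mask = binary_dilation(mask, iterations=iterations)
    # Surround the (possibly dilated) mask with zeros so no grain touches the array edge
    return np.pad(mask, pad_width=pad_width, mode="constant", constant_values=False)
```

For a 2 × 2 grain in the corner of a 5 × 5 mask, prepare_mask(mask, dilate=True) returns a 7 × 7 array whose outermost rows and columns are all zero. Padding after dilation follows the wording above; padding before dilating would additionally stop the dilation from being clipped at the original border.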
Rather than ~purge_obvious_crap~, we need the option to remove objects that are smaller than a minimum size after skeletonisation. Despite removing small objects during filtering and grain detection, we can still end up with blobs (roughly circular artefacts) that reduce to very small skeletons when skeletonised.

As a dummy example...

[
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
 [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
 [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
 [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
 [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
 [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
 [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
 [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
 [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

...might skeletonise to...

[
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

Strictly speaking it has skeletonised, but it's neither circular nor linear and should be removed.
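A minimal sketch of a minimum-size filter applied after skeletonisation, assuming the skeleton is a boolean array. It labels connected skeletons with 8-connectivity via scipy.ndimage.label and drops any with fewer than min_size pixels; the function name is hypothetical and a sensible threshold would need choosing from real data. The dummy skeleton above has 13 pixels, so it is removed with min_size=20 but kept with min_size=10.

```python
import numpy as np
from scipy.ndimage import label


def remove_small_skeletons(skeleton: np.ndarray, min_size: int) -> np.ndarray:
    """Remove connected skeletons with fewer than min_size pixels (hypothetical helper)."""
    skeleton = np.asarray(skeleton, dtype=bool)
    # 8-connectivity so diagonally adjacent skeleton pixels belong to the same object
    labelled, _ = label(skeleton, structure=np.ones((3, 3), dtype=int))
    sizes = np.bincount(labelled.ravel())  # pixel count per label; index 0 is background
    keep = sizes >= min_size
    keep[0] = False  # never keep the background
    return keep[labelled]


# The dummy "cross" skeleton from the example above (13 pixels)
cross = np.zeros((13, 14), dtype=bool)
cross[3:10, 6] = True
cross[6, 3:10] = True
```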

ns-rse commented 1 year ago

Add a function to topostats.io to write co-ordinates of traces to CSV.

Documented in #595 and added to the todo list.
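A minimal sketch of such a writer using only the standard library's csv module, assuming traces are held as a dictionary mapping grain number to a sequence of (x, y) coordinates. The function name, argument order and column names are hypothetical; whatever ends up in topostats.io may look different.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Iterable


def write_trace_coordinates(
    traces: dict[int, Iterable[tuple[float, float]]], path: str | Path
) -> Path:
    """Write trace coordinates to CSV, one row per point (hypothetical helper)."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["grain_id", "x", "y"])
        # One row per coordinate, tagged with the grain it belongs to
        for grain_id, coords in traces.items():
            for x, y in coords:
                writer.writerow([grain_id, x, y])
    return path
```

Writing one row per point in "long" format keeps traces of different lengths in a single file and reads straight back into a spreadsheet or data frame.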

ns-rse commented 1 year ago

After #610 has been completed, the next step is to spline the traces. The current code base only does this for linear molecules. Be sure to git rebase to include the changes in #653.
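One possible approach covering circular as well as linear molecules, sketched with SciPy's splprep/splev: passing per=1 fits a periodic spline, so the smoothed trace of a closed molecule joins up on itself. This is an assumption about how splining could be extended, not the current TopoStats implementation; the function name and the n_points and smoothing defaults are hypothetical, and the trace is assumed to be an ordered (N, 2) array of coordinates.

```python
import numpy as np
from scipy.interpolate import splev, splprep


def spline_trace(
    coords: np.ndarray, closed: bool, n_points: int = 200, smoothing: float = 0.0
) -> np.ndarray:
    """Fit a parametric spline to an ordered trace (hypothetical helper).

    closed=True fits a periodic spline for circular molecules; closed=False
    fits an open spline for linear molecules.
    """
    coords = np.asarray(coords, dtype=float)
    if closed:
        # With per=1 splprep ignores the last point's value, so repeat the first point to close the loop
        coords = np.vstack([coords, coords[:1]])
    tck, _ = splprep([coords[:, 0], coords[:, 1]], s=smoothing, per=int(closed))
    # Evaluate the spline at evenly spaced parameter values along its full length
    x, y = splev(np.linspace(0.0, 1.0, n_points), tck)
    return np.column_stack([x, y])
```

With smoothing=0.0 the spline interpolates every traced point; a positive value trades fidelity for smoothness and would need tuning against real traces.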

ns-rse commented 1 month ago

Closing as work has been undertaken under the EPIC #800 and related issues linked from there.

AFM-SPM / TopoStats #183