AFM-SPM / TopoStats

An AFM image analysis program to batch process data and obtain statistics from images
https://afm-spm.github.io/TopoStats/
GNU Lesser General Public License v3.0
57 stars 10 forks

Take .npy arrays as inputs for reprocessing without filters step. #547

Open derollins opened 1 year ago

derollins commented 1 year ago

Is your feature request related to a problem? Please describe.

The filtering, flattening and scar removal stage of SPM image processing is by far the longest. However, the hardest parameters to get right are the grain masking values, which often require significant trial and error to optimise. If these two processes were separated, so that the .npy files generated after flattening could be fed back into the TopoStats pipeline at the grain masking stage (or the plotting stage), optimisation would be much quicker once the flattening has been successful.

Describe the solution you'd like

I would like to use the .npy files generated by TopoStats as an input at the grain or plotting level.
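To make the request concrete, here is a minimal sketch of the workflow being asked for: load an already flattened array from a .npy file and try several masking thresholds without repeating the flattening. It uses only NumPy. The helper `mask_above`, the file name `flattened.npy` and the plain absolute threshold are hypothetical stand-ins for illustration, not TopoStats functions or its actual grain-finding method.

```python
import tempfile
from pathlib import Path

import numpy as np


def mask_above(image: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean mask of pixels above an absolute height threshold.

    Hypothetical stand-in for a grain masking step, not the TopoStats method.
    """
    return image > threshold


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a flattened image saved by an earlier processing run
    npy_file = Path(tmp) / "flattened.npy"
    demo = np.zeros((4, 4))
    demo[1:3, 1:3] = 2.0  # a single 2x2 "grain" of height 2.0
    np.save(npy_file, demo)

    # Reload the flattened data; no filtering or flattening is repeated
    flattened = np.load(npy_file)

# Try several thresholds cheaply on the same flattened array
pixel_counts = {t: int(mask_above(flattened, t).sum()) for t in (1.0, 3.0)}
```

Because the expensive step is only read back from disk, each extra threshold costs a single array comparison.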

Describe alternatives you've considered

I've made a script that reprocesses the .npy files and generates images: see replotting.py at https://github.com/AFM-SPM/TopoStats/tree/replot-from-npy. This is an alternative to the plotting part of the TopoStats pipeline but doesn't allow reprocessing of the grains.

ns-rse commented 1 year ago

Not currently possible, but this links into the proposed restructuring of how TopoStats is run using the "Swiss Army Knife" approach I suggested in #517, which takes a more modular approach to the different processing stages.

ns-rse commented 1 year ago

Great to see you embracing Python @derollins :+1:

Some tips on your script...

You don't need to copy functions like find_files() or load_scans() into your replotting.py script; you can import them from TopoStats...

from topostats.io import find_files, load_scans

Your script already imports load_scans, so there is no point in redefining it. Dropping the copied functions would also remove the need for the from typing import ... line, since those names are only used for type hints on variables, function arguments and return types.

You use the built-in library os to get the filename to save. The pathlib library is much easier to use (see Why you should be using pathlib - Trey Hunner and No really, pathlib is great - Trey Hunner).
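As a rough illustration of the difference, the sketch below derives an output filename both ways. The path `output/processed/minicircle.npy` is a made-up example and is not taken from replotting.py.

```python
import os
from pathlib import Path

# Hypothetical input file, used only for illustration
npy_path = Path("output") / "processed" / "minicircle.npy"

# os.path style: two nested calls to get the name without its extension
stem_os = os.path.splitext(os.path.basename(str(npy_path)))[0]

# pathlib style: attributes and methods on the path object itself
stem = npy_path.stem                     # "minicircle"
png_path = npy_path.with_suffix(".png")  # same location, new extension
```

Both approaches give the same stem, but the pathlib version reads left to right and keeps the result as a path object that can be passed straight to most functions that save files.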

You can load plotting_config, a nested structure in topostats/default_config.yaml that is installed as part of TopoStats, using...

import importlib.resources as pkg_resources

import yaml

# Read the 'default_config.yaml' from the install of TopoStats to the dictionary 'config'.
# In a standalone script '__package__' does not refer to TopoStats, so the package is named explicitly.
# 'files()' requires Python 3.9 or later.
default_config = pkg_resources.files("topostats").joinpath("default_config.yaml").read_text()
config = yaml.safe_load(default_config)
# Subset just nested dictionary stored with the key 'plotting_config' to its own dictionary
plotting_config = config["plotting_config"]

For sharing scripts like this you may find using GitHub Gists more appropriate rather than making a branch of a whole repository.