AFM-SPM / TopoStats

An AFM image analysis program to batch process data and obtain statistics from images
https://afm-spm.github.io/TopoStats/
GNU Lesser General Public License v3.0

[Bug]: Concatenation warning that stops all_statistics.csv from being produced #969

Closed llwiggins closed 1 month ago

llwiggins commented 1 month ago

Describe the bug

Both @MaxGamill-Sheffield and I keep running into the same concatenation warning when running `topostats process`. The warning is raised right at the end of processing, and no `all_statistics.csv` is written. It looks as though the issue arises from pandas deprecating the way `pd.concat()` handles empty or all-NA data frames, and the suggested resolution is to exclude these prior to concatenation.
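The suggested resolution, excluding empty or all-NA data frames before concatenation, could look roughly like the sketch below. The helper name `concat_non_empty` is hypothetical and is not part of TopoStats or pandas. It only filters out whole frames; a column that is all-NA inside an otherwise populated frame may still trigger the warning and would need separate handling.

```python
import pandas as pd


def concat_non_empty(frames):
    """Concatenate DataFrames, skipping any that are empty or entirely NA.

    `concat_non_empty` is a hypothetical helper for illustration only.
    """
    kept = [
        df
        for df in frames
        # Drop frames with no rows/columns, and frames whose every cell is NA.
        if not df.empty and not df.isna().all().all()
    ]
    if not kept:
        # Nothing worth concatenating: return an empty frame rather than raising.
        return pd.DataFrame()
    return pd.concat(kept)
```

In `run_topostats.py` the call would then be something like `concat_non_empty(results.values())` in place of `pd.concat(results.values())`, assuming the per-image results are held in a dictionary of data frames as the traceback suggests.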

Copy of the output

Traceback (most recent call last):
  File "/Users/laura/miniconda3/envs/topoly/bin/topostats", line 8, in <module>
    sys.exit(entry_point())
             ^^^^^^^^^^^^^
  File "/Users/laura/TopoStats/topostats/entry_point.py", line 386, in entry_point
    args.func(args)
  File "/Users/laura/TopoStats/topostats/run_topostats.py", line 171, in run_topostats
    results = pd.concat(results.values())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/laura/miniconda3/envs/topoly/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 395, in concat
    return op.get_result()
           ^^^^^^^^^^^^^^^
  File "/Users/laura/miniconda3/envs/topoly/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 684, in get_result
    new_data = concatenate_managers(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/laura/miniconda3/envs/topoly/lib/python3.11/site-packages/pandas/core/internals/concat.py", line 189, in concatenate_managers
    values = _concatenate_join_units(join_units, copy=copy)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/laura/miniconda3/envs/topoly/lib/python3.11/site-packages/pandas/core/internals/concat.py", line 491, in _concatenate_join_units
    warnings.warn(
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
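The traceback above ends inside `warnings.warn(`, which is what Python prints when a warning has been escalated to an exception rather than merely reported. One possible explanation, and this is an assumption rather than something confirmed in this thread, is that a warnings filter with the `error` action is active for `FutureWarning` somewhere in the run. The standard-library sketch below shows the difference; `emit_future_warning` is a hypothetical stand-in for the pandas call.

```python
import warnings


def emit_future_warning():
    """Hypothetical stand-in for a library call that emits a FutureWarning."""
    warnings.warn("behaviour is deprecated", FutureWarning)
    return "completed"


def run_with_filter(action):
    """Call emit_future_warning() with the given filter action for FutureWarning."""
    with warnings.catch_warnings():
        # "ignore" silences the warning; "error" turns it into a raised exception.
        warnings.simplefilter(action, FutureWarning)
        try:
            return emit_future_warning()
        except FutureWarning as exc:
            return f"raised: {exc}"
```

With `"ignore"` the function runs to completion, whereas with `"error"` the warning is raised and anything after the call (here, writing the output file) would never run.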

Include the configuration file

base_dir: /Volumes/shared/pyne_group/Shared/AFM_Data/Metallodrugs/data # Directory in which to search for data files
output_dir: /Volumes/shared/pyne_group/Shared/AFM_Data/Metallodrugs/output # Directory to output results to
log_level: info # Verbosity of output. Options: warning, error, info, debug
cores: 1 # Number of CPU cores to utilise for processing multiple files simultaneously.
file_ext: .spm # File extension of the data files.
loading:
  channel: Height # Channel to pull data from in the data files.
filter:
  run: true # Options : true, false
  row_alignment_quantile: 0.5 # lower values may improve flattening of larger features
  threshold_method: std_dev # Options : otsu, std_dev, absolute
  otsu_threshold_multiplier: 1.0
  threshold_std_dev:
    below: 10.0 # Threshold for data below the image background
    above: 1.0 # Threshold for data above the image background
  threshold_absolute:
    below: -1.0 # Threshold for data below the image background
    above: 1.0 # Threshold for data above the image background
  gaussian_size: 1.0121397464510862 # Gaussian blur intensity in px
  gaussian_mode: nearest
  # Scar removal parameters. Be careful with editing these as making the algorithm too sensitive may
  # result in ruining legitimate data.
  remove_scars:
    run: true
    removal_iterations: 2 # Number of times to run scar removal.
    threshold_low: 0.250 # lower values make scar removal more sensitive
    threshold_high: 0.666 # lower values make scar removal more sensitive
    max_scar_width: 4 # Maximum thickness of scars in pixels.
    min_scar_length: 16 # Minimum length of scars in pixels.
grains:
  run: true # Options : true, false
  # Thresholding by height
  threshold_method: std_dev # Options : std_dev, otsu, absolute, unet
  otsu_threshold_multiplier: 1.0
  threshold_std_dev:
    below: 10.0 # Threshold for grains below the image background
    above: 1.0 # Threshold for grains above the image background
  threshold_absolute:
    below: -1.0 # Threshold for grains below the image background
    above: 1.0 # Threshold for grains above the image background
  direction: above # Options: above, below, both (defines whether to look for grains above or below thresholds or both)
  # Thresholding by area
  smallest_grain_size_nm2: 50 # Size in nm^2 of tiny grains/blobs (noise) to remove, must be > 0.0
  absolute_area_threshold:
    above: [300, 30000] # above surface [Low, High] in nm^2 (also takes null)
    below: [null, null] # below surface [Low, High] in nm^2 (also takes null)
  remove_edge_intersecting_grains: true # Whether or not to remove grains that touch the image border
  unet_config:
    model_path: null # Path to a trained U-Net model
    grain_crop_padding: 2 # Padding to apply to the grain crop bounding box
    upper_norm_bound: 5.0 # Upper bound for normalisation of input data. This should be slightly higher than the maximum desired / expected height of grains.
    lower_norm_bound: -1.0 # Lower bound for normalisation of input data. This should be slightly lower than the minimum desired / expected height of the background.
grainstats:
  run: true # Options : true, false
  edge_detection_method: binary_erosion # Options: canny, binary erosion. Do not change this unless you are sure of what this will do.
  cropped_size: -1 # Length (in nm) of square cropped images (can take -1 for grain-sized box)
  extract_height_profile: true # Extract height profiles along maximum feret of molecules
disordered_tracing:
  run: true # Options : true, false
  min_skeleton_size: 10 # Minimum number of pixels in a skeleton for it to be retained.
  pad_width: 1 # Pixels to pad grains by when tracing
  mask_smoothing_params:
    gaussian_sigma: 2 # Gaussian smoothing parameter 'sigma' in pixels.
    dilation_iterations: 2 # Number of dilation iterations to use for grain smoothing.
    holearea_min_max: [0, null] # Range (min, max) of a hole area in nm to refill in the smoothed masks.
  skeletonisation_params:
    method: topostats # Options : zhang | lee | thin | topostats
    height_bias: 0.6 # Percentage of lowest pixels to remove each skeletonisation iteration. 1 equates to zhang.
  pruning_params:
    method: topostats # Method to clean branches of the skeleton. Options : topostats
    max_length: 10.0 # Maximum length in nm to remove a branch containing an endpoint.
    height_threshold: # The height to remove branches below.
    method_values: mid # The method to obtain a branch's height for pruning. Options : min | median | mid.
    method_outlier: mean_abs # The method to prune branches based on height. Options : abs | mean_abs | iqr.
nodestats:
  run: true # Options : true, false
  node_joining_length: 7.0 # The distance over which to join nearby crossing points.
  node_extend_dist: 14.0 # The distance over which to join nearby odd-branched nodes.
  branch_pairing_length: 20.0 # The length from the crossing point to pair and trace, obtaining FWHM's.
  pair_odd_branches: false # Whether to try and pair odd-branched nodes. Options: true and false.
  pad_width: 1 # Pixels to pad grains by when tracing (should be the same as disordered_tracing).
ordered_tracing:
  run: true
  ordering_method: nodestats # The method of ordering the disordered traces.
  pad_width: 1 # Pixels to pad grains by when tracing (should be the same as disordered_tracing).
splining:
  run: true # Options : true, false
  method: "rolling_window" # Options : "spline", "rolling_window"
  rolling_window_size: 20.0e-9 # size in nm of the rolling window.
  spline_step_size: 7.0e-9 # The sampling rate of the spline in metres.
  spline_linear_smoothing: 5.0 # The amount of smoothing to apply to linear splines.
  spline_circular_smoothing: 5.0 # The amount of smoothing to apply to circular splines.
  spline_degree: 3 # The polynomial degree of the spline.
#  cores: 1 # Number of cores to use for parallel processing
plotting:
  run: true # Options : true, false
  style: topostats.mplstyle # Options : topostats.mplstyle or path to a matplotlibrc params file
  savefig_format: null # Options : null, png, svg or pdf. tif is also available although no metadata will be saved. (defaults to png) See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
  savefig_dpi: 600 # Options : null (defaults to the value in topostats/plotting_dictionary.yaml), see https://afm-spm.github.io/TopoStats/main/configuration.html#further-customisation and https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
  pixel_interpolation: null # Options : https://matplotlib.org/stable/gallery/images_contours_and_fields/interpolation_methods.html
  image_set: core # Options : all, core
  zrange: [-2, 5] # low and high height range for core images (can take [null, null]). low <= high
  colorbar: true # Options : true, false
  axes: true # Options : true, false (due to off being a bool when parsed)
  num_ticks: [null, null] # Number of ticks to have along the x and y axes. Options : null (auto) or integer > 1
  cmap: null # Colormap/colourmap to use (default is 'nanoscope' which is used if null, other options are 'afmhot', 'viridis' etc.)
  mask_cmap: blue_purple_green # Options : blu, jet_r and any in matplotlib
  histogram_log_axis: false # Options : true, false
summary_stats:
  run: true # Whether to make summary plots for output data
  config: null

To Reproduce

No response

TopoStats Version

Git main branch

Python Version

3.11

Operating System

MacOS M1/M2 (post-2021)

Python Packages

No response

MaxGamill-Sheffield commented 1 month ago

Just adding that when I found this issue, I had all dnatracing, plotting and summary_stats turned off.

ns-rse commented 1 month ago

The pandas version, at the very least, would be useful to know, as it's `pd.concat()` that does the concatenation.

pip show pandas

llwiggins commented 1 month ago

pandas v2.2.3

MaxGamill-Sheffield commented 1 month ago

Small update on where I got to with this earlier, using Laura's smaller test set.

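from collections import defaultdict

import pandas as pd

base = "/Users/Maxgamill/Desktop/Uni/PhD/topo_test/TopoStats/concat/"
img1 = "20230526_puc19_tube1_24hr_mg.0_00003"
img2 = "20230526_puc19_tube1_24hr_mg.0_00002"

# Load each image's statistics CSV into a dictionary keyed by image name
results = defaultdict()
for img in [img1, img2]:
    df = pd.read_csv(base + img + ".csv")
    results[img] = df

# Concatenating the per-image data frames reproduces the warning
total_df = pd.concat(results.values())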

- I've made branch `maxgamill-sheffield/969-concat-issue` in which I've attempted to make a few fixes:
  - The `folder_<stats>.csv` was being overwritten by the dis and mol stats so that has been modified to produce all folder stats.
  - The error / failed outputs of the better tracing pipeline now add the columns that should have been added should it have succeeded.
- Thought it might have been because of columns that are present in one but not the other due to failure but alas nope.
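Since aligning the columns across frames did not help, a next diagnostic step could be to check which columns are entirely NA within each individual frame, because the warning text refers to all-NA entries as well as empty ones. The sketch below is one way to do that; `all_na_columns` is a hypothetical helper, and it assumes each frame has unique column names.

```python
import pandas as pd


def all_na_columns(frames):
    """Map each key to the list of columns in its frame that contain only NA values.

    `all_na_columns` is a hypothetical diagnostic helper, not part of TopoStats.
    """
    return {
        key: [col for col in df.columns if df[col].isna().all()]
        for key, df in frames.items()
    }
```

Called as `all_na_columns(results)` on the dictionary from the reproduction above, this would list, per image, the columns that hold no data and are therefore candidates for triggering the warning.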

Package list:

absl-py 2.1.0 pypi_0 pypi
accessible-pygments 0.0.5 pypi_0 pypi
afmreader 0.0.1 pypi_0 pypi
alabaster 0.7.16 pypi_0 pypi
appnope 0.1.4 pypi_0 pypi
argparse 1.4.0 pypi_0 pypi
astroid 3.1.0 pypi_0 pypi
asttokens 2.4.1 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
babel 2.16.0 pypi_0 pypi
backcall 0.2.0 pypi_0 pypi
beautifulsoup4 4.12.3 pypi_0 pypi
biopython 1.84 pypi_0 pypi
black 24.4.2 pypi_0 pypi
bzip2 1.0.8 h93a5062_5 conda-forge
ca-certificates 2024.2.2 hf0a4a13_0 conda-forge
certifi 2024.7.4 pypi_0 pypi
cfgv 3.4.0 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
cheap-repr 0.5.1 pypi_0 pypi
click 8.1.7 pypi_0 pypi
cloudpickle 3.0.0 pypi_0 pypi
comm 0.2.2 pypi_0 pypi
contourpy 1.2.1 pypi_0 pypi
coverage 7.5.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
debugpy 1.8.1 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
distlib 0.3.8 pypi_0 pypi
docutils 0.20.1 pypi_0 pypi
entrypoints 0.4 pypi_0 pypi
et-xmlfile 1.1.0 pypi_0 pypi
exceptiongroup 1.2.1 pypi_0 pypi
execnet 2.1.1 pypi_0 pypi
executing 2.0.1 pypi_0 pypi
filelock 3.14.0 pypi_0 pypi
filetype 1.2.0 pypi_0 pypi
flatbuffers 24.3.25 pypi_0 pypi
fonttools 4.51.0 pypi_0 pypi
gast 0.6.0 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.66.2 pypi_0 pypi
h5py 3.11.0 pypi_0 pypi
identify 2.5.36 pypi_0 pypi
idna 3.8 pypi_0 pypi
igor2 0.5.6 pypi_0 pypi
imageio 2.34.1 pypi_0 pypi
imagesize 1.4.1 pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
ipykernel 6.29.4 pypi_0 pypi
ipython 8.24.0 pypi_0 pypi
isort 5.13.2 pypi_0 pypi
jedi 0.19.1 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
jupyter-client 7.4.9 pypi_0 pypi
jupyter-core 5.7.2 pypi_0 pypi
keras 3.5.0 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
lazy-loader 0.4 pypi_0 pypi
libclang 18.1.1 pypi_0 pypi
libffi 3.4.2 h3422bc3_5 conda-forge
libsqlite 3.45.3 h091b4b1_0 conda-forge
libzlib 1.2.13 h53f4e23_5 conda-forge
llvmlite 0.43.0 pypi_0 pypi
loguru 0.7.2 pypi_0 pypi
markdown 3.7 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.8.4 pypi_0 pypi
matplotlib-inline 0.1.7 pypi_0 pypi
mccabe 0.7.0 pypi_0 pypi
mdit-py-plugins 0.4.1 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
ml-dtypes 0.4.1 pypi_0 pypi
mypy-extensions 1.0.0 pypi_0 pypi
myst-parser 4.0.0 pypi_0 pypi
namex 0.0.8 pypi_0 pypi
ncurses 6.4.20240210 h078ce10_0 conda-forge
nest-asyncio 1.6.0 pypi_0 pypi
networkx 3.3 pypi_0 pypi
nodeenv 1.8.0 pypi_0 pypi
numba 0.60.0 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
numpydoc 1.8.0 pypi_0 pypi
numpyencoder 0.3.0 pypi_0 pypi
openpyxl 3.1.5 pypi_0 pypi
openssl 3.3.0 h0d3ecfb_0 conda-forge
opt-einsum 3.4.0 pypi_0 pypi
optree 0.12.1 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
parso 0.8.4 pypi_0 pypi
pathspec 0.12.1 pypi_0 pypi
pexpect 4.9.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pillow 10.3.0 pypi_0 pypi
pip 24.2 pypi_0 pypi
platformdirs 4.2.1 pypi_0 pypi
pluggy 1.5.0 pypi_0 pypi
pockets 0.9.1 pypi_0 pypi
pre-commit 3.7.0 pypi_0 pypi
prompt-toolkit 3.0.43 pypi_0 pypi
protobuf 4.25.5 pypi_0 pypi
psutil 5.9.8 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
pure-eval 0.2.2 pypi_0 pypi
pydata-sphinx-theme 0.15.4 pypi_0 pypi
pyfiglet 1.0.2 pypi_0 pypi
pygments 2.18.0 pypi_0 pypi
pylint 3.1.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pyspm 0.6.1 pypi_0 pypi
pytest 7.4.4 pypi_0 pypi
pytest-cov 5.0.0 pypi_0 pypi
pytest-durations 1.2.0 pypi_0 pypi
pytest-github-actions-annotate-failures 0.2.0 pypi_0 pypi
pytest-lazy-fixture 0.6.3 pypi_0 pypi
pytest-mpl 0.17.0 pypi_0 pypi
pytest-regtest 2.1.1 pypi_0 pypi
pytest-testmon 2.1.1 pypi_0 pypi
pytest-xdist 3.6.1 pypi_0 pypi
python 3.10.14 h2469fbe_0_cpython conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyupgrade 3.15.2 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
pyzmq 26.0.3 pypi_0 pypi
readline 8.2 h92ec313_1 conda-forge
requests 2.32.3 pypi_0 pypi
rich 13.9.1 pypi_0 pypi
ruamel-yaml 0.18.6 pypi_0 pypi
ruamel-yaml-clib 0.2.8 pypi_0 pypi
schema 0.7.7 pypi_0 pypi
scikit-image 0.23.2 pypi_0 pypi
scikit-learn 1.4.2 pypi_0 pypi
scipy 1.13.0 pypi_0 pypi
seaborn 0.13.2 pypi_0 pypi
setuptools 69.5.1 pyhd8ed1ab_0 conda-forge
six 1.16.0 pypi_0 pypi
skan 0.11.1 pypi_0 pypi
snakeviz 2.2.0 pypi_0 pypi
snoop 0.4.3 pypi_0 pypi
snowballstemmer 2.2.0 pypi_0 pypi
soupsieve 2.6 pypi_0 pypi
sphinx 7.4.7 pypi_0 pypi
sphinx-autoapi 3.2.1 pypi_0 pypi
sphinx-autodoc-typehints 2.2.3 pypi_0 pypi
sphinx-markdown-tables 0.0.17 pypi_0 pypi
sphinx-multiversion 0.2.4 pypi_0 pypi
sphinx-rtd-theme 2.0.0 pypi_0 pypi
sphinxcontrib-applehelp 2.0.0 pypi_0 pypi
sphinxcontrib-devhelp 2.0.0 pypi_0 pypi
sphinxcontrib-htmlhelp 2.1.0 pypi_0 pypi
sphinxcontrib-jquery 4.1 pypi_0 pypi
sphinxcontrib-jsmath 1.0.1 pypi_0 pypi
sphinxcontrib-mermaid 0.9.2 pypi_0 pypi
sphinxcontrib-napoleon 0.7 pypi_0 pypi
sphinxcontrib-qthelp 2.0.0 pypi_0 pypi
sphinxcontrib-serializinghtml 2.0.0 pypi_0 pypi
spyder-kernels 2.3.3 pypi_0 pypi
stack-data 0.6.3 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tensorboard 2.17.1 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
tensorflow 2.17.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.37.1 pypi_0 pypi
termcolor 2.4.0 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tifffile 2024.5.3 pypi_0 pypi
tk 8.6.13 h5083fa2_1 conda-forge
tokenize-rt 5.2.0 pypi_0 pypi
tomli 2.0.1 pypi_0 pypi
tomlkit 0.12.5 pypi_0 pypi
toolz 0.12.1 pypi_0 pypi
topofileformats 0.1.0 pypi_0 pypi
topoly 1.0.2 pypi_0 pypi
topostats 2.2.2.dev896+gcc66a1fa9 pypi_0 pypi
tornado 6.4 pypi_0 pypi
tqdm 4.66.4 pypi_0 pypi
traitlets 5.14.3 pypi_0 pypi
typing-extensions 4.11.0 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
virtualenv 20.26.1 pypi_0 pypi
wcwidth 0.2.13 pypi_0 pypi
werkzeug 3.0.4 pypi_0 pypi
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
wrapt 1.16.0 pypi_0 pypi
wurlitzer 3.1.0 pypi_0 pypi
xz 5.2.6 h57fd34a_0 conda-forge

ns-rse commented 1 month ago

Re-opening as #973 is still open.

ns-rse commented 1 month ago

Closed by #973