AFM-SPM / TopoStats

An AFM image analysis program to batch process data and obtain statistics from images
https://afm-spm.github.io/TopoStats/
GNU Lesser General Public License v3.0
60 stars 11 forks source link

Performance regression of Topoly with Python 3.12 #927

Open ns-rse opened 1 month ago

ns-rse commented 1 month ago

After merging #897 I checked to see what fails on maxgamill-sheffield/topology as I'm working my way through #850 and this branch is due to be merged to maxgamill-sheffield/800-better-tracing.

I found though that whether I ran the tests on a single thread or multiple the test suite never actually starts, it sits there using 100% CPU of the number of cores specified and I have to manually kill it.

git bisect

To investigate this I've employed a git bisect (using Magit's bisect) running pytest -x on each bisection.

Bisect Log

Bisect Log (8)
git bisect start 'HEAD' '60c9d1636'
263f07cf bad Merge pull request #897 from AFM-SPM/SylviaWhittle/topology_tests
60c9d163 good re-add nodestats to config.md

git bisect bad c15f390d453eeedc4249bf9e17ab29255ff7afe6
c15f390d bad Fix pair_odd_branches not present in class constructor for test_connect_nodes_nearest

git bisect bad a34e5135a222de39d585deec89c074f21ffed9b8
a34e5135 bad removes double dilation away from nodes so unpaired branches finish away from the crossing branches for tracing

git bisect bad 5f279e1d279859ad1452f599c69c681f056a8ed1
5f279e1d bad got writhe calc working - i think

git bisect bad 4b7d25813e48073099d9b9e9dcd7b02255f13a19
4b7d2581 bad ordered traces adds topology & outputs molstats & splining takes molstats

git bisect good 435f92832708ae322d7d076c0b0022c327aea44d
435f9283 good tidy up new pruning method

git bisect good 560d0b19d86f38969277002af32cca08920c0746
560d0b19 good dis trace pruning re-structured

4b7d25813e48073099d9b9e9dcd7b02255f13a19 is the first bad commit

Bad Commit

This identified the bad commit as...

4b7d25813e48073099d9b9e9dcd7b02255f13a19 is the first bad commit
commit 4b7d25813e48073099d9b9e9dcd7b02255f13a19
Author: Max Gamill <mcgamill1@sheffield.ac.uk>
Date:   2024-09-03 17:20:05 +0100
    ordered traces adds topology & outputs molstats & splining takes molstats
 topostats/processing.py              | 133 +++++++++--------
 topostats/tracing/ordered_tracing.py | 273 +++++++++++++++++++++++++++--------
 topostats/tracing/splining.py        |  10 +-
 3 files changed, 290 insertions(+), 126 deletions(-)

Of the three files that are changed in this commit there are, at that point, no tests for...

...but there is a test for topostats/processing.py and running that in isolation also causes the tests to hang.

Trying to run just a single test within this file (tests/test_processing.py::test_run_filters) took about just over seven minutes so there is something weird going on as that runs in a few seconds on main.


tests/test_processing.py::test_run_filters 
---------------------------------------------------------- live log setup -----------------------------------------------------------
INFO     topostats:io.py:967 Extracting image from /home/neil/work/git/hub/AFM-SPM/TopoStats/tests/resources/test_image/minicircle_small.topostats
INFO     topostats:io.py:596 Loading image from : /home/neil/work/git/hub/AFM-SPM/TopoStats/tests/resources/test_image/minicircle_small.topostats
INFO     topostats:io.py:1103 Loading hdf5 key: image
INFO     topostats:io.py:1103 Loading hdf5 key: pixel_to_nm_scaling
INFO     topostats:io.py:1103 Loading hdf5 key: topostats_file_version
INFO     topostats:io.py:604 TopoStats file version: 0.1
INFO     topostats:io.py:1009 [minicircle_small] Image added to processing.
----------------------------------------------------------- live log call -----------------------------------------------------------
INFO     topostats:processing.py:78 [dummy filename] Image dimensions: (64, 64)
INFO     topostats:processing.py:79 [dummy filename] : *** Filtering ***
INFO     topostats:filters.py:178 [dummy filename] : Median flattening without mask
INFO     topostats:filters.py:218 [dummy filename] : Plane tilt removal without mask
INFO     topostats:filters.py:229 [dummy filename] : x-polyfit 1st order: [ 0.01287956 -0.43328492]
INFO     topostats:filters.py:231 [dummy filename] : y-polyfit 1st order: [ 1.67143466e-17 -6.09768646e-16]
INFO     topostats:filters.py:235 [dummy filename] : Removing x plane tilt
INFO     topostats:filters.py:246 [dummy filename] : removing y plane tilt
INFO     topostats:filters.py:383 [dummy filename] : Remove quadratic bow without mask
INFO     topostats:filters.py:390 [dummy filename] : x polyfit 2nd order: [-8.78008084e-05  5.53145093e-03 -4.90443244e-01]
INFO     topostats:filters.py:347 [dummy filename] : Nonlinear polynomial removal optimal params: const: -0.13058404983843036 xy: 2.406788411046036e-05 x: 0.0016721081736799663 y: 0.0011862968183509776
INFO     topostats:filters.py:523 [dummy filename] : Initial scar removal
INFO     topostats:scars.py:392 [dummy filename] : Removing scars
INFO     topostats:filters.py:461 [dummy filename] : Zero averaging background : 0.0005879895337349665 nm
INFO     topostats:utils.py:169 [dummy filename] : Masking (above) Threshold: 0.7200521040448227
INFO     topostats:utils.py:171 [dummy filename] : Masking (below) Threshold: -7.200521040448226
INFO     topostats:filters.py:175 [dummy filename] : Median flattening with mask
INFO     topostats:filters.py:215 [dummy filename] : Plane tilt removal with mask
INFO     topostats:filters.py:229 [dummy filename] : x-polyfit 1st order: [ 0.00011906 -0.00039111]
INFO     topostats:filters.py:231 [dummy filename] : y-polyfit 1st order: [-1.65211760e-20  8.32667268e-18]
INFO     topostats:filters.py:235 [dummy filename] : Removing x plane tilt
INFO     topostats:filters.py:246 [dummy filename] : removing y plane tilt
INFO     topostats:filters.py:380 [dummy filename] : Remove quadratic bow with mask
INFO     topostats:filters.py:390 [dummy filename] : x polyfit 2nd order: [-1.17717765e-06  7.41621917e-05 -1.15745174e-03]
INFO     topostats:filters.py:347 [dummy filename] : Nonlinear polynomial removal optimal params: const: 0.03518261704381109 xy: -2.8397760380105552e-05 x: -0.0009275597111546574 y: -3.7884590041947774e-05
INFO     topostats:filters.py:568 [dummy filename] : Secondary scar removal
INFO     topostats:scars.py:392 [dummy filename] : Removing scars
INFO     topostats:filters.py:461 [dummy filename] : Zero averaging background : -0.0014525606584533095 nm
INFO     topostats:filters.py:480 [dummy filename] : Applying Gaussian filter (mode : nearest; Gaussian blur (px) : 1.0121397464510862).
INFO     topostats:processing.py:90 [dummy filename] : Plotting Filtering Images
INFO     topostats:plottingfuncs.py:313 [dummy filename] : Image saved to : /tmp/pytest-of-neil/pytest-55/test_run_filters0/dummy filename.png | DPI: 100
PASSED                                                                                                                        [100%]

------------------------------------------------------- pytest-regtest report -------------------------------------------------------
total number of failed regression tests: 0
======================================================= fixture duration top ========================================================
total          name                num avg            min           
0:00:00.067953 process_scan_config   1 0:00:00.067953 0:00:00.067953
0:00:00.072484         grand total   4 0:00:00.002171 0:00:00.000190
====================================================== test call duration top =======================================================
total          name                num avg            min           
0:00:00.542028    test_run_filters   1 0:00:00.542028 0:00:00.542028
0:00:00.542028         grand total   1 0:00:00.542028 0:00:00.542028
====================================================== test setup duration top ======================================================
total          name                num avg            min           
0:00:00.073169    test_run_filters   1 0:00:00.073169 0:00:00.073169
0:00:00.073169         grand total   1 0:00:00.073169 0:00:00.073169
==================================================== test teardown duration top =====================================================
total          name                num avg            min           
0:00:00.000297         grand total   1 0:00:00.000297 0:00:00.000297

---------- coverage: platform linux, python 3.12.6-final-0 -----------
Name                                      Stmts   Miss  Cover
-------------------------------------------------------------
topostats/__main__.py                         2      2     0%
topostats/entry_point.py                    106    106     0%
topostats/filters.py                        155     15    90%
topostats/grains.py                         106     81    24%
topostats/grainstats.py                     332    281    15%
topostats/io.py                             447    328    27%
topostats/logs/logs.py                       25      0   100%
topostats/measure/geometry.py                85     74    13%
topostats/plotting.py                       182    150    18%
topostats/plottingfuncs.py                  137     49    64%
topostats/processing.py                     324    270    17%
topostats/run_topostats.py                  137    137     0%
topostats/scars.py                          106      1    99%
topostats/statistics.py                      26     20    23%
topostats/theme.py                           46      5    89%
topostats/thresholds.py                      31     12    61%
topostats/tracing/disordered_tracing.py     146    116    21%
topostats/tracing/dnacurvature.py            24     24     0%
topostats/tracing/dnatracing.py             480    480     0%
topostats/tracing/nodestats.py              667    561    16%
topostats/tracing/ordered_tracing.py        315    277    12%
topostats/tracing/pruning.py                225    183    19%
topostats/tracing/skeletonize.py            136     95    30%
topostats/tracing/splining.py               173    148    14%
topostats/tracing/tracingfuncs.py           276    243    12%
topostats/utils.py                          116     72    38%
topostats/validation.py                      15     15     0%
-------------------------------------------------------------
TOTAL                                      4820   3745    22%

=================================================== 1 passed in 428.73s (0:07:08) ===================================================

Strategy

For each file carefully review the changes...

git diff 4b7d25813e48073099d9b9e9dcd7b02255f13a19 560d0b19d86f38969277002af32cca08920c074 -- <path/to/file.py>

Try and identify what has caused the problem, checking by running pytest tests/test_processing.py. Might be possible to revert some hunks individually to see if they are the cause but never done this.

topostats/processing.py

Majority of the changes here are results_df > grainstats_df which shouldn't have any impact but there are some other dif

ns-rse commented 1 month ago

I've been doing some profiling and have tracked down the cause of this hang to be a module from Topoly.

The attached profile was obtained by running...

git switch 4b7d25813
python -m cProfile -o prof/test_run_filters_cProfile_hangs.prof $(which pytest) tests/test_processing.py::test_run_filters

It can then be visualised with the following (you don't have to run the profiling yourself you can just download the attached file and run the following after installing snakeviz...

# Optionally install snakeviz
pip install snakeviz
# Uncompress the file
gunzip test_run_filters_cProfile_hangs.prof.gz
# Run snakeviz on the file
snakeviz path/to/test_run_filters_cProfile_hangs.prof

A browser tab will open and at the top it shows you that the module where most time is spent is polyvalues.py. Only a single call is made to this module... image

The second box from the left (may differ on your view) with no "icicles" is the polyvalues.py module and it doesn't actually call or run any processes itself, it just takes a long time to do....something?!?!?!

image

test_run_filters_cProfile_hangs.prof.gz

If I repeat this on the previous commit (git switch 560d0b19d and repeat the profiling) there is no such problem, no call is made to polyvalues.py.

To check I've compared the two commits shows this appears to be the first point at which topoly modules are imported, we could use -G topoly to search the differences when checked out on the commit that fails and the previous commit but this exceeds DFT_GRAPH_LIMIT so I have gone with pipping it into grep (NB this is the output from difftastic and shows a line has been added to HEAD (at line 11) that isn't in the previous commit (HEAD~1), I limit the output to the first 20 lines by piping to head -n 20)...

 ❱ git diff HEAD HEAD~1 | grep topoly -A3 -B3

 ❱ git diff HEAD HEAD~1 | grep topoly -A3 -B3
555- 8  8 import numpy.typing as npt
556- 9  9 import pandas as pd
557-10 10 from skimage.morphology import label
558:11 .. from topoly import jones, translate_code
559-12 11 
560-13 12 from topostats.logs.logs import LOGGER_NAME
561-14 13 from topostats.tracing.tracingfuncs import coord_dist, genTracingFuncs, order_branch, reorderTrace
--
847-349             if len(nxyz) > 2 and end_to_end_dist_square ... 
848-... d <= 2:  # pylint: disable=chained-comparison           ... 
849-350                 # single coord traces mean nxyz[0]==[1] ... 
850:...  so cause issues when duplicating for topoly            ... 
851-351                 nxyz = np.append(nxyz, nxyz[0][np.newax ... 
852-... is, :], axis=0)                                         ... 
853-352             simple_coords.append(nxyz)                  ... 

I'm not sure why this is even being called though as I though topoly has only been introduced to undertake work in DNA tracing and shouldn't affect the very early stage of filters? Certainly none of the files in topostats/filters.py have been touched...

git diff --name-only HEAD HEAD~1
topostats/processing.py
topostats/tracing/ordered_tracing.py
topostats/tracing/splining.py

Although perhaps my understanding is wrong and everything is getting initalised for some reason. :thinking:

@MaxGamill-Sheffield @SylviaWhittle @llwiggins : do you observe this long hang when running the tests? Do you get similar profiling?

MaxGamill-Sheffield commented 1 month ago

Right so I've spent all morning and a bit of the afternoon on this and here's what I found / failed:

  1. On my ARM architecture, pip cannot find any topoly versions that satisfy Python 3.12 - (from: none) - despite there being more versions on PyPi (https://pypi.org/project/topoly/)
  2. On ARM Python 3.11, there are only versions upto 1.0.2:
    (ts311) Max:TopoStats Maxgamill$ pip install topoly==1.0.4 --no-cache
    ERROR: Could not find a version that satisfies the requirement topoly==1.0.4 (from versions: 1.0.1, 1.0.2)
    ERROR: No matching distribution found for topoly==1.0.4
  3. On windows x86 systems, pip install TopoStats again only installed topoly 1.0.2 and couldn't find 1.0.4 either.

All in all I cannot reproduce the hang on the lab windows computer (x86), my computer (ARM - M1), Tom's machine (ARM - M2), Tobi's machine (ARM), Laura's machine (ARM).

Did the 1.0.4 release only apply to Unix systems? And can you @ns-rse please try with the downgraded 1.0.2 topoly version as I cannot seem to upgrade?

I did also find their website again which says that 1.0.3 should work for ARM, but we see that 1.0.2 works: https://topoly.cent.uw.edu.pl/installation.html

ns-rse commented 1 month ago

Yep, testing a downgraded version is on my ToDo for Monday.

The versions available on PyPI for 1.0.4 are listed here which is why you couldn't install it on your laptop. Should be available for Windows according to that page though.

Release History can be browsed to find that the last OSX ARM release was indeed 1.0.2.

Will report back Monday.

ns-rse commented 1 month ago

I've found the root cause of the problem and its related to Python versions not topoly or pytest versions.

 ❱ python --version
Python 3.11.9
 11:13:50 am BST  percentage:  neil  ~/work/git/hub/AFM-SPM/TopoStats  G  maxgamill-sheffield/topology  @4  ?  
(topostats-311) ❱ pip show pytest
pip sName: pytest
Version: 8.3.3
Summary: pytest: simple powerful testing with Python
Home-page: 
Author: Holger Krekel, Bruno Oliveira, Ronny Pfannschmidt, Floris Bruynooghe, Brianna Laugher, Florian Bruhin, Others (See AUTHORS)
Author-email: 
License: MIT
Location: /home/neil/.virtualenvs/topostats-311/lib/python3.11/site-packages
Requires: iniconfig, packaging, pluggy
Required-by: pytest-cov, pytest-durations, pytest-github-actions-annotate-failures, pytest-icdiff, pytest-mock, pytest-mpl, pytest-pylint, pytest-regtest, pytest-testmon, pytest-xdist, syrupy
 11:13:57 am BST  percentage:  neil  ~/work/git/hub/AFM-SPM/TopoStats  G  maxgamill-sheffield/topology  @4  ?  
(topostats-311) ❱ pip show topoly
Name: topoly
Version: 1.0.4
Summary: This package provides a set of tools for modeling protein knots, lassos and other.
Home-page: https://topoly.cent.uw.edu.pl/
Author: INTERDISCIPLINARY LABORATORY of BIOLOGICAL SYSTEMS MODELLING, University of Warsaw, Warsaw, Poland
Author-email: jsulkowska@cent.uw.edu.pl
License: 
Location: /home/neil/.virtualenvs/topostats-311/lib/python3.11/site-packages
Requires: argparse, biopython, matplotlib, numpy, Pillow, scipy
Required-by: topostats

The one test I've been working with now runs in the expected and reasonable amount of time....

(topostats-311) ❱ python -m cProfile -o prof/test_run_filters_cProfile_topoly-1.0.4-python-3.11.prog $(which pytest) tests/test_processing.py::test_run_filters
===================================================================== test session starts =====================================================================
platform linux -- Python 3.11.9, pytest-8.3.3, pluggy-1.5.0
Matplotlib: 3.9.2
Freetype: 2.6.1
rootdir: /home/neil/work/git/hub/AFM-SPM/TopoStats
configfile: pyproject.toml
plugins: durations-1.3.1, pylint-0.21.0, regtest-2.2.0, github-actions-annotate-failures-0.2.0, syrupy-4.7.1, icdiff-0.9, cov-5.0.0, mock-3.14.0, testmon-2.1.1, xdist-3.6.1, anyio-4.6.0, mpl-0.17.0
collected 1 item                                                                                                                                              

tests/test_processing.py::test_run_filters 
----------------------------------------------------------------------- live log setup ------------------------------------------------------------------------
INFO     topostats:io.py:1015 Extracting image from /home/neil/work/git/hub/AFM-SPM/TopoStats/tests/resources/test_image/minicircle_small.topostats
INFO     topostats:io.py:644 Loading image from : /home/neil/work/git/hub/AFM-SPM/TopoStats/tests/resources/test_image/minicircle_small.topostats
INFO     topostats:io.py:1151 Loading hdf5 key: image
INFO     topostats:io.py:1151 Loading hdf5 key: pixel_to_nm_scaling
INFO     topostats:io.py:1151 Loading hdf5 key: topostats_file_version
INFO     topostats:io.py:652 TopoStats file version: 0.1
INFO     topostats:io.py:1057 [minicircle_small] Image added to processing.
------------------------------------------------------------------------ live log call ------------------------------------------------------------------------
INFO     topostats:processing.py:79 [dummy filename] Image dimensions: (64, 64)
INFO     topostats:processing.py:80 [dummy filename] : *** Filtering ***
INFO     topostats:filters.py:178 [dummy filename] : Median flattening without mask
INFO     topostats:filters.py:218 [dummy filename] : Plane tilt removal without mask
INFO     topostats:filters.py:229 [dummy filename] : x-polyfit 1st order: [ 0.01287956 -0.43328492]
INFO     topostats:filters.py:231 [dummy filename] : y-polyfit 1st order: [ 1.67143466e-17 -6.09768646e-16]
INFO     topostats:filters.py:235 [dummy filename] : Removing x plane tilt
INFO     topostats:filters.py:246 [dummy filename] : removing y plane tilt
INFO     topostats:filters.py:383 [dummy filename] : Remove quadratic bow without mask
INFO     topostats:filters.py:390 [dummy filename] : x polyfit 2nd order: [-8.78008084e-05  5.53145093e-03 -4.90443244e-01]
INFO     topostats:filters.py:347 [dummy filename] : Nonlinear polynomial removal optimal params: const: -0.13058404983843036 xy: 2.406788411046036e-05 x: 0.0016721081736799663 y: 0.0011862968183509776
INFO     topostats:filters.py:523 [dummy filename] : Initial scar removal
INFO     topostats:scars.py:392 [dummy filename] : Removing scars
INFO     topostats:filters.py:461 [dummy filename] : Zero averaging background : 0.0005879895337349665 nm
INFO     topostats:utils.py:173 [dummy filename] : Masking (above) Threshold: 0.7200521040448227
INFO     topostats:utils.py:175 [dummy filename] : Masking (below) Threshold: -7.200521040448226
INFO     topostats:filters.py:175 [dummy filename] : Median flattening with mask
INFO     topostats:filters.py:215 [dummy filename] : Plane tilt removal with mask
INFO     topostats:filters.py:229 [dummy filename] : x-polyfit 1st order: [ 0.00011906 -0.00039111]
INFO     topostats:filters.py:231 [dummy filename] : y-polyfit 1st order: [-1.65211760e-20  8.32667268e-18]
INFO     topostats:filters.py:235 [dummy filename] : Removing x plane tilt
INFO     topostats:filters.py:246 [dummy filename] : removing y plane tilt
INFO     topostats:filters.py:380 [dummy filename] : Remove quadratic bow with mask
INFO     topostats:filters.py:390 [dummy filename] : x polyfit 2nd order: [-1.17717765e-06  7.41621917e-05 -1.15745174e-03]
INFO     topostats:filters.py:347 [dummy filename] : Nonlinear polynomial removal optimal params: const: 0.03518261704381109 xy: -2.8397760380105552e-05 x: -0.0009275597111546574 y: -3.7884590041947774e-05
INFO     topostats:filters.py:568 [dummy filename] : Secondary scar removal
INFO     topostats:scars.py:392 [dummy filename] : Removing scars
INFO     topostats:filters.py:461 [dummy filename] : Zero averaging background : -0.0014525606584533095 nm
INFO     topostats:filters.py:480 [dummy filename] : Applying Gaussian filter (mode : nearest; Gaussian blur (px) : 1.0121397464510862).
INFO     topostats:processing.py:91 [dummy filename] : Plotting Filtering Images
PASSED                                                                                                                                                  [100%]

==================================================================== fixture duration top =====================================================================
total          name                num avg            min           
0:00:00.102977 process_scan_config   1 0:00:00.102977 0:00:00.102977
0:00:00.108359         grand total   5 0:00:00.000938 0:00:00.000067
=================================================================== test call duration top ====================================================================
total          name                num avg            min           
0:00:00.825746    test_run_filters   1 0:00:00.825746 0:00:00.825746
0:00:00.825746         grand total   1 0:00:00.825746 0:00:00.825746
=================================================================== test setup duration top ===================================================================
total          name                num avg            min           
0:00:00.109061    test_run_filters   1 0:00:00.109061 0:00:00.109061
0:00:00.109061         grand total   1 0:00:00.109061 0:00:00.109061
================================================================= test teardown duration top ==================================================================
total          name                num avg            min           
0:00:00.000470         grand total   1 0:00:00.000470 0:00:00.000470
-------------------------------------------------------------------- pytest-regtest report --------------------------------------------------------------------
total number of failed regression tests: 0
total number of failed snapshot tests  : 0

---------- coverage: platform linux, python 3.11.9-final-0 -----------
Name                                      Stmts   Miss  Cover
-------------------------------------------------------------
topostats/__main__.py                         2      2     0%
topostats/entry_point.py                    106    106     0%
topostats/filters.py                        155     15    90%
topostats/grains.py                         106     81    24%
topostats/grainstats.py                     332    281    15%
topostats/io.py                             469    349    26%
topostats/logs/logs.py                       25      0   100%
topostats/measure/geometry.py                85     74    13%
topostats/plotting.py                       182    150    18%
topostats/plottingfuncs.py                  137     49    64%
topostats/processing.py                     356    302    15%
topostats/run_topostats.py                  137    137     0%
topostats/scars.py                          106      1    99%
topostats/statistics.py                      26     20    23%
topostats/theme.py                           46      5    89%
topostats/thresholds.py                      31     12    61%
topostats/tracing/disordered_tracing.py     169    135    20%
topostats/tracing/dnacurvature.py            24     24     0%
topostats/tracing/dnatracing.py             480    480     0%
topostats/tracing/nodestats.py              657    552    16%
topostats/tracing/ordered_tracing.py        360    317    12%
topostats/tracing/pruning.py                222    180    19%
topostats/tracing/skeletonize.py            136     95    30%
topostats/tracing/splining.py               184    159    14%
topostats/tracing/tracingfuncs.py           276    243    12%
topostats/utils.py                          116     72    38%
topostats/validation.py                      15     15     0%
-------------------------------------------------------------
TOTAL                                      4940   3856    22%

====================================================================== 1 passed in 3.26s ======================================================================

With 10 cores I can run the whole test-suite (albeit still with a bunch of failures/errors) in 5m 48sec.

This is therefore some sort of performance regression between Topoly and Python 3.12. Before I craft a very long email explaining what I've found to the authors (as they don't have an open repository to report bugs/issues) I'd be grateful if this could be checked on another architecture under Python 3.12 if anyone has time to do so. You can install specific versions of Python using Conda

Something like the following should do the trick I think (untested though) but would pull in topoly-1.0.2 if running on OSX with ARM chip as the newer 1.0.4 isn't available yet (perhaps tweak version in the target output filename to avoid confusion).

conda create --name topostats-312 python==3.12.5
conda activate topostats-312
cd path/to/topostats
pip install -e .[dev,tests]
mkdir prof
python --version
pip show pytest
pip show topoly
python -m cProfile -o prof/test_run_filters_cProfile_topoly-1.0.4-python-3.12.prof $(which pytest) tests/test_processing.py::test_run_filters 
SylviaWhittle commented 1 month ago

Mine cannot find an appropriate version of topoly to install

<on maxgamill-sheffield/topology>:

 8015  pyenv virtualenv 3.12.5 temp-topology-hang
 8016  pyenv activate temp-topology-hang
 8017  pip install -e .[dev,tests]
...
ERROR: Could not find a version that satisfies the requirement topoly (from topostats) (from versions: none)
ERROR: No matching distribution found for topoly
ns-rse commented 1 month ago

Ah that is expected @SylviaWhittle as we discovered they haven't released wheels for 1.0.3 or 1.0.4 for OSX ARM.

Similar reason to why I couldn't install 1.0.2 under Python 3.12, they haven't built wheels for that version.

MaxGamill-Sheffield commented 1 month ago

@ns-rse When I was testing on on the lab Windows x86 computer, it also would only download topoly 1.0.2 for both python 3.11. I've double checked python 3.12 on Yeti's laptop and there are no available versions of topoly.

ns-rse commented 1 month ago

Yes, checking the release history it seems there are no wheels for Python 3.12 beyond Topoly 1.0.2, it seems to be a bit of a mess really. Seems we can't check. Apologies for not looking more closely I was fitting this in between work on my other projects.

I'll craft a report to the authors (which in itself will likely take a couple of hours) and request builds for all architectures across all common Python versions as its going to cause us headaches if this is to remain a key dependency and releases continue in such a disjointed manner.

MaxGamill-Sheffield commented 1 month ago

No this was a really good catch thank you Neil! We'd hate for people to try our software in our upcoming workshop and hit this wall.

As a temporary fix could we specify in the docs to use Python <3.12 for now so we can align main with our paper?

ns-rse commented 1 month ago

I would go a step further and explicitly remove support for Python 3.12 at the package configuration level (i.e. pyproject.toml) and remove it from the test matrix.

I've added it as a task to #800 and created a separate issue, see #930.

I'll leave this issue open until we have resolved support for Python 3.12 (and hopefully 3.13 #928) with Topoly developers/maintainers.

prubach commented 2 weeks ago

Hi @MaxGamill-Sheffield and @ns-rse my name is Pawel Rubach and I'm responsible for integrating and releasing the topoly package. I was able to reproduce the performance problem you've mentioned here using your code and indeed it only hangs on 3.12.

It's difficult to explain since the polvalues.py file that it hangs on doesn't even contain any code, just a list of dictionaries used to translate the polynomials parameters into known topologies.

I also ran (on Linux/Ubuntu 24.04) the tests that we use to test topoly before releasing and didn't notice any performance difference between Python 3.11 and 3.12 or even a slightly better performance on 3.12.

I managed to build the 1.0.4 version for Windows (it is already on PyPi) and I hope to get the Mac/Apple Silicon version compiled this week. It is difficult since I don't have the hardware at hand and have to remotely run the build process which is pretty complex on my colleagues personal Mac overseas...

To further investigate the problem you're facing I'd have to look exactly at how TopoStats uses topoly and try to isolate this functionality. I'll try to look into it in the coming days but if you have any suggestions than that would be welcome.

ns-rse commented 2 weeks ago

@prubach

Thanks for taking the time to investigate this regression test. I found it strange too as I looked at the code and could also see that polvalues.py was just a bunch of dictionaries. A colleague who I asked about this suggested it may be a regression at an even higher level with Python.

@MaxGamill-Sheffield will hopefully be able to point you to where and how topoly functionality is being used within the code base if needed but the profiling I performed suggests, I think (I'm no expert on profiling!), that the problem occurs simply on loading the topoly library which points towards it perhaps being a higher level regression test.

ns-rse commented 2 weeks ago

By the way @prubach is there a public issue tracker where I might be able to provide more information and track progress?

prubach commented 2 weeks ago

@ns-rse Thanks for the clarifications. BTW while I was investigating that matter I found another issue when using certain functions in Topoly (mostly GLN). There seems to be some incompatibility with numpy 2.x. We built the 1.0.4 version before numpy 2.x was out and there are no issues with older numpy versions but since I didn't limit the version, when you now create a new virtual environment numpy 2.x is by default installed on newer versions of python. I have no clue about the cause as it results in some segfaults but only with certain input data, even though we don't use numpy data types etc. in the compiled code. I'm considering restricting the numpy versions for now but need to investigate it further.

As to the issue tracker I think the best option for now is to use the topoly_tutorial public repo: https://github.com/ilbsm/topoly_tutorial/issues

BTW I've noticed this old issue there and we'll try to address that soon as well, sorry it must have slipped through...