OpenCOMPES / sed

Single Event Data Frame Processor: Backend to handle photoelectron resolved datastreams
https://opencompes.github.io/sed/
MIT License

Binning extremely slow with current poetry versions #357

Closed: rettigl closed this issue 6 months ago

rettigl commented 6 months ago

Describe the bug:

Binning is very slow with the current package versions in the poetry.lock file.

To reproduce:

Steps to reproduce the behavior:

  1. Install current versions in poetry environment
  2. Run tutorial notebook 2, and then tutorial notebook 3
  3. Binning cell takes > 5 minutes
  4. [screenshot]

Expected behavior:

Previously, binning took much less time: [screenshot]

Poetry environment:

Package                       Version         Editable project location
----------------------------- --------------- ----------------------------------------
alabaster                     0.7.13
anyio                         4.3.0
argon2-cffi                   23.1.0
argon2-cffi-bindings          21.2.0
arrow                         1.3.0
asciitree                     0.3.3
ase                           3.22.1
asteval                       0.9.32
astropy                       5.2.2
asttokens                     2.4.1
async-lru                     2.0.4
attrs                         23.2.0
Babel                         2.14.0
backcall                      0.2.0
backports.zoneinfo            0.2.1
beautifulsoup4                4.12.3
bleach                        6.1.0
blosc2                        2.0.0
bokeh                         3.1.1
certifi                       2024.2.2
cffi                          1.16.0
charset-normalizer            3.3.2
click                         8.1.7
cloudpickle                   3.0.0
comm                          0.2.1
commonmark                    0.9.1
contourpy                     1.1.1
coverage                      7.4.3
cramjam                       2.8.2
cycler                        0.12.1
Cython                        0.29.28
dask                          2023.5.0
debugpy                       1.8.1
decorator                     5.1.1
defusedxml                    0.7.1
diffpy.structure              3.1.0
diffsims                      0.5.2
dill                          0.3.8
docutils                      0.20.1
entrypoints                   0.4
exceptiongroup                1.2.0
execnet                       2.0.2
executing                     2.0.1
fabio                         2023.10.0
fastdtw                       0.3.4
fasteners                     0.19
fastjsonschema                2.19.1
fastparquet                   2024.2.0
flatdict                      4.0.1
fonttools                     4.49.0
fqdn                          1.5.1
fsspec                        2024.2.0
future                        1.0.0
gitdb                         4.0.11
GitPython                     3.1.42
h11                           0.14.0
h5grove                       2.0.0
h5py                          3.10.0
hdf5plugin                    4.4.0
httpcore                      1.0.4
httpx                         0.27.0
hyperspy                      1.7.4
idna                          3.6
ifes-apt-tc-data-modeling     0.1
imageio                       2.34.0
imagesize                     1.4.1
importlib_metadata            7.0.2
importlib_resources           6.1.3
iniconfig                     2.0.0
ipykernel                     6.29.3
ipympl                        0.9.3
ipyparallel                   8.7.0
ipython                       8.12.3
ipython-genutils              0.2.0
ipywidgets                    8.1.2
isoduration                   20.11.0
jedi                          0.19.1
Jinja2                        3.1.3
joblib                        1.3.2
json5                         0.9.22
jsonpointer                   2.4
jsonschema                    4.21.1
jsonschema-specifications     2023.12.1
jupyter                       1.0.0
jupyter_client                8.6.0
jupyter-console               6.6.3
jupyter_core                  5.7.1
jupyter-events                0.9.0
jupyter-lsp                   2.2.4
jupyter_server                2.13.0
jupyter_server_terminals      0.5.2
jupyterlab                    4.1.4
jupyterlab_h5web              12.0.0
jupyterlab_pygments           0.3.0
jupyterlab_server             2.25.4
jupyterlab_widgets            3.0.10
kikuchipy                     0.9.0
kiwisolver                    1.4.5
lazy_loader                   0.3
llvmlite                      0.41.1
lmfit                         1.2.2
locket                        1.0.0
lxml                          5.1.0
markdown-it-py                3.0.0
MarkupSafe                    2.1.5
matplotlib                    3.7.5
matplotlib-inline             0.1.6
matplotlib-scalebar           0.8.1
mdurl                         0.1.2
mergedeep                     1.3.4
mistune                       3.0.2
mpmath                        1.3.0
msgpack                       1.0.8
mypy                          1.9.0
mypy-extensions               1.0.0
natsort                       8.4.0
nbclient                      0.9.0
nbconvert                     7.16.2
nbformat                      5.9.2
nest-asyncio                  1.6.0
networkx                      3.1
niondata                      0.15.3
nionswift                     0.16.8
nionswift-io                  0.15.1
nionui                        0.6.10
nionutils                     0.4.6
notebook                      7.1.1
notebook_shim                 0.2.4
numba                         0.58.1
numcodecs                     0.12.1
numexpr                       2.8.5
numpy                         1.24.4
numpy-quaternion              2023.0.2
opencv-python                 4.9.0.80
orix                          0.11.1
orjson                        3.9.15
overrides                     7.7.0
packaging                     24.0
pandas                        2.0.3
pandocfilters                 1.5.1
parso                         0.8.3
partd                         1.4.1
pexpect                       4.9.0
photutils                     1.8.0
pickleshare                   0.7.5
pillow                        10.2.0
Pint                          0.21.1
pip                           23.3.1
pkgutil_resolve_name          1.3.10
platformdirs                  4.2.0
pluggy                        1.4.0
ply                           3.11
pooch                         1.8.1
pprintpp                      0.4.0
prettytable                   3.10.0
prometheus_client             0.20.0
prompt-toolkit                3.0.43
psutil                        5.9.8
ptyprocess                    0.7.0
pure-eval                     0.2.2
py                            1.11.0
py-cpuinfo                    9.0.0
pyarrow                       15.0.1
PyCifRW                       4.4.6
pycparser                     2.21
pyerfa                        2.0.0.3
pyFAI                         2024.2.0
pyfakefs                      5.3.5
Pygments                      2.17.2
pynxtools                     0.1.1
pynxtools-mpes                0.0.1
pyparsing                     3.1.2
pytest                        8.1.1
pytest-clarity                1.0.1
pytest-cov                    4.1.0
pytest-forked                 1.6.0
pytest-xdist                  3.5.0
python-dateutil               2.9.0.post0
python-json-logger            2.0.7
pytz                          2024.1
pytz-deprecation-shim         0.1.0.post0
PyWavelets                    1.4.1
pyxem                         0.15.1
PyYAML                        6.0.1
pyzmq                         25.1.2
qtconsole                     5.5.1
QtPy                          2.4.1
radioactivedecay              0.4.22
rapidfuzz                     3.4.0
recommonmark                  0.7.1
referencing                   0.33.0
requests                      2.31.0
requests-mock                 1.11.0
rfc3339-validator             0.1.4
rfc3986-validator             0.1.1
rich                          13.7.1
rpds-py                       0.18.0
ruff                          0.2.2
scikit-image                  0.21.0
scikit-learn                  1.3.2
scipy                         1.10.1
sed                           0.1.0           /mnt/pcshare/users/Laurenz/AreaB/sed/sed
sed-processor                 0.1.9a0         /mnt/pcshare/users/Laurenz/AreaB/sed/sed
Send2Trash                    1.8.2
setuptools                    69.1.1
setuptools-scm                8.0.4
silx                          2.0.0
six                           1.16.0
smmap                         5.0.1
sniffio                       1.3.1
snowballstemmer               2.2.0
soupsieve                     2.5
sparse                        0.15.1
Sphinx                        7.1.2
sphinx-rtd-theme              2.0.0
sphinxcontrib-applehelp       1.0.4
sphinxcontrib-devhelp         1.0.2
sphinxcontrib-htmlhelp        2.0.1
sphinxcontrib-jquery          4.1
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.3
sphinxcontrib-serializinghtml 1.1.5
stack-data                    0.6.3
symmetrize                    0.5.5
sympy                         1.12
tables                        3.8.0
terminado                     0.18.0
threadpoolctl                 3.3.0
tifffile                      2023.7.10
tinycss2                      1.2.1
tomli                         2.0.1
tomlkit                       0.12.3
toolz                         0.12.1
tornado                       6.4
tqdm                          4.66.2
traitlets                     5.14.1
traits                        6.4.3
transforms3d                  0.4.1
types-python-dateutil         2.8.19.20240311
types-PyYAML                  6.0.12.20240311
types-requests                2.31.0.20240311
typing_extensions             4.10.0
tzdata                        2024.1
tzlocal                       4.3
uncertainties                 3.1.7
uri-template                  1.3.0
urllib3                       2.2.1
wcwidth                       0.2.13
webcolors                     1.13
webencodings                  0.5.1
websocket-client              1.7.0
wheel                         0.41.1
widgetsnbextension            4.0.10
xarray                        2023.1.0
xyzservices                   2023.10.1
zarr                          2.16.1
zipfile37                     0.1.3
zipp                          3.17.0

Reference environment:

Package                       Version     Editable project location
----------------------------- ----------- ----------------------------------------
alabaster                     0.7.12
aniso8601                     7.0.0
annotated-types               0.6.0
anyio                         3.5.0
argon2-cffi                   21.3.0
argon2-cffi-bindings          21.2.0
asciitree                     0.3.3
ase                           3.22.1
asteval                       0.9.26
astroid                       3.0.0
astropy                       5.0.2
asttokens                     2.0.5
atomicwrites                  1.4.0
attrs                         23.1.0
Babel                         2.12.1
backcall                      0.2.0
backports.zoneinfo            0.2.1
beautifulsoup4                4.11.1
bleach                        5.0.0
bokeh                         3.1.1
build                         0.10.0
CacheControl                  0.13.1
cachetools                    4.2.4
cachy                         0.3.0
cattrs                        22.2.0
certifi                       2023.7.22
cffi                          1.16.0
cfgv                          3.3.1
charset-normalizer            3.3.1
cleo                          2.0.1
click                         7.1.2
clikit                        0.6.2
cloudpickle                   2.0.0
commonmark                    0.9.1
contourpy                     1.0.6
coverage                      6.3.2
cramjam                       2.5.0
crashtest                     0.4.1
cryptography                  41.0.5
cycler                        0.11.0
Cython                        0.29.28
dask                          2023.3.0
debugpy                       1.5.1
decorator                     5.1.1
deepdish                      0.3.7
defusedxml                    0.7.1
diffpy.structure              3.1.0
diffsims                      0.5.1
dill                          0.3.5.1
distlib                       0.3.7
distributed                   2023.3.0
docstring-parser              0.12
docutils                      0.20.1
dtaidistance                  2.3.10
dulwich                       0.21.6
ecdsa                         0.17.0
elabapi-python                0.1.7
elasticsearch                 6.8.2
elasticsearch-dsl             6.4.0
entrypoints                   0.4
exceptiongroup                1.1.0
execnet                       2.0.2
executing                     0.8.3
fabio                         0.13.0
fastdtw                       0.3.4
fasteners                     0.18
fastentrypoints               0.12
fastjsonschema                2.16.1
fastparquet                   0.8.3
ffmpeg                        1.4
ffmpeg-python                 0.2.0
filelock                      3.12.4
flake8                        4.0.1
flatdict                      4.0.1
fonttools                     4.30.0
fsspec                        2022.2.0
funcy                         1.17
future                        0.18.2
gitdb                         4.0.10
GitPython                     3.1.31
graphviz                      0.20.1
h11                           0.12.0
h5grove                       1.2.0
h5py                          3.8.0
hdf5plugin                    4.4.0
HeapDict                      1.0.1
html5lib                      1.1
httpcore                      0.14.7
httpx                         0.22.0
hyperspy                      1.7.5
identify                      2.4.12
idna                          3.4
ifes-apt-tc-data-modeling     0.1
igor                          0.3
imageio                       2.27.0
imagesize                     1.3.0
importlib-metadata            6.8.0
importlib-resources           6.1.0
iniconfig                     2.0.0
installer                     0.7.0
ipykernel                     6.9.1
ipympl                        0.9.2
ipyparallel                   8.4.1
ipython                       8.6.0
ipython-genutils              0.2.0
ipywidgets                    7.7.2
isort                         4.3.21
jaraco.classes                3.3.0
jax                           0.4.4
jaxlib                        0.4.4
jedi                          0.18.1
jeepney                       0.8.0
Jinja2                        3.0.3
jmespath                      0.10.0
joblib                        1.2.0
json5                         0.9.11
jsonschema                    4.17.3
jupyter-client                7.1.2
jupyter_core                  5.2.0
jupyter-server                1.23.6
jupyterlab                    3.5.3
jupyterlab-h5web              7.0.0
jupyterlab-pygments           0.2.2
jupyterlab_server             2.22.1
jupyterlab-widgets            1.1.1
keyring                       24.2.0
kikuchipy                     0.8.4
kiwisolver                    1.4.0
lark                          1.1.5
lazy-object-proxy             1.7.1
llvmlite                      0.38.1
lmfit                         1.0.3
locket                        1.0.0
lockfile                      0.12.2
lxml                          4.7.1
markdown-it-py                3.0.0
MarkupSafe                    2.1.0
matplotlib                    3.6.2
matplotlib-inline             0.1.3
matplotlib-scalebar           0.8.1
mccabe                        0.6.1
mdit-py-plugins               0.4.0
mdurl                         0.1.2
mergedeep                     1.3.4
mistune                       0.8.4
more-itertools                10.1.0
mpes                          1.1.3
mpmath                        1.2.1
msgpack                       1.0.7
mypy                          1.8.0
mypy-extensions               1.0.0
myst-parser                   2.0.0
natsort                       8.2.0
nbclassic                     1.0.0
nbclient                      0.6.6
nbconvert                     6.5.3
nbformat                      5.4.0
nbsphinx                      0.9.3
nest-asyncio                  1.5.4
networkx                      2.7.1
nexusformat                   1.0.3
nexusparser                   0.0.1
niondata                      0.15.3
nionswift                     0.16.8
nionswift-io                  0.15.1
nionui                        0.6.10
nionutils                     0.4.6
nodeenv                       1.6.0
nomad-lab                     1.0.3
nose                          1.3.7
notebook                      6.4.12
notebook_shim                 0.2.3
nptyping                      1.4.4
numba                         0.55.2
numcodecs                     0.11.0
numexpr                       2.8.1
numpy                         1.21.6
numpy-quaternion              2022.4.3
nxarray                       0.4.4
odfpy                         1.4.1
opencv-contrib-python         4.7.0.72
opencv-python                 4.8.0.74
opt-einsum                    3.3.0
orix                          0.11.1
orjson                        3.6.0
packaging                     23.2
pandas                        1.5.1
pandocfilters                 1.5.0
parso                         0.8.3
partd                         1.2.0
pasha                         0.1.1
pastel                        0.2.1
pexpect                       4.8.0
photutils                     1.3.0
pickleshare                   0.7.5
Pillow                        9.0.1
Pint                          0.17
pip                           24.0
pkginfo                       1.9.6
pkgutil_resolve_name          1.3.10
platformdirs                  3.11.0
pluggy                        1.0.0
poetry                        1.6.1
poetry-core                   1.7.0
poetry-plugin-export          1.5.0
polars                        0.16.8
pooch                         1.7.0
pre-commit                    2.17.0
prettytable                   3.6.0
prometheus-client             0.14.1
prompt-toolkit                3.0.28
psutil                        5.9.3
ptyprocess                    0.7.0
pure-eval                     0.2.2
py                            1.11.0
pyaml                         21.10.1
pyarrow                       15.0.0
pyasn1                        0.4.8
PyCifRW                       4.4.5
pycodestyle                   2.8.0
pycparser                     2.21
pydantic                      2.6.0
pydantic_core                 2.16.1
pyerfa                        2.0.0.1
pyfai                         2023.3.0
pyfakefs                      5.3.1
pyflakes                      2.4.0
Pygments                      2.17.1
pylev                         1.4.0
pylint                        3.0.1
pylint-plugin-utils           0.5
pynxtools                     0.0.9
pyparsing                     3.0.7
pyproject_hooks               1.0.0
pyRestTable                   2020.0.3
pyrsistent                    0.19.3
pytest                        7.4.2
pytest-cov                    4.1.0
pytest-timeout                1.4.2
pytest-xdist                  3.3.1
python-dateutil               2.8.2
python-jose                   3.3.0
python-json-logger            2.0.2
python-keycloak               0.26.1
pytz                          2021.1
pytz-deprecation-shim         0.1.0.post0
PyWavelets                    1.2.0
pyxem                         0.15.0
PyYAML                        6.0
pyzmq                         22.3.0
radioactivedecay              0.4.17
rapidfuzz                     2.15.2
readme-renderer               35.0
recommonmark                  0.7.1
requests                      2.31.0
requests-cache                1.0.1
requests-mock                 1.11.0
requests-toolbelt             1.0.0
rfc3986                       1.5.0
rich                          12.2.0
rsa                           4.8
ruff                          0.1.8
scikit-image                  0.19.2
scikit-learn                  1.2.2
scipy                         1.9.3
SecretStorage                 3.3.3
sed                           0.1.0       /mnt/pcshare/users/Laurenz/AreaB/sed/sed
sed-processor                 0.1.8a6     /mnt/pcshare/users/Laurenz/AreaB/sed/sed
Send2Trash                    1.8.0
setuptools                    69.1.0
setuptools-scm                6.4.2
shellingham                   1.5.4
silx                          1.1.0
six                           1.16.0
smmap                         5.0.0
sniffio                       1.2.0
snowballstemmer               2.2.0
sortedcontainers              2.4.0
soupsieve                     2.3.2.post1
sparse                        0.14.0
Sphinx                        7.1.2
sphinx-autodoc-typehints      1.17.0
sphinx-rtd-theme              2.0.0
sphinxcontrib-applehelp       1.0.2
sphinxcontrib-devhelp         1.0.2
sphinxcontrib-htmlhelp        2.0.0
sphinxcontrib-jquery          4.1
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.3
sphinxcontrib-serializinghtml 1.1.5
stack-data                    0.2.0
structlog                     21.5.0
symmetrize                    0.5.5
sympy                         1.11.1
tables                        3.7.0
tblib                         1.7.0
terminado                     0.15.0
threadpoolctl                 3.1.0
tifffile                      2022.10.10
tinycss2                      1.1.1
toml                          0.10.2
tomli                         2.0.1
tomlkit                       0.12.1
toolz                         0.11.2
tornado                       6.1
tqdm                          4.64.1
traitlets                     5.3.0
traits                        6.4.1
transforms3d                  0.4.1
trove-classifiers             2023.10.18
twine                         4.0.0
typed-ast                     1.4.3
types-PyYAML                  6.0.12.1
types-requests                2.31.0.1
types-urllib3                 1.26.25.13
typing_extensions             4.9.0
typish                        1.9.3
tzdata                        2023.4
tzlocal                       4.3
uncertainties                 3.1.6
url-normalize                 1.4.3
urllib3                       2.0.7
virtualenv                    20.24.6
wcwidth                       0.2.5
webencodings                  0.5.1
websocket-client              1.5.1
wheel                         0.37.1
widgetsnbextension            3.6.1
wrapt                         1.13.3
xarray                        2023.1.0
xyzservices                   2022.9.0
zarr                          2.14.2
zict                          2.1.0
zipfile37                     0.1.3
zipp                          3.17.0

rettigl commented 6 months ago

It appears the culprit is scipy. The issue occurs for scipy >= 1.10.0, and only if momentum correction is performed. A major change in that release was to the interpolation code, which is used for momentum correction. I will look into this further.

rettigl commented 6 months ago

The issue seems to be that RegularGridInterpolator in scipy >= 1.10 prevents efficient parallelization of the dataframe computation:

With 1.9.3: [screenshot]
With 1.10.0:

https://stackoverflow.com/questions/75427538/regulargridinterpolator-excruciatingly-slow-compared-to-interp2d

Related:
https://github.com/scipy/scipy/issues/18010
https://github.com/scipy/scipy/issues/17356

This seems to be a known issue that has not been solved yet. The simplest solution is to limit the scipy version to < 1.10 for now.
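As an illustration of the version limit discussed above, here is a minimal sketch of a runtime check that warns when the installed scipy falls in the range reported as slow in this thread (>= 1.10). The function name `in_reported_slow_range` is hypothetical and not part of sed or scipy, and the threshold only reflects the observations in this issue.

```python
import re
import warnings
from importlib.metadata import PackageNotFoundError, version


def in_reported_slow_range(scipy_version: str) -> bool:
    """Return True if the version is >= 1.10, the range reported as slow in this thread.

    Hypothetical helper: only the leading "major.minor" part is compared, so
    pre-release suffixes such as "rc1" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", scipy_version)
    if match is None:
        raise ValueError(f"Cannot parse version string: {scipy_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 10)


# Look up the installed scipy version, if scipy is installed at all.
try:
    installed = version("scipy")
except PackageNotFoundError:
    installed = None

if installed is not None and in_reported_slow_range(installed):
    warnings.warn(
        f"scipy {installed} is in the range reported as slow for momentum correction "
        "in this issue (>= 1.10); consider limiting the scipy version."
    )
```

A check like this only flags the situation; the actual limit would still be set in the project's dependency specification.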

rettigl commented 6 months ago

Test of a new interpolation method using scipy.ndimage.map_coordinates:

scipy 1.9.3: [screenshot] [screenshot]
scipy 1.10.0: [screenshot] [screenshot]
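For readers unfamiliar with the two approaches, the following is a minimal sketch of how linear interpolation on a uniformly spaced regular grid could be expressed with `scipy.ndimage.map_coordinates` instead of `RegularGridInterpolator`. The grid, the sample data, and the helper `to_index` are assumptions made for illustration; this is not the sed momentum-correction code.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.ndimage import map_coordinates

# Hypothetical uniform grid and values (not the sed momentum-correction data).
x = np.linspace(0.0, 10.0, 101)  # coordinates along axis 0
y = np.linspace(-5.0, 5.0, 201)  # coordinates along axis 1
xx, yy = np.meshgrid(x, y, indexing="ij")
values = np.sin(xx) * np.cos(yy)

# Random query points inside the grid.
rng = np.random.default_rng(0)
points = np.column_stack(
    [rng.uniform(x[0], x[-1], 1000), rng.uniform(y[0], y[-1], 1000)]
)

# Reference result: linear interpolation with RegularGridInterpolator.
rgi = RegularGridInterpolator((x, y), values, method="linear")
ref = rgi(points)


def to_index(coords, axis):
    """Convert physical coordinates to fractional array indices (uniform axis only)."""
    return (coords - axis[0]) / (axis[1] - axis[0])


# map_coordinates works in index space: one row of fractional indices per array axis.
idx = np.vstack([to_index(points[:, 0], x), to_index(points[:, 1], y)])
alt = map_coordinates(values, idx, order=1, mode="nearest")

# Largest absolute deviation between the two methods.
max_diff = float(np.max(np.abs(ref - alt)))
print(f"max |RGI - map_coordinates| = {max_diff:.2e}")
```

Note that `map_coordinates` operates on array indices rather than physical coordinates, so this conversion is only valid when each axis is uniformly spaced.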

zain-sohail commented 6 months ago

Considering that updating a package can have such a big impact on performance, it might be a good idea to have the pull request bot also benchmark the binning, with all corrections and calibrations, before updating. Not sure how feasible that is.

rettigl commented 6 months ago

> Considering that updating a package can have such a big impact on performance, it might be a good idea to have the pull request bot also benchmark the binning, with all corrections and calibrations, before updating. Not sure how feasible that is.

I also considered that; however, I am not sure how reproducible the results on the GitHub workers will be. I can still try to set something up, even if only for local tests.
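A benchmark of this kind could be sketched roughly as follows. This is only an assumption about how such a check might look: `best_time`, `is_regression`, and `dummy_workload` are hypothetical names, the workload is a stand-in for the actual binning call, and the tolerance factor is arbitrary. Taking the minimum over several repeats is one common way to reduce the influence of noisy machines, though it does not make results on shared workers fully reproducible.

```python
import time
from typing import Callable


def best_time(func: Callable[[], object], repeats: int = 5) -> float:
    """Run func several times and return the shortest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def is_regression(current: float, baseline: float, tolerance: float = 2.0) -> bool:
    """Flag a regression if the current time exceeds the baseline by the tolerance factor."""
    return current > baseline * tolerance


def dummy_workload() -> int:
    """Stand-in for the binning step (hypothetical; replace with the real call)."""
    return sum(i * i for i in range(100_000))


# In practice, the baseline would be a stored timing from a known-good environment.
baseline = best_time(dummy_workload)
current = best_time(dummy_workload)
print(f"baseline={baseline:.4f}s current={current:.4f}s "
      f"regression={is_regression(current, baseline)}")
```

A generous tolerance factor keeps ordinary timing noise from triggering false alarms, while a slowdown like the one reported here (seconds to minutes) would still be caught.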

ev-br commented 5 months ago

Came here from the scipy issue. It would be useful to retry with scipy 1.13rc1, as RGI (RegularGridInterpolator) should be significantly faster in scipy 1.13.