Probe-Particle / ppafm

Classical force field model for simulating atomic force microscopy images.
MIT License

Continuous-integration tests keep failing, blocking any new pull request #279

Closed mondracek closed 1 month ago

mondracek commented 6 months ago

I stumbled over this issue when trying to submit PR #278. This is essentially a copy of what I've reported there:

I don't have much of an idea how to debug and fix these issues with the GitHub 'workflow' functionality, so I would appreciate any help from you guys who know more about it.

NikoOinonen commented 6 months ago

Considering that the tests started failing without any change from our side, it looks like something changed in the GitHub test runner.

The tests fail on the second Python version that gets tested, no matter which version it happens to be.

If I recall correctly, @yakutovicha was saying before that running multiple simulations in a single process can cause failures because the state of the parameters persists between simulations. Maybe the test runner changed so that the different Python versions somehow share resources?
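
As a generic illustration of the failure mode described above, here is a minimal sketch of how module-level state can leak from one run into the next within a single process. This is not ppafm's actual code: `DEFAULTS`, `PARAMS`, `run_simulation`, and `run_isolated` are hypothetical names chosen for the example.

```python
import copy

# Hypothetical module-level defaults, standing in for global simulation parameters.
DEFAULTS = {"grid_n": 10, "charge": 0.0}
PARAMS = copy.deepcopy(DEFAULTS)


def run_simulation(overrides=None):
    """Mutates the shared PARAMS dict, so settings persist into later calls."""
    PARAMS.update(overrides or {})
    return PARAMS["grid_n"] * (1.0 + PARAMS["charge"])


def run_isolated(overrides=None):
    """Works on a fresh copy of the defaults, so calls cannot affect each other."""
    params = copy.deepcopy(DEFAULTS)
    params.update(overrides or {})
    return params["grid_n"] * (1.0 + params["charge"])


first = run_simulation({"charge": 0.5})  # 15.0
second = run_simulation()                # still 15.0: "charge" leaked from the first run
clean = run_isolated()                   # 10.0, independent of call order
```

In a test suite, the same effect would make a test's outcome depend on which tests ran before it in the same process.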

mondracek commented 6 months ago

Yeah, @yakutovicha, do you think this is related to issue #232?

NikoOinonen commented 6 months ago

I did some more testing, and it looks to me like it's actually specific versions of Python that are failing. 3.8, 3.9, and 3.10 seem to be affected. I was also able to replicate the segfault locally.

NikoOinonen commented 6 months ago

Not only different versions of Python, but different patch versions. For example, 3.9.18 works, but 3.9.19 does not, so I think this is why it started to fail suddenly.

This also has something to do with the parameter sharing between simulation runs, because the specific test that fails only does so when it runs after another test, not when it runs on its own. This makes it annoying to debug.

NikoOinonen commented 6 months ago

Fixing #232 might fix the underlying problem, so probably the quickest way to deal with this for now would be to simply disable the one test that is failing. This makes the test pass: https://github.com/Probe-Particle/ppafm/actions/runs/9128561527/job/25101162232.

I can make the PR if this is okay?

mondracek commented 6 months ago

@NikoOinonen, yes, please make the PR.

ondrejkrejci commented 3 months ago

Reposted from #295:

It looks to me like a bug in GitHub or a tightened policy, since all tests died at the 4-minute mark. I did not change any part of the "active code", only the README and a commented-out part. Anyway, the test stopped with the following error for Python 3.7:

Run PPAFM_RECOMPILE=1 pytest tests examples/PTCDA_Hartree_dz2 -v --cov --cov-report json
============================= test session starts ==============================
platform linux -- Python 3.7.17, pytest-7.4.4, pluggy-1.2.0 -- /opt/hostedtoolcache/Python/3.7.17/x64/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/ppafm/ppafm
configfile: pyproject.toml
plugins: cov-4.1.0
collecting ... collected 12 items

tests/test_afmulator.py::test_afmulator_save_load PASSED                 [  8%]
tests/test_atomicUtils.py::test_ZsToElems PASSED                         [ 16%]
tests/test_common.py::test_get_df_weight PASSED                          [ 25%]
tests/test_common.py::test_get_simple_df_weight PASSED                   [ 33%]
tests/test_common.py::test_sphere_tangent_space PASSED                   [ 41%]
tests/test_datagrid.py::test_power PASSED                                [ 50%]
tests/test_datagrid.py::test_tip_interp PASSED                           [ 58%]
tests/test_generator.py::test_GeneratorAFMtrainer PASSED                 [ 66%]
tests/test_io.py::test_xyz PASSED                                        [ 75%]
tests/test_io.py::test_parse_comment_ase PASSED                          [ 83%]
tests/test_io.py::test_load_aims PASSED                                  [ 91%]
Error: The operation was canceled.

The Python 3.11 run seems an interesting one:

Run PPAFM_RECOMPILE=1 pytest tests examples/PTCDA_Hartree_dz2 -v --cov --cov-report json
============================= test session starts ==============================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0 -- /opt/hostedtoolcache/Python/3.11.9/x64/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/ppafm/ppafm
configfile: pyproject.toml
plugins: cov-5.0.0
collecting ... collected 12 items

tests/test_afmulator.py::test_afmulator_save_load PASSED                 [  8%]
tests/test_atomicUtils.py::test_ZsToElems PASSED                         [ 16%]
tests/test_common.py::test_get_df_weight PASSED                          [ 25%]
tests/test_common.py::test_get_simple_df_weight PASSED                   [ 33%]
tests/test_common.py::test_sphere_tangent_space PASSED                   [ 41%]
tests/test_datagrid.py::test_power PASSED                                [ 50%]
tests/test_datagrid.py::test_tip_interp PASSED                           [ 58%]
tests/test_generator.py::test_GeneratorAFMtrainer PASSED                 [ 66%]
tests/test_io.py::test_xyz PASSED                                        [ 75%]
tests/test_io.py::test_parse_comment_ase PASSED                          [ 83%]
tests/test_io.py::test_load_aims PASSED                                  [ 91%]
/home/runner/work/_temp/a2a60b1b-2468-44ab-93c7-89e20eb061a1.sh: line 1:  1839 Segmentation fault      (core dumped) PPAFM_RECOMPILE=1 pytest tests examples/PTCDA_Hartree_dz2 -v --cov --cov-report json
examples/PTCDA_Hartree_dz2/example_ptcda_hartree.py::example_ptcda_hartree 
Error: Process completed with exit code 139.
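
For reference, exit code 139 is how the shell reports the segmentation fault shown in the log above: 128 plus the signal number 11 (SIGSEGV). A small sketch for decoding such codes follows; `describe_exit_code` is a hypothetical helper for illustration, not part of ppafm or pytest.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell exit status into a readable description."""
    if code > 128:
        # POSIX shells report "killed by signal N" as 128 + N.
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit status {code} (no matching signal)"
    return f"exit status {code}"


print(describe_exit_code(139))
```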

Similarly, Python 3.12 is having problems with the PTCDA_Hartree_dz2 example:

Run PPAFM_RECOMPILE=1 pytest tests examples/PTCDA_Hartree_dz2 -v --cov --cov-report json
============================= test session starts ==============================
platform linux -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- /opt/hostedtoolcache/Python/3.12.4/x64/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/ppafm/ppafm
configfile: pyproject.toml
plugins: cov-5.0.0
collecting ... collected 12 items

tests/test_afmulator.py::test_afmulator_save_load PASSED                 [  8%]
tests/test_atomicUtils.py::test_ZsToElems PASSED                         [ 16%]
tests/test_common.py::test_get_df_weight PASSED                          [ 25%]
tests/test_common.py::test_get_simple_df_weight PASSED                   [ 33%]
tests/test_common.py::test_sphere_tangent_space PASSED                   [ 41%]
tests/test_datagrid.py::test_power PASSED                                [ 50%]
tests/test_datagrid.py::test_tip_interp PASSED                           [ 58%]
tests/test_generator.py::test_GeneratorAFMtrainer PASSED                 [ 66%]
tests/test_io.py::test_xyz PASSED                                        [ 75%]
tests/test_io.py::test_parse_comment_ase PASSED                          [ 83%]
tests/test_io.py::test_load_aims PASSED                                  [ 91%]
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/coverage/report_core.py:115: CoverageWarning: Couldn't parse '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_algorithms_reduce_mako': No source for code: '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_algorithms_reduce_mako'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/coverage/report_core.py:115: CoverageWarning: Couldn't parse '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_algorithms_scan_mako': No source for code: '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_algorithms_scan_mako'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/coverage/report_core.py:115: CoverageWarning: Couldn't parse '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_algorithms_transpose_mako': No source for code: '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_algorithms_transpose_mako'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/coverage/report_core.py:115: CoverageWarning: Couldn't parse '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_cluda_functions_mako': No source for code: '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_cluda_functions_mako'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/coverage/report_core.py:115: CoverageWarning: Couldn't parse '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_cluda_kernel_mako': No source for code: '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_cluda_kernel_mako'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/coverage/report_core.py:115: CoverageWarning: Couldn't parse '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_cluda_vsize_mako': No source for code: '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_cluda_vsize_mako'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/coverage/report_core.py:115: CoverageWarning: Couldn't parse '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_fft_fft_mako': No source for code: '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_fft_fft_mako'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/coverage/report_core.py:115: CoverageWarning: Couldn't parse '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_fft_fftshift_mako': No source for code: '/home/runner/work/ppafm/ppafm/_opt_hostedtoolcache_Python_3_12_4_x64_lib_python3_12_site_packages_reikna_fft_fftshift_mako'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")
examples/PTCDA_Hartree_dz2/example_ptcda_hartree.py::example_ptcda_hartree PASSED [100%]

=============================== warnings summary ===============================
../../../../../opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/pytools/persistent_dict.py:59
  /opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/pytools/persistent_dict.py:59: UserWarning: Unable to import recommended hash 'siphash24.siphash13', falling back to 'hashlib.sha256'. Run 'python3 -m pip install siphash24' to install the recommended hash.
    warn("Unable to import recommended hash 'siphash24.siphash13', "

ppafm/cli/plot_results.py:12
  /home/runner/work/ppafm/ppafm/ppafm/cli/plot_results.py:12: MatplotlibDeprecationWarning: Auto-close()ing of figures upon backend switching is deprecated since 3.8 and will be removed in 3.10.  To suppress this warning, explicitly call plt.close('all') first.
    mpl.use("Agg")

tests/test_afmulator.py::test_afmulator_save_load
tests/test_afmulator.py::test_afmulator_save_load
tests/test_afmulator.py::test_afmulator_save_load
examples/PTCDA_Hartree_dz2/example_ptcda_hartree.py::example_ptcda_hartree
examples/PTCDA_Hartree_dz2/example_ptcda_hartree.py::example_ptcda_hartree
  /home/runner/work/ppafm/ppafm/ppafm/fieldFFT.py:232: PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
    return np.matrix(Lmat)

examples/PTCDA_Hartree_dz2/example_ptcda_hartree.py::example_ptcda_hartree
  /home/runner/work/ppafm/ppafm/ppafm/fieldFFT.py:18: PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
    return np.matrix(lvec[1:])

examples/PTCDA_Hartree_dz2/example_ptcda_hartree.py::example_ptcda_hartree
  /opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/numpy/matrixlib/defmatrix.py:70: PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
    return matrix(data, dtype=dtype, copy=False)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.12.4-final-0 -----------
Coverage JSON written to file coverage.json

================== 12 passed, 9 warnings in 203.48s (0:03:23) ==================
FileRead program: reading Q-0.10K0.50/OutFz.xsf file
XYZ dimensions are 62 202 202
Reading DONE
Error: The operation was canceled.

NikoOinonen commented 2 months ago

I was testing this a bit just now, and got the problem to show up consistently on my local machine.

The following seems to cause a segfault every time:

export MPLBACKEND=AGG
export PPAFM_RECOMPILE=1

pytest -v \
    tests/test_generator.py \
    tests/human_eye/test_TipForce.py \
    examples/PTCDA_Hartree_dz2

It is this specific combination that is not working. Removing any one of those tests makes it run without problem.
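
The leave-one-out check described above could be automated roughly as follows. This is only a sketch under assumptions: `pytest_crashes` and `leave_one_out` are hypothetical helper names, the environment variables are copied from the commands above, and detecting a crash through a negative return code only works on POSIX systems.

```python
import os
import subprocess
import sys


def pytest_crashes(paths):
    """Run pytest on `paths` in a fresh process; True if it was killed by a signal."""
    env = {**os.environ, "MPLBACKEND": "AGG", "PPAFM_RECOMPILE": "1"}
    result = subprocess.run([sys.executable, "-m", "pytest", "-v", *paths], env=env)
    # On POSIX, a negative return code means the child died from a signal (e.g. SIGSEGV).
    return result.returncode < 0


def leave_one_out(paths, crashes=pytest_crashes):
    """Check the full set, then every subset with exactly one path removed."""
    report = {"all": crashes(paths)}
    for i, path in enumerate(paths):
        rest = paths[:i] + paths[i + 1:]
        report[f"without {path}"] = crashes(rest)
    return report
```

Calling `leave_one_out` with the three paths from the command above would run pytest four times. If the observation holds, the `"all"` entry would be `True` and every `"without ..."` entry `False`.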

NikoOinonen commented 1 month ago

I think I found the source of this issue. I made a separate issue for the specific problem: #308.

NikoOinonen commented 1 month ago

The issue should be fixed now. The CI tests seem to pass every time.