CRPropa / CRPropa3

CRPropa is a public astrophysical simulation framework for propagating extraterrestrial ultra-high energy particles. https://crpropa.github.io/CRPropa3/
https://crpropa.desy.de
GNU General Public License v3.0
65 stars 66 forks source link

Segmentation fault with ParticleMapsContainer's getRandomParticles / getEnergies / getParticles / getMap depending on python and numpy version #410

Closed tfitoussi closed 5 months ago

tfitoussi commented 1 year ago

Dear all,

I discovered that depending on the version of Python and Numpy I'm using (the most recent one), getRandomParticles crashes on a segmentation fault. The error message is essentially empty:

[1]    90883 segmentation fault (core dumped)  python test_CRPropa3_crash.py

To reproduce the problem, I use a modify version an example:

import crpropa
import numpy as np
M = crpropa.ParticleMapsContainer()
for i in range(1000):
    particleId = crpropa.nucleusId(1,1)
    energy = 10 * crpropa.EeV
    wherever = crpropa.Vector3d(-1,3,2)
    momentumVector = crpropa.Random.instance().randFisherVector(wherever, 200)
    M.addParticle(particleId, energy, momentumVector)
pids, energies, lons, lats = M.getRandomParticles(10)
print("no crash")

For sake of reproductibility, I'm using the the last version on CRPropa available at the moment of writing this report (commit a506905)

To identify the source of the problem, I tested it on two machines: Ubuntu 22.04 LTS (a desktop computer) and a Manjaro Linux (personal laptop). I tried different virtual environments with different version of python and numpy. I summarize below the results.

OS gcc Python Numpy SWIG CMake works?
Ubuntu 22.04 LTS 11.3.0 3.10.6 1.23.5 4.0.2 3.22.1 segfault
Ubuntu 22.04 LTS 11.3.0 3.10.6 1.22.3 4.0.2 3.22.1 segfault
Ubuntu 22.04 LTS 11.3.0 3.10.6 1.21.6 4.0.2 3.22.1 segfault
Manjaro 12.2.0 3.10.8 1.23.5 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.10.8 1.22.3 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.10.8 1.21.6 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.9.15 1.23.5 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.9.15 1.22.3 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.9.15 1.21.6 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.9.15 1.19.5 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.9.15 1.18.5 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.8.15 1.23.5 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.8.15 1.22.3 4.0.2 3.24.3 segfault
Manjaro 12.2.0 3.8.15 1.21.6 4.0.2 3.24.3 OK

So far it seems that the bug appears for Python >= 3.8 and Numpy >= 1.21.6

Sorry I did not push further the investigations since I solved the problem with a "trick". But I don't know how long I can keep this trick working as Python 3.8 will be deprecated at some point. I tested it on two different Linux OS so I doubt that it comes from some weird library packaging or some problem related to OS specificities but I'm not an expert enough on that matters to completely discard it.

Cheers, Thomas

lukasmerten commented 1 year ago

Hi Thomas Thanks for reporting, currently I cannot provide a solution, yet. We have encoutered similar problems and try to investigate this further. We will update this issues as soon as we have more information. Best, Lukas

lukasmerten commented 1 year ago

I tried to investigate this today. However, I did not came up with a solution. I'll ping @TobiasWinchen, @rafaelab, and @adundovi as they worked on the SWIG interface before.

This seems to be related to the creation of PyArray_SimpleNew in, e.g. PyObject *getParticleIds_numpyArray() { std::vector<int> v = $self->getParticleIds(); npy_intp size = v.size(); PyObject *out = PyArray_SimpleNew(1, &size, NPY_INT); memcpy(PyArray_DATA((PyArrayObject *) out), &v[0], v.size() * sizeof(int)); return out; }
which can be found here.

We do something similar for the Vector3d wrapper here and following lines, which works fine. I was not able to track down the difference.

When this is not fixed it will limit the use of lenses quite significantly. Without the python wrapping they might still work.

Note: This already influences some of example notebooks: ~/doc/pages/example_notebooks/targeting/Targeting.ipynb ~/doc/pages/example_notebooks/galactic_lensing/lensing_cr.ipynb ~/doc/pages/example_notebooks/galactic_lensing/lensing_maps.ipynb

This will also lead to a failing test in testMagneticLensPythonInterface, when we include testing of more recent Ubuntu/python versions, see here.

Further readings for reference: numpy.i numpy's array API some discussions I some discussions II some discussions III

rafaelab commented 1 year ago

Hi @lukasmerten I will take a look at this issue, but certainly not in the next 10 days. Many new issues are coming up in the swig-python(3) interface, so we should definitely check what is going on. That might include some rewriting of the swig interface.

lukasmerten commented 5 months ago

Should be fixed with #466. Please let us know if not and reopen the issue.

ehlertdo commented 4 months ago

We have tested in our group if the issue is now fixed and also cross-checked with @tfitoussi. Unfortunately the problem remains present for recent versions of python (e.g. 3.11.10 and 3.12.2; on linux). The original workaround with downgrading the python version to 3.8.15 still works.