CRPropa / CRPropa3

CRPropa is a public astrophysical simulation framework for propagating extraterrestrial ultra-high energy particles.
GNU General Public License v3.0

testMagneticLensPythonInterface fails #350

Closed afedynitch closed 2 years ago

afedynitch commented 3 years ago


When building the latest master or the 3.1.7 tag on Ubuntu 20.04, one of the tests fails with a SEGFAULT.

Running tests...
Test project /home/afedyni/CRPropa3/build
      Start  1: testCore
 1/20 Test  #1: testCore .......................... Passed 0.04 sec
      Start  2: testVector3
 2/20 Test  #2: testVector3 ....................... Passed 0.00 sec
      Start  3: testModuleList
 3/20 Test  #3: testModuleList .................... Passed 0.03 sec
      Start  4: testMagneticField
 4/20 Test  #4: testMagneticField ................. Passed 0.03 sec
      Start  5: testTurbulentField
 5/20 Test  #5: testTurbulentField ................ Passed 0.23 sec
      Start  6: testAdvectionField
 6/20 Test  #6: testAdvectionField ................ Passed 0.03 sec
      Start  7: testDensity
 7/20 Test  #7: testDensity ....................... Passed 0.03 sec
      Start  8: testDINT
 8/20 Test  #8: testDINT .......................... Passed 0.05 sec
      Start  9: testPropagation
 9/20 Test  #9: testPropagation ................... Passed 0.03 sec
      Start 10: testBreakCondition
10/20 Test #10: testBreakCondition ................ Passed 0.03 sec
      Start 11: testInteraction
11/20 Test #11: testInteraction ................... Passed 38.33 sec
      Start 12: testSource
12/20 Test #12: testSource ........................ Passed 0.20 sec
      Start 13: testOutput
13/20 Test #13: testOutput ........................ Passed 0.04 sec
      Start 14: testFunctionalGroups
14/20 Test #14: testFunctionalGroups .............. Passed 0.03 sec
      Start 15: testAdiabaticCooling
15/20 Test #15: testAdiabaticCooling .............. Passed 0.03 sec
      Start 16: testGalacticMagneticLens
16/20 Test #16: testGalacticMagneticLens .......... Passed 0.07 sec
      Start 17: testMagneticLensPythonInterface
17/20 Test #17: testMagneticLensPythonInterface ...***Exception: SegFault 0.21 sec
      Start 18: testSimulationExecution
18/20 Test #18: testSimulationExecution ........... Passed 2.53 sec
      Start 19: testDiffusionSDE
19/20 Test #19: testDiffusionSDE .................. Passed 3.15 sec
      Start 20: testPythonExtension
20/20 Test #20: testPythonExtension ............... Passed 0.17 sec

95% tests passed, 1 tests failed out of 20

Total Test time (real) = 45.24 sec

The following tests FAILED:
     17 - testMagneticLensPythonInterface (SEGFAULT)

We have tested this on different installations of Ubuntu 20.04, with SWIG 3 or 4, numpy 1.19.2 and 1.20.1, conda Python and system Python only, etc. In case it matters: numpy is linked against MKL 2020.0 because other parts of our code require linear algebra performance on AMD Zen, so we cannot use OpenBLAS or another BLAS.

The SEGFAULT is triggered by the first access using numpy arrays:

def testAddParticlesNumpyInterface(self):
    # Skip the test when numpy is not available.
    try:
        import numpy as np
    except ImportError:
        print("Cannot import numpy. Not testing testAddParticlesNumpyInterface!")
        return

    N = 13
    ids = np.ones(N, dtype='i')
    lats = np.random.rand(N) * np.pi - np.pi/2.
    lons = np.random.rand(N) * np.pi * 2. - np.pi
    energy = 10 ** (18 + np.random.rand(N) * 3.) * crpropa.eV
    weights = np.ones(N) / N
    self.maps.addParticles(ids, energy, lons, lats, weights)  # <---- SEGFAULT is triggered here

Note that this is different from #316, where all Python tests failed. Here I checked with ldd what the library links to, and it doesn't matter whether it's conda packages or native system packages.


lukasmerten commented 3 years ago

I can reproduce this error on Ubuntu 18.

If someone encounters this problem in their analysis: you can work around this bug by manually adding particles with addParticle one at a time.
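A minimal sketch of that workaround, for anyone who wants to keep their arrays and just avoid the array-based call. The helper name `add_particles_one_by_one` is hypothetical, and the argument order of `addParticle` (id, energy, longitude, latitude, weight) is an assumption taken from the `addParticles` call in the failing test; please check it against your CRPropa version.

```python
def add_particles_one_by_one(container, ids, energies, lons, lats, weights):
    """Add particles through scalar addParticle calls instead of addParticles.

    `container` is assumed to expose addParticle(id, energy, lon, lat, weight),
    e.g. the ParticleMapsContainer used as self.maps in the test above.
    Returns the number of particles added.
    """
    columns = (ids, energies, lons, lats, weights)
    n = len(ids)
    if any(len(column) != n for column in columns):
        raise ValueError("all input sequences must have the same length")

    for pid, energy, lon, lat, weight in zip(*columns):
        # Convert to plain Python scalars so that no numpy object
        # is handed to the wrapper.
        container.addParticle(int(pid), float(energy), float(lon),
                              float(lat), float(weight))
    return n
```

In the test above, this would replace the failing line with `add_particles_one_by_one(self.maps, ids, energy, lons, lats, weights)`. It is slower than the array call for large N, but it does not go through the numpy interface at all.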

afedynitch commented 3 years ago


tdwiser commented 3 years ago

I also encountered this while attempting to fix #316. I'm not 100% sure what the actual issue was, but fixing the segfaults involved acquiring the Python Global Interpreter Lock and being extra careful with the memory ordering of the numpy arrays, since we are handling them manually. Skipping these steps makes the code very sensitive to what numpy/Python decide to do with your memory range when calling out to their APIs. See my hacky modifications here: