brinckmann / montepython_public

Public repository for the Monte Python Code
MIT License

MPI run issue #330

Closed: raj1996cool closed this issue 8 months ago

raj1996cool commented 1 year ago

When I run MontePython with the MPI command, the run starts, but after some time I get this error:

[Raj-icamp:87525] Process received signal
[Raj-icamp:87525] Signal: Segmentation fault (11)
[Raj-icamp:87525] Signal code: Address not mapped (1)
[Raj-icamp:87525] Failing at address: 0x8
[Raj-icamp:87525] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7ff024308090]
[Raj-icamp:87525] [ 1] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(numjac+0x5ab)[0x7feff62051db]
[Raj-icamp:87525] [ 2] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(evolver_ndf15+0x5b6)[0x7feff6207236]
[Raj-icamp:87525] [ 3] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(background_solve+0x366)[0x7feff61885a6]
[Raj-icamp:87525] [ 4] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(background_init+0xa4)[0x7feff6189404]
[Raj-icamp:87525] [ 5] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(input_try_unknown_parameters+0x2a0)[0x7feff617ebd0]
[Raj-icamp:87525] [ 6] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(input_fzerofun_1d+0x2d)[0x7feff617f6cd]
[Raj-icamp:87525] [ 7] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(input_fzero_ridder+0x32d)[0x7feff615c8ad]
[Raj-icamp:87525] [ 8] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(input_find_root+0x25c)[0x7feff617f94c]
[Raj-icamp:87525] [ 9] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(input_shooting+0xb59)[0x7feff6180529]
[Raj-icamp:87525] [10] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(input_read_from_file+0x137)[0x7feff6180eb7]
[Raj-icamp:87525] [11] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(+0x207e8)[0x7feff60997e8]
[Raj-icamp:87525] [12] /home/raj/Documents/code/class_public-3.2.0/python/build/lib.linux-x86_64-2.7/classy.so(+0x1f841)[0x7feff6098841]
[Raj-icamp:87525] [13] python(PyEval_EvalFrameEx+0x482)[0x55a20aaa4f32]
[Raj-icamp:87525] [14] python(PyEval_EvalCodeEx+0x52e)[0x55a20aaa2d2e]
[Raj-icamp:87525] [15] python(PyEval_EvalFrameEx+0x66b3)[0x55a20aaab163]
[Raj-icamp:87525] [16] python(PyEval_EvalCodeEx+0x52e)[0x55a20aaa2d2e]
[Raj-icamp:87525] [17] python(PyEval_EvalFrameEx+0x66b3)[0x55a20aaab163]
[Raj-icamp:87525] [18] python(PyEval_EvalCodeEx+0x52e)[0x55a20aaa2d2e]
[Raj-icamp:87525] [19] python(PyEval_EvalFrameEx+0x66b3)[0x55a20aaab163]
[Raj-icamp:87525] [20] python(PyEval_EvalCodeEx+0x52e)[0x55a20aaa2d2e]
[Raj-icamp:87525] [21] python(PyEval_EvalFrameEx+0x6193)[0x55a20aaaac43]
[Raj-icamp:87525] [22] python(PyEval_EvalCodeEx+0x52e)[0x55a20aaa2d2e]
[Raj-icamp:87525] [23] python(PyEval_EvalCode+0x1a)[0x55a20aaa27fa]
[Raj-icamp:87525] [24] python(+0x1272a4)[0x55a20aad62a4]
[Raj-icamp:87525] [25] python(PyRun_FileExFlags+0x8b)[0x55a20aad10db]
[Raj-icamp:87525] [26] python(PyRun_SimpleFileExFlags+0x169)[0x55a20aad0159]
[Raj-icamp:87525] [27] python(Py_Main+0x58d)[0x55a20aa6dc7d]
[Raj-icamp:87525] [28] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ff0242e9083]
[Raj-icamp:87525] [29] python(_start+0x2e)[0x55a20aa6d61e]
[Raj-icamp:87525] End of error message

Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 2 with PID 0 on node Raj-icamp exited on signal 11 (Segmentation fault).

I am unable to understand this error. Can anyone help me solve this issue?

brinckmann commented 1 year ago

Hi,

We need more information to be able to offer suggestions. Are you running an unmodified CLASS? Which parameters are you sampling, and what are their allowed ranges? If CLASS is unmodified, the run might be stepping outside the parameter space that CLASS allows. If it is a modified CLASS, it is hard to say exactly what the problem is; assuming it doesn't crash when run directly over the allowed parameter range, maybe a memory leak?

Best, Thejs
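
One way to check whether CLASS crashes when run directly over the allowed parameter range is to call it once per parameter point in a separate Python process, so that a segmentation fault kills only the child and shows up as a negative exit code. The following is a minimal sketch, not part of MontePython: it assumes Python 3 with the `classy` wrapper installed, and the helper names `run_isolated` and `scan` are made up for illustration.

```python
import json
import subprocess
import sys

# Child script: runs CLASS once for the parameters passed as JSON in argv[1].
# Assumes the classy wrapper is importable by the same interpreter.
CLASS_SNIPPET = """
import json, sys
from classy import Class
cosmo = Class()
cosmo.set(json.loads(sys.argv[1]))
cosmo.compute()
cosmo.struct_cleanup()
cosmo.empty()
"""


def run_isolated(snippet, params, timeout=600):
    """Run `snippet` in a fresh interpreter and return its exit code.

    0 means success, a positive code means the child exited with an error
    (for example an uncaught Python exception), and a negative code means
    the child was killed by a signal (-11 is SIGSEGV on Linux).
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet, json.dumps(params)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.returncode


def scan(snippet, base, name, values):
    """Vary one parameter over `values`; return (value, exit_code) for each failure."""
    failures = []
    for value in values:
        point = dict(base, **{name: value})  # copy of base with one entry replaced
        code = run_isolated(snippet, point)
        if code != 0:
            failures.append((value, code))
    return failures
```

For example, `scan(CLASS_SNIPPET, {"h": 0.67, "omega_b": 0.0224}, "omega_cdm", [0.05, 0.12, 0.30])` would report which of the three `omega_cdm` values fail. The parameter names and values here are placeholders; substitute the parameters and prior edges from your own `.param` file. An exit code of -11 at a point inside the allowed range would suggest the crash comes from CLASS itself rather than from MontePython or MPI.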