CEMeNT-PSAAP / MCDC

MC/DC: Monte Carlo Dynamic Code
https://mcdc.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

`Seg fault` with `*** Process received signal ***` #196

Open jpmorgan98 opened 4 months ago

jpmorgan98 commented 4 months ago

On the OSU CI machine, certain Numba problems would compile but fail to run. This also happened on a number of the regression tests that were passing in the GitHub Actions runner. The full error is here:

(mcdc_dev) cement ~/workspace/MCDC/examples/fixed_source/slab_absorbium 1026$ python input.py --mode=numba
  __  __  ____  __ ____   ____ 
 |  \/  |/ ___|/ /_  _ \ / ___|
 | |\/| | |   /_  / | | | |    
 | |  | | |___ / /| |_| | |___ 
 |_|  |_|\____|// |____/ \____|

           Mode | Numba
      Algorithm | History-based
  MPI Processes | 1
 OpenMP Threads | 1
 Now running TNT...
[cement:17804] *** Process received signal ***
[cement:17804] Signal: Segmentation fault (11)
[cement:17804] Signal code: Address not mapped (1)
[cement:17804] Failing at address: 0x256990c7fa14
[cement:17804] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7fb3607a9630]
[cement:17804] [ 1] [0x7fb2ac5d2160]
[cement:17804] [ 2] [0x7fb2abf790a6]
[cement:17804] [ 3] [0x7fb2ac93d32b]
[cement:17804] [ 4] [0x7fb2ac6ac375]
[cement:17804] [ 5] [0x7fb2a6c13443]
[cement:17804] [ 6] [0x7fb2a6c1381e]
[cement:17804] [ 7] /nfs/stak/users/morgajoa/miniconda3/envs/mcdc_dev/lib/python3.11/site-packages/numba/_dispatcher.cpython-311-x86_64-linux-gnu.so(+0x53f4)[0x7fb3555cc3f4]
[cement:17804] [ 8] /nfs/stak/users/morgajoa/miniconda3/envs/mcdc_dev/lib/python3.11/site-packages/numba/_dispatcher.cpython-311-x86_64-linux-gnu.so(+0x5712)[0x7fb3555cc712]
[cement:17804] [ 9] python(_PyObject_MakeTpCall+0x26c)[0x5041ac]
[cement:17804] [10] python(_PyEval_EvalFrameDefault+0x6a7)[0x5116e7]
[cement:17804] [11] python[0x5cbeda]
[cement:17804] [12] python(PyEval_EvalCode+0x9f)[0x5cb5af]
[cement:17804] [13] python[0x5ec6a7]
[cement:17804] [14] python[0x5e8240]
[cement:17804] [15] python[0x5fd192]
[cement:17804] [16] python(_PyRun_SimpleFileObject+0x19f)[0x5fc55f]
[cement:17804] [17] python(_PyRun_AnyFileObject+0x43)[0x5fc283]
[cement:17804] [18] python(Py_RunMain+0x2ee)[0x5f6efe]
[cement:17804] [19] python(Py_BytesMain+0x39)[0x5bbc79]
[cement:17804] [20] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb35fce5555]
[cement:17804] [21] python[0x5bbac3]
[cement:17804] *** End of error message ***
Segmentation fault (core dumped)

Whenever I see frames like /lib64/libc.so.6 in a backtrace, my mind immediately goes to incompatible compiler issues. The first thing I tried was

conda install -c conda-forge gxx

and that fixed it for some problems, but others still seg fault, specifically in the regression tests. I am running this in a manual terminal right now, but eventually this will be the env we use in gh actions for GPU regression testing. I am going to try other modules that provide g++ and maybe look at LLVM versions.

One thing to emphasize: this does seem to be a runtime issue, not a compilation failure.
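Since the backtrace above only shows raw addresses for the JIT-compiled frames, one way to narrow down a runtime crash like this is Python's standard `faulthandler` module, which prints the Python-level traceback when the process receives a fatal signal. This is a minimal sketch, not something MC/DC provides: the helper name `enable_crash_traceback` and the log file name are hypothetical, and the traceback will only show the Python frame that called into the compiled kernel, not the line inside it.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_traceback.log"):
    """Write a Python traceback to log_path if the process gets a fatal signal.

    Hypothetical helper: call it at the top of input.py, before the run starts.
    """
    log_file = open(log_path, "w")
    # Covers SIGSEGV, SIGFPE, SIGABRT, SIGBUS, and SIGILL; dumps all threads.
    faulthandler.enable(file=log_file, all_threads=True)
    # Return the handle so the caller keeps it alive; faulthandler needs the
    # file to stay open for as long as the handler is installed.
    return log_file
```

The same handler can be turned on without editing the script by running `python -X faulthandler input.py --mode=numba`, which writes the traceback to stderr.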

jpmorgan98 commented 4 months ago

So this is odd. After an initial compilation, some of the tests that had previously failed are now passing using cached kernels. I still think this has to do with compiler issues, but we will see...
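To check whether a pass comes from a fresh compile or from a cached kernel, the on-disk cache can be listed and cleared between runs. The sketch below assumes the kernels use Numba's file-based cache, which by default writes `*.nbi` index files and `*.nbc` data files into `__pycache__` next to the source (or under `NUMBA_CACHE_DIR` if that variable is set). The function names and the directory passed in are hypothetical.

```python
from pathlib import Path

# File suffixes used by Numba's on-disk function cache.
NUMBA_CACHE_SUFFIXES = {".nbi", ".nbc"}


def find_numba_cache_files(root):
    """Return all Numba cache files found anywhere under root, sorted by path."""
    root = Path(root)
    return sorted(p for p in root.rglob("*") if p.suffix in NUMBA_CACHE_SUFFIXES)


def clear_numba_cache(root):
    """Delete the cache files under root and return the paths that were removed."""
    removed = find_numba_cache_files(root)
    for path in removed:
        path.unlink()
    return removed
```

Running a failing test once after `clear_numba_cache(...)` and once more right after would show whether the first (compile) run and the second (cached) run behave differently.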

jpmorgan98 commented 4 months ago

OK, so I think I ran into a similar issue with the roc port, and the solution was a specific version of libgcc-ng, which is installed by conda install gxx.
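If the cause is a mismatch between the system and conda copies of the GCC runtime, it may help to see which copies the Python process actually loaded. This is a Linux-only sketch that reads `/proc/self/maps`. The function names and the default list of library name fragments are assumptions chosen for illustration, not anything taken from MC/DC or Numba.

```python
def runtime_libs_from_maps(maps_text, names=("libgcc_s", "libstdc++", "libgomp")):
    """Extract paths of mapped shared libraries whose path contains one of names."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname optional).
        fields = line.split(None, 5)
        if len(fields) == 6 and any(name in fields[5] for name in names):
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_runtime_libs():
    """Return the GCC runtime libraries mapped into the current process (Linux)."""
    with open("/proc/self/maps") as maps_file:
        return runtime_libs_from_maps(maps_file.read())
```

Calling `loaded_runtime_libs()` after `import numba` should show whether the libraries resolve to `/lib64/...` or to the conda environment's `lib/` directory.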

jpmorgan98 commented 4 months ago

@braxtoncuneo can you comment on if this is the same issue you are seeing on Lassen?

braxtoncuneo commented 3 months ago

> @braxtoncuneo can you comment on if this is the same issue you are seeing on Lassen?

Reproduced my segfault. This is what I got:


  __  __  ____  __ ____   ____ 
 |  \/  |/ ___|/ /_  _ \ / ___|
 | |\/| | |   /_  / | | | |    
 | |  | | |___ / /| |_| | |___ 
 |_|  |_|\____|// |____/ \____|

           Mode | Numba
      Algorithm | History-based
  MPI Processes | 1
 OpenMP Threads | 1
 Now running TNT...
Segmentation fault