ammarhakim / gkyl

This is the main source repo for the Gkeyll 2.0 code. Please see gkeyll.rtfd.io for details.
https://gkeyll.readthedocs.io/en/latest/

Segmentation fault in luajit on linux cluster #180

Open cmcdevitt2 opened 5 months ago

cmcdevitt2 commented 5 months ago

I am having an issue similar to the one reported in #176. Specifically, I am able to build gkyl on my local Linux cluster, but on carrying out the first test run I receive the following:

[cmcdevitt@login11 gkyl]$ ~/gkylsoft4/gkyl/bin/gkyl Examples/vm-damp.lua
Tue Apr 23 2024 08:27:01.000000000
Gkyl built with 1865a12d679e
Gkyl built on Apr 9 2024 10:10:54
Initializing Vlasov-Maxwell simulation ...
[login11:3557617:0:3557617] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x28f4fd5605a8)
==== backtrace (tid:3557617) ====
 0 0x0000000000012cf0 funlockfile() :0
 1 0x000000000005e8cb lua_dump() ???:0
 2 0x0000000000011aef luaL_error() ???:0
 3 0x00000000000114be luaL_error() ???:0
 4 0x0000000000009c11 ???() /home/cmcdevitt/gkylsoft4/luajit/lib/libluajit-5.1.so.2:0
 5 0x000000000001eecc lua_pcall() ???:0
 6 0x0000000000405a41 Gkyl::runLua() ???:0
 7 0x0000000000405b27 Gkyl::run() ???:0
 8 0x0000000000404fe2 main() ???:0
 9 0x000000000003ad85 libc_start_main() ???:0
10 0x00000000004050de _start() ???:0

Segmentation fault

This happens during most runs, though occasionally a run will proceed to completion and only then hit the segmentation fault. The output from the machines/configure command is:

[cmcdevitt@login12 gkyl]$ ./machines/configure.mcdevitt.sh
./waf CC=gcc CXX=g++
    MPICC=/apps/mpi/gcc/12.2.0/openmpi/4.1.5/bin/mpicc
    MPICXX=/apps/mpi/gcc/12.2.0/openmpi/4.1.5/bin/mpicxx
    --out=build -p /home/cmcdevitt/gkylsoft
    --prefix=/home/cmcdevitt/gkylsoft/gkyl
    --cxxflags=-O3,-std=c++17
    --luajit-inc-dir=/home/cmcdevitt/gkylsoft/luajit/include/luajit-2.1
    --luajit-lib-dir=/home/cmcdevitt/gkylsoft/luajit/lib
    --luajit-share-dir=/home/cmcdevitt/gkylsoft/luajit/share/luajit-2.1.0-beta3
    --enable-mpi
    --mpi-inc-dir=/apps/mpi/gcc/12.2.0/openmpi/4.1.5/include
    --mpi-lib-dir=/apps/mpi/gcc/12.2.0/openmpi/4.1.5/lib
    --mpi-link-libs=mpi
    --enable-adios
    --adios-inc-dir=/home/cmcdevitt/gkylsoft/adios2/include
    --adios-lib-dir=/home/cmcdevitt/gkylsoft/adios2/lib64
    --adios-link-libs=adios2_c_mpi
    --enable-gkylzero
    --gkylzero-inc-dir=/home/cmcdevitt/gkylsoft/gkylzero/include
    --gkylzero-lib-dir=/home/cmcdevitt/gkylsoft/gkylzero/lib
    --enable-superlu
    --superlu-inc-dir=/home/cmcdevitt/gkylsoft/superlu/include
    --superlu-lib-dir=/home/cmcdevitt/gkylsoft/superlu/lib
    --enable-openblas
    --openblas-inc-dir=/home/cmcdevitt/gkylsoft/OpenBLAS/include
    --openblas-lib-dir=/home/cmcdevitt/gkylsoft/OpenBLAS/lib
    configure
Setting top to : /blue/cmcdevitt/cmcdevitt/git_home/gkyl
Setting out to : /blue/cmcdevitt/cmcdevitt/git_home/gkyl/build
Checking for 'gcc' (C compiler) : gcc
Checking for 'g++' (C++ compiler) : g++
Setting dependency path: : /home/cmcdevitt/gkylsoft
Setting prefix: : /home/cmcdevitt/gkylsoft/gkyl
Checking for LUAJIT : Found LuaJIT
Checking for MPI : Found MPI
Checking for ADIOS : Found ADIOS
Checking for Sqlite3 : Using Sqlite3
Checking for SUPERLU : Found SUPERLU
Checking for OPENBLAS : Found OPENBLAS
Checking for gkylzero : Found gkylzero
'configure' finished successfully (1.534s)

As reported in #176, the issue seems to be in LuaJIT, though I was wondering whether there is a workaround.

rwhchan commented 4 months ago

We have experienced a similar issue and would also be keen to see whether there is a workaround.

cmcdevitt2 commented 4 months ago

Our temporary workaround has been to check out an older version of the code with:

git checkout -b pre-g0 --track origin/pre-g0

after which we were able to successfully build and run the code without a segmentation fault.

A small issue that arises when using this older version of the code is that the download link for adios-1.13.1 (needed by this version of Gkeyll) no longer appears to be active. The library can instead be downloaded directly from https://github.com/ornladios/ADIOS. The build script build-adios.sh then needs a slight modification so that it does not attempt to download the library. The two lines that need to be commented out are:

rm -rf adios-1.13.1.tar* adios-1.13.1

curl -L http://users.nccs.gov/~pnorbert/adios-1.13.1.tar.gz > adios-1.13.1.tar.gz
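
For anyone who would rather script this edit than do it by hand, something along the following lines might work. This is only a sketch: the function name comment_out_adios_download is made up for illustration, the path to build-adios.sh is passed as an argument because its location in the checkout is not stated above, and it assumes the two lines appear in the script exactly as quoted.

```shell
# Hypothetical helper (not part of gkyl): comment out the cleanup and
# download lines in build-adios.sh so the script does not try to fetch
# adios-1.13.1 from the dead link.
comment_out_adios_download() {
    script="$1"
    # -i.bak edits in place and keeps the original as <script>.bak.
    # Lines that are already commented start with '#', so they do not
    # match and running this twice is harmless.
    sed -i.bak \
        -e '/^[[:space:]]*rm -rf adios-1\.13\.1\.tar/ s/^/# /' \
        -e '/^[[:space:]]*curl -L .*adios-1\.13\.1\.tar\.gz/ s/^/# /' \
        "$script"
}

# Example call (the path is an assumption; adjust to your checkout):
#   comment_out_adios_download ./build-adios.sh
```

After this, the ADIOS 1.13.1 source obtained from the GitHub repository still has to be placed wherever the rest of build-adios.sh expects to find it. That depends on the remainder of the script and is not covered by this sketch.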