ammarhakim / gkyl

This is the main source repo for the Gkeyll 2.0 code. Please see gkeyll.rtfd.io for details.
https://gkeyll.readthedocs.io/en/latest/

Segmentation fault in luajit on M1 Mac #176

Status: Closed (johnbcoughlin closed 6 months ago)

johnbcoughlin commented 6 months ago

I successfully configured and built gkyl on my Mac, but got a segfault on my first test run:

jack ~/src/gkyl [main] $ ~/gkylsoft/gkyl/bin/gkyl examples/vm-damp.lua
Thu Mar 28 2024 15:04:59.000000000
Gkyl built with 1865a12d679e
Gkyl built on Mar 28 2024 14:58:32
Initializing Vlasov-Maxwell simulation ...
[Jacks-MBP-8:27642] *** Process received signal ***
[Jacks-MBP-8:27642] Signal: Segmentation fault: 11 (11)
[Jacks-MBP-8:27642] Signal code: Invalid permissions (2)
[Jacks-MBP-8:27642] Failing at address: 0xfffffffffc85debc
[Jacks-MBP-8:27642] [ 0] 0   libsystem_platform.dylib            0x000000018b28aa24 _sigtramp + 56
[Jacks-MBP-8:27642] [ 1] 0   libluajit-5.1.2.1.1710088188.dylib  0x0000000102fb9248 lua_dump + 226324
[Jacks-MBP-8:27642] [ 2] 0   libluajit-5.1.2.1.1710088188.dylib  0x0000000102f76520 lua_pcall + 148
[Jacks-MBP-8:27642] [ 3] 0   gkyl                                0x0000000102d1ca0c main + 828
[Jacks-MBP-8:27642] [ 4] 0   dyld                                0x000000018af03f28 start + 2236
[Jacks-MBP-8:27642] *** End of error message ***
zsh: segmentation fault  ~/gkylsoft/gkyl/bin/gkyl examples/vm-damp.lua

The immediate segfault fails to reproduce in maybe 1 of every 8 attempts. In those cases the simulation runs to completion, and a segfault then happens (I guess) once the Lua interpreter takes control of execution again.
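
For anyone trying to pin down how often the crash reproduces, one option is to run the command repeatedly and count the runs that are killed by SIGSEGV. The following is only a minimal sketch, assuming Python 3 on a POSIX system (macOS or Linux); the helper name `count_segfaults` is hypothetical and not part of gkyl.

```python
import signal
import subprocess
import sys


def count_segfaults(cmd, attempts=8):
    """Run cmd `attempts` times and return how many runs were killed by SIGSEGV.

    Hypothetical helper for illustration only; it is not part of gkyl.
    """
    crashes = 0
    for _ in range(attempts):
        # Discard the program's output; only the exit status matters here.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a return code of -N means the child was killed by signal N.
        if result.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

With the paths from this report, a call could look like `count_segfaults([os.path.expanduser("~/gkylsoft/gkyl/bin/gkyl"), "examples/vm-damp.lua"], attempts=8)` (after `import os`), run from the gkyl source directory. It counts both the immediate crash and the crash after the simulation completes, since both end the process with the same signal.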

Here is the output of machines/configure.macos.sh:

jack ~/src/gkyl [main] $ machines/configure.macos.sh
./waf CC=clang CXX=clang++ MPICC=/Users/jack/gkylsoft/openmpi/bin/mpicc MPICXX=/Users/jack/gkylsoft/openmpi/bin/mpicxx --out=build -p /Users/jack/gkylsoft --prefix=/Users/jack/gkylsoft/gkyl --cxxflags=-O3,-std=c++17 --luajit-inc-dir=/Users/jack/gkylsoft/luajit/include/luajit-2.1 --luajit-lib-dir=/Users/jack/gkylsoft/luajit/lib --luajit-share-dir=/Users/jack/gkylsoft/luajit/share/luajit-2.1.0-beta3 --enable-mpi --mpi-inc-dir=/Users/jack/gkylsoft/openmpi/include --mpi-lib-dir=/Users/jack/gkylsoft/openmpi/lib --mpi-link-libs=mpi --enable-adios --adios-inc-dir=/Users/jack/gkylsoft/adios2/include --adios-lib-dir=/Users/jack/gkylsoft/adios2/lib --adios-link-libs=adios2_c_mpi --enable-gkylzero --gkylzero-inc-dir=/Users/jack/gkylsoft/gkylzero/include --gkylzero-lib-dir=/Users/jack/gkylsoft/gkylzero/lib --enable-superlu --superlu-inc-dir=/Users/jack/gkylsoft/superlu/include --superlu-lib-dir=/Users/jack/gkylsoft/superlu/lib --enable-openblas --openblas-inc-dir=/Users/jack/gkylsoft/OpenBLAS/include --openblas-lib-dir=/Users/jack/gkylsoft/OpenBLAS/lib configure
Setting top to                           : /Users/jack/src/gkyl
Setting out to                           : /Users/jack/src/gkyl/build
Checking for 'clang' (C compiler)        : clang
Checking for 'clang++' (C++ compiler)    : clang++
Setting dependency path:                 : /Users/jack/gkylsoft
Setting prefix:                          : /Users/jack/gkylsoft/gkyl
Checking for LUAJIT                      : Found LuaJIT
Checking for MPI                         : Found MPI
Checking for ADIOS                       : Found ADIOS
Checking for Sqlite3                     : Using Sqlite3
Checking for SUPERLU                     : Found SUPERLU
Checking for OPENBLAS                    : Found OPENBLAS
Checking for gkylzero                    : Found gkylzero
'configure' finished successfully (0.841s)

johnbcoughlin commented 6 months ago

I was able to build LuaJIT with debug symbols and get a backtrace, but the crash seems to be solidly in the LuaJIT internals. I will see if I can follow up on their mailing list.

cmcdevitt2 commented 5 months ago

I am seeing similar behavior on our local Linux cluster. When running the test simulation, it either segfaults immediately, or runs to completion and then segfaults before closing. I was wondering whether a solution or workaround has been identified?

rwhchan commented 4 months ago

We have experienced a similar issue and would be keen to see a workaround as well. We haven't seen a resolution on the LuaJIT repo.