PPC64 / LuaJIT

LuaJIT v2.1 branch port to PPC64 (see ppc64-port branch) - See the Wiki for current status.

Segfault on Power9 architecture #3

Open elliottslaughter opened 6 years ago

elliottslaughter commented 6 years ago

Hi,

I'm wondering if anyone here would be interested in helping to debug a segfault that's occurring with this fork on the Power9 architecture. I can't prove that this is LuaJIT's fault, but I'm tearing my hair out trying to figure out what's going wrong. The backtrace for the crash looks like this:

Program received signal SIGSEGV, Segmentation fault.
0x0000000012f73600 in lj_BC_CALLT ()
Missing separate debuginfos, use: debuginfo-install libgcc-4.8.5-28.el7.ppc64le libstdc++-4.8.5-28.el7.ppc64le zlib-1.2.7-17.el7.ppc64le
(gdb) where
#0  0x0000000012f73600 in lj_BC_CALLT ()
#1  0x0000000012f1b2b4 in lua_pcall (L=0x200000070378, nargs=10, nresults=-1, errfunc=1) at lj_api.c:1129
#2  0x00000000103cdd48 in docall (L=0x200000070378, narg=10, clear=0) at src/main.cpp:332
#3  0x00000000103cceb0 in main (argc=13, argv=0x7fffffffd3c8) at src/main.cpp:109

(This was captured in a debugger for clarity, though of course the crash happens without the debugger too.)

The program below is fully minimized, i.e. removing any part of the program causes it to stop reproducing:

https://github.com/stanfordhpccenter/soleil-x/blob/minimize-ppc64-crash/src/dom.rg

Furthermore, introducing print statements can cause the crash to move or to disappear entirely.

Based on these symptoms, it sure seems like there has to be some sort of memory corruption going on, but because I can't printf-debug, I'm not even sure where the crash is occurring! (Aside from the backtrace above that seems to indicate it's going somewhere into Lua code.)

You'll note that there are two other languages thrown in the mix here: Terra and Regent. Unfortunately I can't remove these, so I just have to debug around them.

I'd be happy to respond with complete instructions for reproducing, if that would be helpful.

Thanks in advance for any help or advice.

elliottslaughter commented 6 years ago

I am now fairly convinced that this is indeed a bug in this fork, rather than in Terra, Regent, or the application. The most convincing evidence is that my code runs successfully with this alternative fork of LuaJIT:

https://github.com/koriakin/LuaJIT (branch ppc64-ffi)

I'll leave this up for now in case anyone has any interest in continuing to work on this fork.