PPC64 / LuaJIT

LuaJIT v2.1 branch port to PPC64 (see ppc64-port branch) - See the Wiki for current status.

PPC64 Luajit segfaults on Alpine #1

Status: Closed (gromero closed this 1 year ago)

gromero commented 7 years ago

Currently, the code in the ppc64-port branch (v2.0.4-607-gbb23a15) segfaults when it runs a lua-copas test (which in turn also uses lua-socket).

The segfault was first identified on the Alpine build system:

http://build.alpinelinux.org/buildlogs/build-edge-ppc64le/community/lua-copas/lua-copas-2.0.2-r1.log

I'm able to reproduce it using upstream lua-copas with upstream lua-socket:

lua-copas: git@github.com:keplerproject/copas.git
lua-socket: git@github.com:diegonehab/luasocket.git

alpine1:~/git/copas$ LUA_PATH="src/?.lua;/tmp/lua-socket/share/lua/5.1/?.lua;;" LUA_CPATH="/tmp/lua-socket/lib/lua/5.1/?.so;;" /home/gromero/git/LuaJIT/src/luajit tests/largetransfer.lua
starting loop
1   seconds:    1.1291439533234
Segmentation fault

gdb shows two problematic bytecodes, BC_RET0 and BC_UGET:

Program received signal SIGSEGV, Segmentation fault.
0x000000002003e56c in lj_BC_RET0 ()
(gdb) bt
#0  0x000000002003e56c in lj_BC_RET0 ()
#1  0x000000002003fbc4 in lj_ff_coroutine_resume ()
#2  0x000000002001c924 in lua_pcall (L=0x3fffb7ed0378, nargs=0, nresults=-1, errfunc=2) at lj_api.c:1129
#3  0x0000000020006184 in docall (L=0x3fffb7ed0378, narg=0, clear=0) at luajit.c:121
#4  0x0000000020006f08 in handle_script (L=0x3fffb7ed0378, argx=0x3ffffffffb20) at luajit.c:291
#5  0x0000000020007ed0 in pmain (L=0x3fffb7ed0378) at luajit.c:551
#6  0x000000002003e9e4 in lj_BC_FUNCC ()
#7  0x000000002001cb78 in lua_cpcall (L=0x3fffb7ed0378, func=0x20007cac <pmain>, ud=0x0) at lj_api.c:1153
#8  0x0000000020008040 in main (argc=2, argv=0x3ffffffffb18) at luajit.c:580
(gdb) disas $pc
Dump of assembler code for function lj_BC_RET0:
   0x000000002003e524 <+0>: ld      r16,-8(r14)
   0x000000002003e528 <+4>: add     r20,r14,r20
   0x000000002003e52c <+8>: mr      r19,r12
   0x000000002003e530 <+12>:    andi.   r0,r16,3
   0x000000002003e534 <+16>:    xori    r8,r16,3
   0x000000002003e538 <+20>:    bne     0x2003e510 <lj_BC_RET+180>
   0x000000002003e53c <+24>:    lwz     r7,-4(r16)
   0x000000002003e540 <+28>:    addi    r9,r14,-16
   0x000000002003e544 <+32>:    rlwinm  r10,r7,11,21,28
   0x000000002003e548 <+36>:    cmpld   r10,r12
   0x000000002003e54c <+40>:    rlwinm  r20,r7,27,21,28
   0x000000002003e550 <+44>:    bgt     0x2003e590 <lj_BC_RET0+108>
   0x000000002003e554 <+48>:    subf    r14,r20,r9
   0x000000002003e558 <+52>:    ld      r8,-16(r14)
   0x000000002003e55c <+56>:    clrldi  r8,r8,17
   0x000000002003e560 <+60>:    lwz     r7,0(r16)
   0x000000002003e564 <+64>:    addi    r16,r16,4
   0x000000002003e568 <+68>:    ld      r8,32(r8)
=> 0x000000002003e56c <+72>:    ld      r15,-72(r8)
   0x000000002003e570 <+76>:    rlwinm  r8,r7,3,21,28
   0x000000002003e574 <+80>:    ldx     r0,r17,r8
   0x000000002003e578 <+84>:    mtctr   r0
   0x000000002003e57c <+88>:    rlwinm  r10,r7,11,21,28
   0x000000002003e580 <+92>:    rlwinm  r12,r7,19,13,28
   0x000000002003e584 <+96>:    rlwinm  r20,r7,27,21,28
   0x000000002003e588 <+100>:   rlwinm  r11,r7,19,21,28
   0x000000002003e58c <+104>:   bctr
   0x000000002003e590 <+108>:   addi    r8,r12,-8
   0x000000002003e594 <+112>:   addi    r12,r12,8
   0x000000002003e598 <+116>:   stdx    r23,r9,r8
   0x000000002003e59c <+120>:   b       0x2003e548 <lj_BC_RET0+36>
End of assembler dump.

and

Program received signal SIGSEGV, Segmentation fault.
0x000000002003d5e8 in lj_BC_UGET ()
(gdb) bt
#0  0x000000002003d5e8 in lj_BC_UGET ()
#1  0x000000002003fbc4 in lj_ff_coroutine_resume ()
#2  0x000000002001c924 in lua_pcall (L=0x3fffb7ed0378, nargs=0, nresults=-1, errfunc=2) at lj_api.c:1129
#3  0x0000000020006184 in docall (L=0x3fffb7ed0378, narg=0, clear=0) at luajit.c:121
#4  0x0000000020006f08 in handle_script (L=0x3fffb7ed0378, argx=0x3ffffffffb20) at luajit.c:291
#5  0x0000000020007ed0 in pmain (L=0x3fffb7ed0378) at luajit.c:551
#6  0x000000002003e9e4 in lj_BC_FUNCC ()
#7  0x000000002001cb78 in lua_cpcall (L=0x3fffb7ed0378, func=0x20007cac <pmain>, ud=0x0) at lj_api.c:1153
#8  0x0000000020008040 in main (argc=2, argv=0x3ffffffffb18) at luajit.c:580
(gdb) disas $pc
Dump of assembler code for function lj_BC_UGET:
   0x000000002003d5d0 <+0>: ld      r10,-16(r14)
   0x000000002003d5d4 <+4>: addi    r12,r12,40
   0x000000002003d5d8 <+8>: clrldi  r10,r10,17
   0x000000002003d5dc <+12>:    ldx     r10,r10,r12
   0x000000002003d5e0 <+16>:    lwz     r7,0(r16)
   0x000000002003d5e4 <+20>:    addi    r16,r16,4
=> 0x000000002003d5e8 <+24>:    ld      r8,32(r10)
   0x000000002003d5ec <+28>:    ld      r8,0(r8)
   0x000000002003d5f0 <+32>:    stdx    r8,r14,r20
   0x000000002003d5f4 <+36>:    rlwinm  r8,r7,3,21,28
   0x000000002003d5f8 <+40>:    ldx     r0,r17,r8
   0x000000002003d5fc <+44>:    mtctr   r0
   0x000000002003d600 <+48>:    rlwinm  r10,r7,11,21,28
   0x000000002003d604 <+52>:    rlwinm  r12,r7,19,13,28
   0x000000002003d608 <+56>:    rlwinm  r20,r7,27,21,28
   0x000000002003d60c <+60>:    rlwinm  r11,r7,19,21,28
   0x000000002003d610 <+64>:    bctr
End of assembler dump.
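
As a reading aid for the two dumps above: both end in the same rlwinm sequence before bctr, which looks like the decode of the next bytecode instruction loaded by lwz r7,0(r16). The following is a minimal Python sketch, not code from the port. It assumes LuaJIT's documented 32-bit instruction layout (OP in the low byte, then A, C, B, with D overlapping B and C), and the instruction word used is a made-up value. Under that assumption, each mask yields an operand already scaled by 8.

```python
def rlwinm(rs, sh, mb, me):
    """PowerPC rlwinm for mb <= me: rotate the 32-bit word left by sh,
    then AND with a mask covering bits mb..me (bit 0 is the MSB)."""
    rs &= 0xFFFFFFFF
    rotated = ((rs << sh) | (rs >> (32 - sh))) & 0xFFFFFFFF
    mask = (0xFFFFFFFF >> mb) & ((0xFFFFFFFF << (31 - me)) & 0xFFFFFFFF)
    return rotated & mask


def decode(ins):
    """Apply the same rotate/mask pairs as the dispatch tail in the dumps.
    Each result is the field times 8, so shift right by 3 to get the field."""
    return {
        "op": rlwinm(ins, 3, 21, 28) >> 3,   # rlwinm r8,r7,3,21,28
        "a": rlwinm(ins, 27, 21, 28) >> 3,   # rlwinm r20,r7,27,21,28
        "b": rlwinm(ins, 11, 21, 28) >> 3,   # rlwinm r10,r7,11,21,28
        "c": rlwinm(ins, 19, 21, 28) >> 3,   # rlwinm r11,r7,19,21,28
        "d": rlwinm(ins, 19, 13, 28) >> 3,   # rlwinm r12,r7,19,13,28
    }


if __name__ == "__main__":
    # Hypothetical instruction word, chosen only to make the fields visible.
    fields = decode(0x12345678)
    print({name: hex(value) for name, value in fields.items()})
```

If this reading is right, the faulting loads in both dumps happen before the decode, so the bad pointer comes from the frame or function object rather than from the decoded operands.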

gut commented 7 years ago

Interesting... the two segfaults are related to these lines:

https://github.com/PPC64/LuaJIT/blob/ppc64-port/src/vm_ppc64.dasc#L4028
https://github.com/PPC64/LuaJIT/blob/ppc64-port/src/vm_ppc64.dasc#L3123

They don't look related but I'll take a closer look. Thanks for the report.

gromero commented 7 years ago

Thanks, gut, for taking a look at it. Could this be related to a .so (libsocket.so from lua-socket) that was linked or loaded improperly?

gut commented 7 years ago

I can't say at the moment. I'll try to reproduce and analyse it. Thanks for the hint.

gut commented 7 years ago

I was able to reproduce the problem, and it doesn't always happen in those bytecodes:

Share of SIGSEGVs   Frame where SIGSEGV occurred
0.42%               lj_BC_GGET
2.49%               lj_BC_TGETS
5.95%               gc_traverse_frames
91.14%              lj_BC_RET0

(sample size: ~1500)

All of those frames have lj_ff_coroutine_resume as their caller, but since other bytecodes ran before the crash, we can't conclude much from that.

Well, I'll post here once I have some guesses.

gut commented 7 years ago

I reviewed many bytecodes in our LuaJIT port and didn't find a problem. I then started investigating the socket usage in luasocket and noticed a threshold at which LuaJIT on PPC64le starts to fail: sending more than 2742200 bytes. I developed a sample application in C to show that this kind of failure happens independently of the LuaJIT implementation.

If you're interested, please check https://github.com/gut/socketxp . I also see the same behavior on x64.

For now, I'm stopping my investigation, as it looks like something specific to the socket implementation that should probably be fixed in luasocket, not in LuaJIT.
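
For anyone who wants to probe the size threshold without LuaJIT or luasocket in the picture, here is a minimal Python sketch. It is not the socketxp program and makes no claim about how that program works. It assumes a local socket pair is a fair stand-in for the test's connection, and the function name `transfer` is made up for illustration. It sends a payload of a given size while a second thread drains the other end, then reports how many bytes arrived.

```python
import socket
import threading


def transfer(nbytes, chunk=65536):
    """Send nbytes over a local socket pair and return how many bytes arrived."""
    sender, receiver = socket.socketpair()
    received = []

    def drain():
        # Read until the sender closes its end (recv returns b"" at EOF).
        total = 0
        while True:
            data = receiver.recv(chunk)
            if not data:
                break
            total += len(data)
        received.append(total)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        # sendall loops internally until every byte is handed to the kernel.
        sender.sendall(b"x" * nbytes)
    finally:
        sender.close()  # signals EOF to the reader
    reader.join()
    receiver.close()
    return received[0]


if __name__ == "__main__":
    # Sizes around the threshold reported above.
    for size in (2742200, 2742201, 4 * 2742200):
        print(size, transfer(size))
```

The reader thread is there because a blocking sender with nobody draining the other end stalls once the kernel buffers fill, which is one plausible way a large transfer can misbehave independently of the interpreter.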