cloudwu / skynet

A lightweight online game framework
MIT License

Crash when GC is triggered during mongo auth #586

Closed jxfzlmb closed 7 years ago

jxfzlmb commented 7 years ago

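I call mongo.client when a client message arrives, and it crashes with some probability. After extensive testing, I found the following:

  1. Without mongo password authentication, it never crashes.
  2. The stack in gdb shows the crash happening when a GC is triggered somewhere inside the crypto code.
  3. Making the call from a different place, or adding or removing unrelated code around it, significantly changes the crash probability.
  4. Adding collectgarbage("stop") before the call makes the crash go away.

Here is the stack from gdb: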

0 0x000000000041bfdb in luaS_remove ()

1 0x00000000004167a3 in sweeplist ()

2 0x000000000041682a in sweepstep ()

3 0x0000000000416f2e in luaC_step ()

4 0x0000000000410e57 in lua_pushlstring ()

5 0x000000000042277c in luaL_pushresult ()

6 0x00007fffeee24096 in lxor_str (L=0x7fffef8c3fe8) at lualib-src/lua-crypt.c:896

7 0x0000000000414019 in luaD_precall ()

8 0x000000000041f84e in luaV_execute ()

9 0x0000000000413da0 in unroll ()

10 0x000000000041374c in luaD_rawrunprotected ()

11 0x00000000004143df in lua_resume ()

12 0x0000000000427a67 in auxresume ()

13 0x0000000000427d97 in luaB_coresume ()

14 0x0000000000414019 in luaD_precall ()

15 0x000000000041f84e in luaV_execute ()

16 0x00000000004142ef in luaD_call ()

17 0x0000000000414341 in luaD_callnoyield ()

18 0x000000000041374c in luaD_rawrunprotected ()

19 0x000000000041460d in luaD_pcall ()

20 0x0000000000411a3c in lua_pcallk ()

21 0x0000000000426c80 in luaB_pcall ()

22 0x0000000000414019 in luaD_precall ()

23 0x000000000041f84e in luaV_execute ()

24 0x00000000004142ef in luaD_call ()

25 0x0000000000414341 in luaD_callnoyield ()

26 0x000000000041374c in luaD_rawrunprotected ()

27 0x000000000041460d in luaD_pcall ()

28 0x0000000000411a3c in lua_pcallk ()

29 0x00007ffff20d2eb9 in _cb (context=0x7fffefa28680, ud=0x7fffef846e08, type=6, session=0, source=0, msg=0x7fffef9cea60, sz=24) at lualib-src/lua-skynet.c:50

30 0x000000000040a045 in dispatch_message (ctx=ctx@entry=0x7fffefa28680, msg=msg@entry=0x7ffff06c9c80) at skynet-src/skynet_server.c:274

31 0x000000000040ac20 in skynet_context_message_dispatch (sm=sm@entry=0x7ffff6cee880, q=q@entry=0x7ffff6e4cbc0, weight=weight@entry=0) at skynet-src/skynet_server.c:334

32 0x000000000040b34d in thread_worker (p=) at skynet-src/skynet_start.c:162

33 0x00007ffff7bc4184 in start_thread (arg=0x7ffff06ca700) at pthread_create.c:312

34 0x00007ffff71df37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

cloudwu commented 7 years ago

Why was this issue closed?

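  1. Is this the current skynet master branch? Or have you made other modifications, or are you using an older version?
  2. Does it fail in luaS_remove () every time?
  3. If the failure is in luaS_remove, it seems related to shared strings. Consider retesting with the stock Lua to confirm whether that is the problem.
  4. I just reviewed the code for shared protos and shared strings and found no problem. For reference, this post describes the modification that shares strings across Lua VMs: http://blog.codingnow.com/2015/08/lua_vm_share_string.html

jxfzlmb commented 7 years ago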

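Sorry, this turned out to be my own mistake: in one place I had written netpack.tostring where skynet.tostring was intended. Because the crash was triggered during GC, I was misled.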