eclipse / paho.mqtt-sn.embedded-c

Paho C MQTT-SN gateway and libraries for embedded systems. Paho is an Eclipse IoT project.
https://eclipse.org/paho
Other
315 stars 178 forks source link

Sometimes it core dumped #173

Closed jhx2002 closed 3 years ago

jhx2002 commented 4 years ago

Sometimes it core dumped:

20191209 105956.994   PUBLISH     C237  <---  Unknown Client !                    11 0C 00 00 05 C2 37 30 2C 30 2C 32 33 32 37 34 30
Error: Client(fe80::4a16:2bff:fe13:1b53:1000) is not connecting. message has been discarded.
terminate called without an active exception
Aborted (core dumped)
ty4tw commented 4 years ago

Hi,

It looks like a serious problem. Can i get more precise conditions?

jhx2002 commented 4 years ago

Hi,

It looks like a serious problem. Can i get more precise conditions?

It's difficult and long time to reproduce the issue. But I'll try

jhx2002 commented 4 years ago

I found it core dump after some client drop. ipversion: UDP6

martinkirsche commented 4 years ago

I get occasional core dumps as well, even without using UDP v6.

Is there anything we can do to provide you with usable debugging information?

j3po commented 4 years ago

Same here. gateway crashes regularly after short amount of time.

20200805 145452.534 CONNACK ---> PahoGateway-01QoS-1 20200805 150952.000 ADVERTISE ---> Clients terminate called without an active exception Aborted (core dumped)

Would like to provide more information... How?

osaether commented 3 years ago

I am experiencing this problem as well but it is not often. It can run for weeks before it happens.

osaether commented 3 years ago

I am experiencing this problem as well but it is not often. It can run for weeks before it happens.

It happened again yesterday. Less than 24hours since the previous time.

osaether commented 3 years ago

Cored dumped again today and I got this error: Error: BrokerRecvTask can't receive a packetterminate called without an active exception

ty4tw commented 3 years ago

Hi Ole, Thank you for your conyribution.

Could you attach the coredump file, log file and OS name, version also. Those help me to understand what happend.

2020年10月17日(土) 6:13 Ole Sæther notifications@github.com:

Cored dumped again today and I got this error: Error: BrokerRecvTask can't receive a packetterminate called without an active exception

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/eclipse/paho.mqtt-sn.embedded-c/issues/173#issuecomment-710642812, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB3X2BEZ2SVFC3SVGUSQ56LSLCZP5ANCNFSM4JYCNQQQ .

osaether commented 3 years ago

@ty4tw I am running MQTTSNGateway on Linux in Docker. I could not find the core dump. Here is the Dockerfile I used to build the container: https://github.com/osaether/otbr-mqtt-sn/blob/master/Dockerfile

ty4tw commented 3 years ago

I have a lack of knowledge about Docker. I can't test it.

ty4tw commented 3 years ago

Hi, The easiest way to debug is to use the Eclipse CDT IDE. Import the project from your local repository into the IDE, then build and debug.

Tomoaki YAMAGUCHI

2020年10月21日(水) 3:02 Ole Sæther notifications@github.com:

I built MQTTSNGateway two days ago from master and today it crashed again. Is there a way to get debug logs?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/eclipse/paho.mqtt-sn.embedded-c/issues/173#issuecomment-713039412, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB3X2BALZRD3IGBAMV4ON2TSLXGEXANCNFSM4JYCNQQQ .

osaether commented 3 years ago

@ty4tw I reduced the length of my MQTT topics and it have not crashed after three days. Is there a limitation on the length of the topics?

eddie007bkk commented 3 years ago

I sent to gateway every 5 seconds and crash within one hour as below details. 20201111 095148.918 CONNECT <--- 0 07 04 04 01 00 0A 30 20201111 095148.920 CONNECT ===> 0 10 0D 00 04 4D 51 54 54 04 02 00 0A 00 01 30 20201111 095149.190 CONNACK <=== 0 20 02 00 00 20201111 095149.190 CONNACK ---> 0 03 05 00

20201111 095150.663 REGISTER 0001 <--- 0 12 0A 00 00 00 01 6D 79 74 6F 70 69 63 2F 74 65 73 74 20201111 095150.663 REGACK 0001 ---> 0 07 0B 00 3D 00 01 00

20201111 095156.054 PUBLISH <--- 0 0A 0C 00 00 3D 00 00 41 42 43 20201111 095156.054 DISCONNECT <--- 0 02 18 20201111 095156.054 DISCONNECT ---> 0 02 18 20201111 095156.054 PUBLISH ===> 0 30 11 00 0C 6D 79 74 6F 70 69 63 2F 74 65 73 74 41 42 43 20201111 095156.054 DISCONNECT ===> 0 E0 00

20201111 095156.104 CONNECT <--- 0 07 04 04 01 00 0A 30 20201111 095156.106 CONNECT ===> 0 10 0D 00 04 4D 51 54 54 04 02 00 0A 00 01 30 20201111 095156.193 CONNACK <=== 0 20 02 00 00 20201111 095156.194 CONNACK ---> 0 03 05 00

20201111 095159.754 REGISTER 0001 <--- 0 12 0A 00 00 00 01 6D 79 74 6F 70 69 63 2F 74 65 73 74 20201111 095159.754 REGACK 0001 ---> 0 07 0B 00 3E 00 01 00

20201111 095202.954 PUBLISH <--- 0 0A 0C 00 00 3E 00 00 41 42 43 20201111 095202.954 DISCONNECT <--- 0 02 18 20201111 095202.954 DISCONNECT ---> 0 02 18 20201111 095202.955 PUBLISH ===> 0 30 11 00 0C 6D 79 74 6F 70 69 63 2F 74 65 73 74 41 42 43 terminate called without an active exception Aborted (core dumped) gdb ./MQTT-SNGateway core GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./MQTT-SNGateway...done. [New LWP 24161] [New LWP 24156] [New LWP 24157] [New LWP 24159] [New LWP 24158] [New LWP 24160] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `./MQTT-SNGateway -f gateway-iot.conf'. Program terminated with signal SIGABRT, Aborted.

0 0x00007f717ca81f47 in __nptl_clear_internal_signals (set=0x7f717a49bb60) at ../sysdeps/unix/sysv/linux/nptl-signals.h:49

49 ../sysdeps/unix/sysv/linux/nptl-signals.h: No such file or directory. [Current thread is 1 (Thread 0x7f717a49c700 (LWP 24161))] (gdb) bt full

0 0x00007f717ca81f47 in __nptl_clear_internal_signals (set=0x7f717a49bb60) at ../sysdeps/unix/sysv/linux/nptl-signals.h:49

mask = word = mask = word =

1 __libc_signal_block_app (set=0x7f717a49bb60) at ../sysdeps/unix/sysv/linux/nptl-signals.h:69

allset = {val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}} allset = resultvar = arg4 = arg3 = arg2 = __arg1 = _a4 = _a3 = _a2 = _a1 =

2 __GI_raise (sig=0) at ../sysdeps/unix/sysv/linux/raise.c:40

set = {__val = {0, 7310575053241733933, 0 <repeats 14 times>}} pid = tid = ret =

3 0x00007f717ca838b1 in __GI_abort () at abort.c:104

act = {sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {val = {0 <repeats 15 times>, 140125254194880}}, sa_flags = 1946162368, sa_restorer = 0x7f7174001660} sigs = {val = {32, 0 <repeats 15 times>}} cnt = set = cnt = __set =

4 0x00007f717d0d8957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

No symbol table info available. ---Type to continue, or q to quit---

5 0x00007f717d0deae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

No symbol table info available.

6 0x00007f717d0deb21 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

No symbol table info available.

7 0x00007f717d0deda9 in __cxa_rethrow () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

No symbol table info available.

8 0x0000555937c94d0e in MQTTSNGW::Mutex::lock (this=this@entry=0x7f71740016c0) at src/linux/Threading.cpp:134

errmsg =

9 0x0000555937c947b3 in Network::close (this=0x7f7174001660) at src/linux/Network.cpp:596

No locals.

10 0x0000555937c7ff20 in MQTTSNGW::BrokerSendTask::run (this=0x555937eb60c0 ) at src/MQTTSNGWBrokerSendTask.cpp:126

ev = 0x7f7174001240 packet = 0x7f7174000ef0 client = 0x7f71740014c0 adpMgr = 0x555938a49ed0 rc =

11 0x0000555937c7ff69 in MQTTSNGW::BrokerSendTask::EXECRUN (this=0x555937eb60c0 ) at src/MQTTSNGWBrokerSendTask.h:32

No locals.

12 0x0000555937c94936 in MQTTSNGW::Thread::_run (runnable=) at src/linux/Threading.cpp:517

No locals.

13 0x00007f717db346db in start_thread (arg=0x7f717a49c700) at pthread_create.c:463

pd = 0x7f717a49c700 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140125359687424, 4492610701637181942, 140125359685568, 0, 93841678622912, 140730574605008, -4559330357841501706, -4559315421333298698}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call =

14 0x00007f717cb64a3f in epoll_wait (epfd=2051655424, events=0x0, maxevents=2051653568, timeout=0) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30

arg4 = 0 arg2 = 0 _a3 = 0 _a1 = 2 ---Type to continue, or q to quit--- resultvar = arg3 = 2051653568 arg1 = 2051655424 _a4 = 8 _a2 = 140125359684448 resultvar = sc_cancel_oldtype = 0 sc_ret =

15 0x0000000000000000 in ?? ()

No symbol table info available.

osaether commented 3 years ago

@ty4tw Any news on this issue?

osaether commented 3 years ago

I managed to backtrace with gdb today and basically got the same result as @eddie007bkk above:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0, 140391451674800, 0, 0, 121, 140391467612163, 895, 
            0, 0, 281470681751424, 0, 0, 0, 0, 0, 0}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007faf6e607921 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, 
          sa_mask = {__val = {0, 0, 0, 0, 0, 0, 45, 45, 140391451670144, 1, 
              140391455179248, 140391451652768, 140391448078935, 0, 
              140391340839968, 140391340840064}}, sa_flags = 1744835024, 
          sa_restorer = 0x7faf68001420}
        sigs = {__val = {32, 0 <repeats 15 times>}}
        __cnt = <optimized out>
        __set = <optimized out>
        __cnt = <optimized out>
        __set = <optimized out>
#2  0x00007faf6ec5c957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#3  0x00007faf6ec62ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#4  0x00007faf6ec62b21 in std::terminate() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#5  0x00007faf6ec62da9 in __cxa_rethrow ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#6  0x00007faf6fb1c14e in MQTTSNGW::Mutex::lock() ()
No symbol table info available.
#7  0x00007faf6fb1bbf3 in Network::close() ()
No symbol table info available.
#8  0x00007faf6fb061a0 in MQTTSNGW::BrokerSendTask::run() ()
No symbol table info available.
#9  0x00007faf6fb06219 in MQTTSNGW::BrokerSendTask::EXECRUN() ()
No symbol table info available.
#10 0x00007faf6fb1bd76 in MQTTSNGW::Thread::_run(void*) ()
No symbol table info available.
#11 0x00007faf6f6b86db in start_thread (arg=0x7faf6c821700)
    at pthread_create.c:463
        pd = 0x7faf6c821700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140391416469248, 
                2441522831986355065, 140391416467392, 0, 140391472156864, 
                140733566021280, -2396811033247388807, -2396804766600959111}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#12 0x00007faf6e6e871f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
osaether commented 3 years ago

I rebuilt the program with debugging enabled and got a better gdb trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0, 140186325858480, 0, 0, 216, 140186341795843, 895, 
            0, 0, 281470681751424, 0, 0, 0, 0, 0, 0}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007f7fabece921 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, 
          sa_mask = {__val = {0, 0, 0, 0, 0, 0, 45, 45, 140186325853824, 1, 
              140186329362928, 140186325836448, 140186322262615, 0, 
              140186189047312, 140186282257904}}, sa_flags = -1450283072, 
          sa_restorer = 0x7f7fa4002610}
        sigs = {__val = {32, 0 <repeats 15 times>}}
        __cnt = <optimized out>
        __set = <optimized out>
        __cnt = <optimized out>
        __set = <optimized out>
#2  0x00007f7fac523957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#3  0x00007f7fac529ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#4  0x00007f7fac529b21 in std::terminate() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#5  0x00007f7fac529da9 in __cxa_rethrow ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#6  0x00007f7fad3e60ec in MQTTSNGW::Mutex::lock (this=0x7f7fa40020c0)
    at src/linux/Threading.cpp:134
No locals.
#7  0x00007f7fad3e477c in Network::connect (this=0x7f7fa4002060, 
    host=0x7f7faf073a90 "mqtt.eclipse.org", port=0x7f7faf073ab0 "1883")
    at src/linux/Network.cpp:258
        rc = false
#8  0x00007f7fad3cf07e in MQTTSNGW::BrokerSendTask::run (
    this=0x7f7fad604760 <task5>) at src/MQTTSNGWBrokerSendTask.cpp:102
        ev = 0x7f7fa4002910
        packet = 0x7f7fa4002a90
        client = 0x7f7fa4001e30
        adpMgr = 0x7f7faf072ed0
        rc = 30
#9  0x00007f7fad3cf6b1 in MQTTSNGW::BrokerSendTask::EXECRUN (
    this=0x7f7fad604760 <task5>) at src/MQTTSNGWBrokerSendTask.h:32
No locals.
#10 0x00007f7fad3e797f in MQTTSNGW::Thread::_run (
    runnable=0x7f7fad604760 <task5>) at src/linux/Threading.cpp:518
No locals.
#11 0x00007f7facf7f6db in start_thread (arg=0x7f7fa98e7700)
    at pthread_create.c:463
        pd = 0x7f7fa98e7700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140186282260224, 
                8354366930278364031, 140186282258368, 0, 140186346342240, 
                140725249619984, -8282120361453866113, -8282131146330393729}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#12 0x00007f7fabfaf71f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

As you can see the termination starts in Threading.cpp:134. It's the empty throw in this function:

void Mutex::lock(void)
{
    if (_pmutex)
    {
        pthread_mutex_lock(_pmutex);
    }
    else
    {
        try
        {
            if (pthread_mutex_lock(&_mutex))
            {
                throw;
            }
        } catch (char* errmsg)
        {
            throw Exception( -1, "The same thread can't aquire a mutex twice.");
        }
    }
}

If no exception is presently being handled, executing a throw expression with no operand calls terminate().

a1lu commented 3 years ago

As osaether states

If no exception is presently being handled, executing a throw expression with no operand calls terminate().

The correct code should be

void Mutex::lock(void)
{
    if (_pmutex)
    {
        pthread_mutex_lock(_pmutex);
    }
    else
    {
        if (pthread_mutex_lock(&_mutex))
        {
            throw Exception( -1, "The same thread can't aquire a mutex twice.");
        }
    }
}

The same applies to void Mutex::unlock(void). And maybe to pthread_mutex_lock(_pmutex); too?

@osaether Do you have a reliable way to reproduce this issue? Apart from this bug (that only made it hard to debug) this looks like it is triggered by some threading issues.

ty4tw commented 3 years ago

Hi, I changed the exception handling scream and mutex exception message. https://github.com/eclipse/paho.mqtt-sn.embedded-c/commit/dd13618845d0a226a1f294675b6466493f91026e Let’s see for a while.

ty4tw commented 3 years ago

I will upload a explanation about exception handling scream soon.