Closed jhx2002 closed 3 years ago
Hi,
It looks like a serious problem. Can i get more precise conditions?
Hi,
It looks like a serious problem. Can i get more precise conditions?
It's difficult and long time to reproduce the issue. But I'll try
I found it core dump after some client drop. ipversion: UDP6
I get occasional core dumps as well, even without using UDP v6.
Is there anything we can do to provide you with usable debugging information?
Same here. gateway crashes regularly after short amount of time.
20200805 145452.534 CONNACK ---> PahoGateway-01QoS-1 20200805 150952.000 ADVERTISE ---> Clients terminate called without an active exception Aborted (core dumped)
Would like to provide more information... How?
I am experiencing this problem as well but it is not often. It can run for weeks before it happens.
I am experiencing this problem as well but it is not often. It can run for weeks before it happens.
It happened again yesterday. Less than 24hours since the previous time.
Cored dumped again today and I got this error: Error: BrokerRecvTask can't receive a packetterminate called without an active exception
Hi Ole, Thank you for your conyribution.
Could you attach the coredump file, log file and OS name, version also. Those help me to understand what happend.
2020年10月17日(土) 6:13 Ole Sæther notifications@github.com:
Cored dumped again today and I got this error: Error: BrokerRecvTask can't receive a packetterminate called without an active exception
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/eclipse/paho.mqtt-sn.embedded-c/issues/173#issuecomment-710642812, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB3X2BEZ2SVFC3SVGUSQ56LSLCZP5ANCNFSM4JYCNQQQ .
@ty4tw I am running MQTTSNGateway on Linux in Docker. I could not find the core dump. Here is the Dockerfile I used to build the container: https://github.com/osaether/otbr-mqtt-sn/blob/master/Dockerfile
I have a lack of knowledge about Docker. I can't test it.
Hi, The easiest way to debug is to use the Eclipse CDT IDE. Import the project from your local repository into the IDE, then build and debug.
Tomoaki YAMAGUCHI
2020年10月21日(水) 3:02 Ole Sæther notifications@github.com:
I built MQTTSNGateway two days ago from master and today it crashed again. Is there a way to get debug logs?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/eclipse/paho.mqtt-sn.embedded-c/issues/173#issuecomment-713039412, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB3X2BALZRD3IGBAMV4ON2TSLXGEXANCNFSM4JYCNQQQ .
@ty4tw I reduced the length of my MQTT topics and it have not crashed after three days. Is there a limitation on the length of the topics?
I sent to gateway every 5 seconds and crash within one hour as below details. 20201111 095148.918 CONNECT <--- 0 07 04 04 01 00 0A 30 20201111 095148.920 CONNECT ===> 0 10 0D 00 04 4D 51 54 54 04 02 00 0A 00 01 30 20201111 095149.190 CONNACK <=== 0 20 02 00 00 20201111 095149.190 CONNACK ---> 0 03 05 00
20201111 095150.663 REGISTER 0001 <--- 0 12 0A 00 00 00 01 6D 79 74 6F 70 69 63 2F 74 65 73 74 20201111 095150.663 REGACK 0001 ---> 0 07 0B 00 3D 00 01 00
20201111 095156.054 PUBLISH <--- 0 0A 0C 00 00 3D 00 00 41 42 43 20201111 095156.054 DISCONNECT <--- 0 02 18 20201111 095156.054 DISCONNECT ---> 0 02 18 20201111 095156.054 PUBLISH ===> 0 30 11 00 0C 6D 79 74 6F 70 69 63 2F 74 65 73 74 41 42 43 20201111 095156.054 DISCONNECT ===> 0 E0 00
20201111 095156.104 CONNECT <--- 0 07 04 04 01 00 0A 30 20201111 095156.106 CONNECT ===> 0 10 0D 00 04 4D 51 54 54 04 02 00 0A 00 01 30 20201111 095156.193 CONNACK <=== 0 20 02 00 00 20201111 095156.194 CONNACK ---> 0 03 05 00
20201111 095159.754 REGISTER 0001 <--- 0 12 0A 00 00 00 01 6D 79 74 6F 70 69 63 2F 74 65 73 74 20201111 095159.754 REGACK 0001 ---> 0 07 0B 00 3E 00 01 00
20201111 095202.954 PUBLISH <--- 0 0A 0C 00 00 3E 00 00 41 42 43 20201111 095202.954 DISCONNECT <--- 0 02 18 20201111 095202.954 DISCONNECT ---> 0 02 18 20201111 095202.955 PUBLISH ===> 0 30 11 00 0C 6D 79 74 6F 70 69 63 2F 74 65 73 74 41 42 43 terminate called without an active exception Aborted (core dumped) gdb ./MQTT-SNGateway core GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./MQTT-SNGateway...done. [New LWP 24161] [New LWP 24156] [New LWP 24157] [New LWP 24159] [New LWP 24158] [New LWP 24160] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `./MQTT-SNGateway -f gateway-iot.conf'. Program terminated with signal SIGABRT, Aborted.
49 ../sysdeps/unix/sysv/linux/nptl-signals.h: No such file or directory. [Current thread is 1 (Thread 0x7f717a49c700 (LWP 24161))] (gdb) bt full
mask =
allset = {val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
allset =
set = {__val = {0, 7310575053241733933, 0 <repeats 14 times>}}
pid =
act = {sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {val = {0 <repeats 15 times>, 140125254194880}}, sa_flags = 1946162368, sa_restorer = 0x7f7174001660}
sigs = {val = {32, 0 <repeats 15 times>}}
cnt =
No symbol table info available.
---Type
No symbol table info available.
No symbol table info available.
No symbol table info available.
errmsg =
No locals.
ev = 0x7f7174001240
packet = 0x7f7174000ef0
client = 0x7f71740014c0
adpMgr = 0x555938a49ed0
rc =
No locals.
No locals.
pd = 0x7f717a49c700
now =
arg4 = 0
arg2 = 0
_a3 = 0
_a1 = 2
---Type
No symbol table info available.
@ty4tw Any news on this issue?
I managed to backtrace with gdb today and basically got the same result as @eddie007bkk above:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
set = {__val = {0, 140391451674800, 0, 0, 121, 140391467612163, 895,
0, 0, 281470681751424, 0, 0, 0, 0, 0, 0}}
pid = <optimized out>
tid = <optimized out>
ret = <optimized out>
#1 0x00007faf6e607921 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
sa_mask = {__val = {0, 0, 0, 0, 0, 0, 45, 45, 140391451670144, 1,
140391455179248, 140391451652768, 140391448078935, 0,
140391340839968, 140391340840064}}, sa_flags = 1744835024,
sa_restorer = 0x7faf68001420}
sigs = {__val = {32, 0 <repeats 15 times>}}
__cnt = <optimized out>
__set = <optimized out>
__cnt = <optimized out>
__set = <optimized out>
#2 0x00007faf6ec5c957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#3 0x00007faf6ec62ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#4 0x00007faf6ec62b21 in std::terminate() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#5 0x00007faf6ec62da9 in __cxa_rethrow ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#6 0x00007faf6fb1c14e in MQTTSNGW::Mutex::lock() ()
No symbol table info available.
#7 0x00007faf6fb1bbf3 in Network::close() ()
No symbol table info available.
#8 0x00007faf6fb061a0 in MQTTSNGW::BrokerSendTask::run() ()
No symbol table info available.
#9 0x00007faf6fb06219 in MQTTSNGW::BrokerSendTask::EXECRUN() ()
No symbol table info available.
#10 0x00007faf6fb1bd76 in MQTTSNGW::Thread::_run(void*) ()
No symbol table info available.
#11 0x00007faf6f6b86db in start_thread (arg=0x7faf6c821700)
at pthread_create.c:463
pd = 0x7faf6c821700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140391416469248,
2441522831986355065, 140391416467392, 0, 140391472156864,
140733566021280, -2396811033247388807, -2396804766600959111},
mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#12 0x00007faf6e6e871f in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
I rebuilt the program with debugging enabled and got a better gdb trace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
set = {__val = {0, 140186325858480, 0, 0, 216, 140186341795843, 895,
0, 0, 281470681751424, 0, 0, 0, 0, 0, 0}}
pid = <optimized out>
tid = <optimized out>
ret = <optimized out>
#1 0x00007f7fabece921 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
sa_mask = {__val = {0, 0, 0, 0, 0, 0, 45, 45, 140186325853824, 1,
140186329362928, 140186325836448, 140186322262615, 0,
140186189047312, 140186282257904}}, sa_flags = -1450283072,
sa_restorer = 0x7f7fa4002610}
sigs = {__val = {32, 0 <repeats 15 times>}}
__cnt = <optimized out>
__set = <optimized out>
__cnt = <optimized out>
__set = <optimized out>
#2 0x00007f7fac523957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#3 0x00007f7fac529ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#4 0x00007f7fac529b21 in std::terminate() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#5 0x00007f7fac529da9 in __cxa_rethrow ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#6 0x00007f7fad3e60ec in MQTTSNGW::Mutex::lock (this=0x7f7fa40020c0)
at src/linux/Threading.cpp:134
No locals.
#7 0x00007f7fad3e477c in Network::connect (this=0x7f7fa4002060,
host=0x7f7faf073a90 "mqtt.eclipse.org", port=0x7f7faf073ab0 "1883")
at src/linux/Network.cpp:258
rc = false
#8 0x00007f7fad3cf07e in MQTTSNGW::BrokerSendTask::run (
this=0x7f7fad604760 <task5>) at src/MQTTSNGWBrokerSendTask.cpp:102
ev = 0x7f7fa4002910
packet = 0x7f7fa4002a90
client = 0x7f7fa4001e30
adpMgr = 0x7f7faf072ed0
rc = 30
#9 0x00007f7fad3cf6b1 in MQTTSNGW::BrokerSendTask::EXECRUN (
this=0x7f7fad604760 <task5>) at src/MQTTSNGWBrokerSendTask.h:32
No locals.
#10 0x00007f7fad3e797f in MQTTSNGW::Thread::_run (
runnable=0x7f7fad604760 <task5>) at src/linux/Threading.cpp:518
No locals.
#11 0x00007f7facf7f6db in start_thread (arg=0x7f7fa98e7700)
at pthread_create.c:463
pd = 0x7f7fa98e7700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140186282260224,
8354366930278364031, 140186282258368, 0, 140186346342240,
140725249619984, -8282120361453866113, -8282131146330393729},
mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#12 0x00007f7fabfaf71f in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
As you can see the termination starts in Threading.cpp:134. It's the empty throw in this function:
void Mutex::lock(void)
{
if (_pmutex)
{
pthread_mutex_lock(_pmutex);
}
else
{
try
{
if (pthread_mutex_lock(&_mutex))
{
throw;
}
} catch (char* errmsg)
{
throw Exception( -1, "The same thread can't aquire a mutex twice.");
}
}
}
If no exception is presently being handled, executing a throw expression with no operand calls terminate().
As osaether states
If no exception is presently being handled, executing a throw expression with no operand calls terminate().
The correct code should be
void Mutex::lock(void)
{
if (_pmutex)
{
pthread_mutex_lock(_pmutex);
}
else
{
if (pthread_mutex_lock(&_mutex))
{
throw Exception( -1, "The same thread can't aquire a mutex twice.");
}
}
}
The same applies to void Mutex::unlock(void)
. And maybe to pthread_mutex_lock(_pmutex);
too?
@osaether Do you have a reliable way to reproduce this issue? Apart from this bug (that only made it hard to debug) this looks like it is triggered by some threading issues.
Hi, I changed the exception handling scream and mutex exception message. https://github.com/eclipse/paho.mqtt-sn.embedded-c/commit/dd13618845d0a226a1f294675b6466493f91026e Let’s see for a while.
I will upload a explanation about exception handling scream soon.
Sometimes it core dumped: