MRPT / mrpt

:zap: The Mobile Robot Programming Toolkit (MRPT)
https://docs.mrpt.org/reference/latest/
BSD 3-Clause "New" or "Revised" License
1.93k stars 630 forks source link

Crash in CClientTCPSocket when m_hSock >= 1024 #1157

Closed jolting closed 3 years ago

jolting commented 3 years ago

Crashed observed:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f65c0a4492e in __GI_abort () at abort.c:100
#2  0x00007f65c0aaf3ee in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f65c0bd907c "*** %s ***: terminated\n")
    at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f65c0b51b4a in __GI___fortify_fail (msg=msg@entry=0x7f65c0bd9012 "buffer overflow detected") at fortify_fail.c:26
#4  0x00007f65c0b503e6 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007f65c0b51a8b in __fdelt_chk (d=<optimized out>) at fdelt_chk.c:25
#6  0x00007f65bea0ef74 in mrpt::comms::CClientTCPSocket::connect (this=this@entry=0x7f659800b100, remotePartAddress="192.168.0.10",
    remotePartTCPPort=<optimized out>, timeout_ms=timeout_ms@entry=0) at /mrpt/libs/comms/src/CClientTCPSocket.cpp:212
#7  0x00007f65c22a9f4c in mrpt::hwdrivers::CHokuyoURG::ensureStreamIsOpen (this=0x55dbbde793c0)
    at /mrpt/libs/hwdrivers/src/CHokuyoURG.cpp:1009

Based on this stackoverflow the suggestion is to use epoll instead. https://stackoverflow.com/questions/41212302/c-debugging-terminated-with-sigabrt

during crash m_hSock was observed to be 1068

#6  0x00007f65bea0ef74 in mrpt::comms::CClientTCPSocket::connect (this=this@entry=0x7f659800b100, remotePartAddress="192.168.0.10",
    remotePartTCPPort=<optimized out>, timeout_ms=timeout_ms@entry=0) at /root/rp9-workspace/mrpt/libs/comms/src/CClientTCPSocket.cpp:212
212             FD_SET(m_hSock, &ss_write);
(gdb) print m_hSock
$8 = 1068
jlblancoc commented 3 years ago

Patch in #1158 is verified to run with a live Hokuyo and socket unit tests pass, so comms seems ok with the new API. Thanks for pointing this out! Will merge after CI is happy.

jolting commented 3 years ago

We only ran into the problem after repeatedly restarting the sensors. I'll test develop after the merge since we're currently pulling that into our dev branch.