Open joeyoravec opened 1 month ago
Based on my testing, it looks like vsomeip is vulnerable to a sequence:
No further attempt is made to speak on the old TCP socket, so the code never notices that it’s closed. To visualize this:
The default TCP keepalive is ~2 hours on Linux (set by /proc/sys/net/ipv4/) and unknown on QNX. We tested reducing this to 30s and "dead" sockets got cleaned up a lot faster once the OS noticed the broken socket. However, even with the smaller keepalive we still reproduced some sockets that stick around “forever”, even after the keepalive interval. Not sure why.
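For reference, keepalive can also be tuned per socket instead of system-wide. The following is a minimal sketch, assuming Linux option names (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`); QNX may spell these differently. `enable_keepalive` and `worst_case_detection_s` are hypothetical helpers, not vsomeip functions, and the values in the usage note are illustrative rather than the ones used in our testing.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper (not part of vsomeip): enable TCP keepalive on an
// existing socket and override the system-wide defaults for that socket only.
// Uses Linux option names; other platforms may need different constants.
bool enable_keepalive(int fd, int idle_s, int interval_s, int probes) {
    int on = 1;
    // Turn keepalive probing on for this socket.
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return false;
    // Seconds of idle time before the first probe is sent.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
        return false;
    // Seconds between consecutive probes.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) != 0)
        return false;
    // Number of unanswered probes before the connection is dropped.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) != 0)
        return false;
    return true;
}

// Approximate upper bound, in seconds, before an idle connection to a
// vanished peer is reported as broken.
int worst_case_detection_s(int idle_s, int interval_s, int probes) {
    return idle_s + interval_s * probes;
}
```

With the illustrative values `enable_keepalive(fd, 30, 5, 3)`, an idle socket whose peer has disappeared should be reported as broken after roughly 30 + 5 * 3 = 45 seconds. Note that this only helps if the code reacts to the resulting error on the socket.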
At this point we're thinking about two approaches:
`accept_cbk` when a new connection is made.
I've opened a draft pull request with the code changes that I've applied locally to address this issue. Although sockets still "leak", the TCP keepalive will detect and close them within 14 seconds maximum.
vSomeip Version
v3.4.10
Boost Version
1.82
Environment
Android and QNX
Describe the bug
My system has two nodes, QNX and Android, using routingmanagerd and TCP sockets. In any situation where the network "goes away and comes back", like unplugging and replugging the network cable, the routing manager leaks TCP sockets. After this happens enough times the process reaches the OS limit on open file descriptors and fails or terminates.
This behavior seems to be present in every version I've tested, from 3.4.10 back to 3.1.20. I've reproduced it on both QNX and Android.
Reproduction Steps
Use `ifconfig down; sleep 10; ifconfig up` to break and re-establish the network connection. This should be equivalent to many other use cases where nodes go away: physically unplugging and replugging the network cable, suspend-to-RAM and resume, etc.
Then use `netstat` or any other mechanism to study which sockets and file descriptors the routingmanagerd process has open.
Expected behaviour
Except for transient observations, I expect a single TCP socket (in each direction) from nodeA routingmanager to nodeB routingmanager. If the code is going to detect outages and reconnect it should not leak.
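One way to check this expectation without `netstat` is to watch the descriptor count of the process across repeated outages. The sketch below assumes a Linux/Android-style procfs with a `/proc/<pid>/fd` directory, which QNX lays out differently; `count_open_fds` is a hypothetical helper, not part of vsomeip.

```cpp
#include <dirent.h>
#include <string>

// Hypothetical helper: count the file descriptors a process has open by
// listing /proc/<pid>/fd. Assumes a Linux-style procfs.
// Returns -1 if the directory cannot be opened (no such pid, no permission).
int count_open_fds(const std::string& pid) {
    const std::string path = "/proc/" + pid + "/fd";
    DIR* dir = opendir(path.c_str());
    if (dir == nullptr)
        return -1;
    int count = 0;
    while (const dirent* entry = readdir(dir)) {
        // Skip "." and ".."; every other entry is named after a descriptor.
        if (entry->d_name[0] != '.')
            ++count;
    }
    closedir(dir);
    return count;
}
```

Calling `count_open_fds` with the pid of routingmanagerd before and after each outage should give a roughly constant number if nothing leaks, and a steadily growing one otherwise. When a process inspects itself via `"self"`, the count includes the descriptor used to read the directory.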
Logs and Screenshots
After doing the `ifconfig down; sleep 10; ifconfig up` enough times, netstat will show tons of sockets: