Open joeyoravec opened 2 weeks ago
We came up with 3 possible solutions:
- Remove the `restart()` call from `service_discovery_impl::handle_eventgroup_subscription_nack()`. It's not clear why this is required or how it would help.
- Change `restart()` to "early terminate" better, perhaps an unlimited number of times within the 5 second timeout.

Interested in feedback on what would be most effective.
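To make the "early terminate within the 5 second timeout" idea concrete, here is a minimal sketch of a purely time-based guard. It is not vsomeip code: `restart_guard`, `on_connect_started()` and `should_skip_restart()` are hypothetical names, and the 5000 ms value is only taken from the timeout mentioned above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical sketch, not vsomeip code: skip restart() for as long as a
// connect attempt is still inside its timeout window, with no cap on how
// many times restart() may be skipped.
using steady_time = std::chrono::steady_clock::time_point;

struct restart_guard {
    std::chrono::milliseconds connect_timeout{5000}; // assumed 5 second timeout
    bool connecting = false;
    steady_time connect_started{};

    // Record the moment a new connect attempt begins.
    void on_connect_started(steady_time now) {
        connecting = true;
        connect_started = now;
    }

    // True if restart() should return early and leave the pending
    // connect attempt alone.
    bool should_skip_restart(steady_time now) const {
        assert(!connecting || now >= connect_started);
        return connecting && (now - connect_started) < connect_timeout;
    }
};
```

With a guard like this, 3500 SubscribeNacks arriving one second into a connect attempt would all be skipped, and a full restart would only become possible again once the connect attempt has been pending for 5 seconds.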
vSomeip Version
v3.4.10
Boost Version
1.82
Environment
Android and QNX
Describe the bug
My automotive system has a `*.fidl` with ~3500 attributes, one per CAN signal. My `*.fdepl` maps each attribute into a unique EventGroup.

Especially when resuming from suspend-to-ram it's possible that UDP SOMEIP-SD will be operational but the TCP socket will be broken. This leads to tce `restart()`, but during this time any Subscribe will receive SubscribeNack in response.

As the number of EventGroups scales to a large number, this becomes catastrophic to performance.
In `service_discovery_impl::handle_eventgroup_subscription_nack()` each EventGroup calls `restart()`: https://github.com/COVESA/vsomeip/blob/cf497232adf84f55947f7a24e1b64e04b49f1f38/implementation/service_discovery/src/service_discovery_impl.cpp#L2517-L2521

and in `tcp_client_endpoint_impl::restart()`, while `::CONNECTING`, the code will "early terminate" for at most 5 restarts: https://github.com/COVESA/vsomeip/blob/cf497232adf84f55947f7a24e1b64e04b49f1f38/implementation/endpoints/src/tcp_client_endpoint_impl.cpp#L77-L85

Thereafter the code will fall through, calling `shutdown_and_close_socket_unlocked()` and performing the full restart even while a connection is in progress.

As the system continues processing 1000s of SubscribeNack this will be a tight loop of 100% CPU load, taking multiple seconds to plow through the workload. This can easily exceed a 2s ServiceDiscovery interval and cascade to further problems.
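The behaviour described above can be illustrated with a small standalone model. This is a simplified sketch, not the vsomeip implementation: `endpoint_model` and `count_full_restarts()` are hypothetical names, and the model assumes the aborted-restart counter is not reset while the Nacks are being processed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of the restart logic described above.
enum class conn_state { connecting, established };

struct endpoint_model {
    static constexpr std::size_t max_aborted_restarts = 5; // limit named above
    conn_state state = conn_state::connecting;
    std::size_t aborted_restarts = 0;
    std::size_t full_restarts = 0;

    // Called once per SubscribeNack.
    void restart() {
        if (state == conn_state::connecting
                && aborted_restarts < max_aborted_restarts) {
            ++aborted_restarts; // "early terminate": keep the pending connect
            return;
        }
        // Fall through: close the socket and start connecting again,
        // even though a connection attempt was already in progress.
        ++full_restarts;
        state = conn_state::connecting;
    }
};

// Feed nack_count SubscribeNacks into the model and report how many of them
// ended in a full restart.
std::size_t count_full_restarts(std::size_t nack_count) {
    endpoint_model endpoint;
    for (std::size_t i = 0; i < nack_count; ++i) {
        endpoint.restart();
    }
    assert(endpoint.aborted_restarts <= endpoint_model::max_aborted_restarts);
    return endpoint.full_restarts;
}
```

Under these assumptions, `count_full_restarts(3500)` returns 3495: only the first 5 Nacks are absorbed by the early-terminate path, and every later one tears down the connection attempt again.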
Reproduction Steps
My reproduction was:
but any use-case where tse closes the TCP socket but UDP is functional should be sufficient.
Expected behaviour
Performance should be better.
Logs and Screenshots
No response