COVESA / vsomeip

An implementation of Scalable service-Oriented MiddlewarE over IP

[BUG]: vsomeip slow to restart with lots of EventGroup #690

Open joeyoravec opened 2 weeks ago

joeyoravec commented 2 weeks ago

vSomeip Version

v3.4.10

Boost Version

1.82

Environment

Android and QNX

Describe the bug

My automotive system has a *.fidl with ~3500 attributes, one per CAN signal. My *.fdepl maps each attribute to a unique EventGroup.
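For context, the client-side effect of this deployment looks roughly like the sketch below (plain vsomeip API instead of the generated CommonAPI code; service/instance/event IDs are hypothetical): one EventGroup per attribute means one subscription per attribute, i.e. thousands of individual SOME/IP-SD Subscribe entries.

```cpp
// Minimal sketch (hypothetical IDs, plain vsomeip API instead of the
// generated CommonAPI code): one EventGroup per attribute means one
// request_event()/subscribe() pair per attribute, so ~3500 SOME/IP-SD
// Subscribe entries end up on the wire.
#include <vsomeip/vsomeip.hpp>

int main() {
    auto app = vsomeip::runtime::get()->create_application("signal_client");
    app->init();

    const vsomeip::service_t  service  = 0x1234;   // hypothetical
    const vsomeip::instance_t instance = 0x0001;   // hypothetical
    app->request_service(service, instance);

    // One unique eventgroup per attribute (IDs are placeholders;
    // availability handling omitted for brevity).
    for (vsomeip::event_t event = 0x8001; event < 0x8001 + 3500; ++event) {
        const vsomeip::eventgroup_t group = event;
        app->request_event(service, instance, event, {group},
                           vsomeip::event_type_e::ET_FIELD);
        app->subscribe(service, instance, group);
    }

    app->start();  // blocks; runs the io context
}
```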

Especially when resuming from suspend-to-ram, it's possible that UDP SOME/IP-SD is operational while the TCP socket is broken. This leads to the tcp_client_endpoint (tce) calling restart(), and during this window every Subscribe receives a SubscribeNack in response:

4191    105.781314  10.6.0.10   10.6.0.3    SOME/IP-SD  1408    SOME/IP Service Discovery Protocol [Subscribe]
4192    105.790868  10.6.0.3    10.6.0.10   SOME/IP-SD  1396    SOME/IP Service Discovery Protocol [SubscribeNack]
4193    105.792094  10.6.0.10   10.6.0.3    SOME/IP-SD  1410    SOME/IP Service Discovery Protocol [Subscribe]
4194    105.801525  10.6.0.10   10.6.0.3    SOME/IP-SD  1410    SOME/IP Service Discovery Protocol [Subscribe]
4195    105.802118  10.6.0.3    10.6.0.10   SOME/IP-SD  1398    SOME/IP Service Discovery Protocol [SubscribeNack]
4196    105.819610  10.6.0.3    10.6.0.10   SOME/IP-SD  1398    SOME/IP Service Discovery Protocol [SubscribeNack]

As the number of EventGroups scales up, this becomes catastrophic for performance.

In service_discovery_impl::handle_eventgroup_subscription_nack(), each nacked EventGroup calls restart(): https://github.com/COVESA/vsomeip/blob/cf497232adf84f55947f7a24e1b64e04b49f1f38/implementation/service_discovery/src/service_discovery_impl.cpp#L2517-L2521

In tcp_client_endpoint_impl::restart(), while the endpoint is ::CONNECTING the code will "early terminate" for a maximum of 5 restarts: https://github.com/COVESA/vsomeip/blob/cf497232adf84f55947f7a24e1b64e04b49f1f38/implementation/endpoints/src/tcp_client_endpoint_impl.cpp#L77-L85

After that limit the code falls through, calling shutdown_and_close_socket_unlocked() and performing the full restart even while a connection attempt is still in progress.
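To make the scaling concrete, here is a small self-contained model of the behaviour described above (a toy model, not vsomeip code, and it ignores the connect-time window the real implementation also checks): every SubscribeNack calls restart(); while a connect is in progress at most five consecutive calls are skipped before the next one falls through to a full shutdown-and-reconnect.

```cpp
// Toy model (not vsomeip code) of the restart() behaviour described above.
#include <cstdio>

struct endpoint_model {
    static constexpr int max_aborts = 5;  // "early terminate" limit from the report
    int aborted_restarts = 0;
    long full_restarts = 0;
    bool connecting = false;

    void restart() {
        if (connecting && ++aborted_restarts <= max_aborts) {
            return;  // skip the restart while a connect is in progress
        }
        aborted_restarts = 0;
        connecting = true;     // shutdown_and_close_socket + new connect
        ++full_restarts;
    }
};

int main() {
    endpoint_model ep;
    const long nacks = 3500;   // roughly one SubscribeNack per EventGroup
    for (long i = 0; i < nacks; ++i) {
        ep.restart();          // as called from handle_eventgroup_subscription_nack()
    }
    // Roughly nacks / (max_aborts + 1) full restarts while the link is down.
    std::printf("full restarts triggered: %ld\n", ep.full_restarts);
    return 0;
}
```

Even with the early-terminate limit, roughly every sixth NACK still triggers a full shutdown-and-reconnect, which is why the workload takes multiple seconds to drain.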

As the system keeps processing thousands of SubscribeNacks, this becomes a tight loop at 100% CPU load that takes multiple seconds to plow through the workload. That can easily exceed a 2s ServiceDiscovery interval and cascade into further problems.

Reproduction Steps

My reproduction was the suspend-to-ram resume described above, but any use-case where the server side (tse) closes the TCP socket while UDP remains functional should be sufficient.

Expected behaviour

Recovery after a broken TCP connection should not scale this badly with the number of EventGroups: a burst of SubscribeNacks should not trigger thousands of redundant TCP restarts or pin the CPU at 100% for multiple seconds.

Logs and Screenshots

No response

joeyoravec commented 2 weeks ago

We came up with three possible solutions:

  1. Eliminate the tce restart() call from service_discovery_impl::handle_eventgroup_subscription_nack(). It's not clear why this call is required or how it helps.
  2. Modify tce restart() to "early terminate" better, perhaps an unlimited number of times within the 5-second connect timeout (see the sketch after this list).
  3. Ensure that SOME/IP-SD gets inhibited around any event, like suspend-to-ram, where network communication will be lost, and try to prevent Subscribes until the TCP socket is re-established.
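As a rough sketch of option 2 (again a toy model, not vsomeip code, with an assumed 5-second connect timeout): drop the fixed abort count and only fall through to a full restart once the in-progress connect has been pending longer than the timeout.

```cpp
// Sketch of option 2 (toy model, not vsomeip code): while a connect is in
// progress, skip restart() an unlimited number of times and only fall through
// once the attempt has been pending longer than the connect timeout.
#include <chrono>
#include <cstdio>

struct endpoint_model_v2 {
    using clock = std::chrono::steady_clock;
    static constexpr std::chrono::seconds connect_timeout{5};  // assumed limit
    bool connecting = false;
    long full_restarts = 0;
    clock::time_point connect_started{};

    void restart() {
        if (connecting && clock::now() - connect_started < connect_timeout) {
            return;  // keep waiting for the connect attempt already in flight
        }
        // Only now shut the socket down and start a fresh connect
        // (shutdown_and_close_socket_unlocked() + connect() in the real endpoint).
        connecting = true;
        connect_started = clock::now();
        ++full_restarts;
    }
};

int main() {
    endpoint_model_v2 ep;
    for (int i = 0; i < 3500; ++i) {
        ep.restart();  // a burst of SubscribeNacks within one timeout window
    }
    std::printf("full restarts triggered: %ld\n", ep.full_restarts);  // prints 1
}
```

Under the same 3500-NACK burst this variant performs a single full restart per timeout window instead of one roughly every six calls.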

Interested in feedback on which of these would be most effective.