COVESA / vsomeip

An implementation of Scalable service-Oriented MiddlewarE over IP
Mozilla Public License 2.0

[BUG]: vsomeip slow to establish communication with lots of EventGroups #669

Open joeyoravec opened 1 month ago

joeyoravec commented 1 month ago

vSomeip Version

v3.4.10

Boost Version

1.82

Environment

Android and QNX

Describe the bug

My automotive system has a *.fidl file with ~3500 attributes, one per CAN signal. My *.fdepl file maps each attribute to its own EventGroup.

Any time the network connection is established, or broken and re-established, I get an avalanche of ~3500 Subscribe messages followed by ~3500 acknowledgements, transmitted one per frame. The entire sequence does not fit inside a 2-second Service Discovery interval. When the work does not complete within the timeout interval, the routingmanager issues StopSubscribe and SubscribeNack. The system retries, but recovery takes a long time: at least a couple of Service Discovery intervals.

The train logic is supposed to aggregate these messages, sending a train only when it is full or when 5 ms have elapsed, but several places in the code prevent this.
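The intended aggregation rule (depart when full, or 5 ms after the first queued entry) can be sketched as below. This is a simplified, hypothetical model, not vsomeip's actual train code: the `Train` class, the assumed 1432-byte entry budget and the 16-byte entry size are illustrative assumptions, and `depart()` only records how many entries a real implementation would have serialized into one SD message.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical model of a train: NOT vsomeip's implementation.
// Entries are queued and the train departs (one packet) either when the
// next entry would overflow the byte budget or when max_delay has passed
// since the first entry of the current train was queued.
class Train {
public:
    using Clock = std::chrono::steady_clock;

    Train(std::size_t capacity_bytes, std::chrono::milliseconds max_delay)
        : capacity_(capacity_bytes), max_delay_(max_delay) {
        assert(capacity_bytes > 0 && "capacity must be positive");
    }

    // Queue one entry of entry_bytes at time now.
    void add(std::size_t entry_bytes, Clock::time_point now) {
        if (!sizes_.empty() && used_ + entry_bytes > capacity_) {
            depart();  // full: send the current train, start a new one
        }
        if (sizes_.empty()) {
            first_enqueued_ = now;  // deadline counts from the first entry
        }
        sizes_.push_back(entry_bytes);
        used_ += entry_bytes;
    }

    // Called periodically (e.g. from a timer); departs once the deadline passed.
    void poll(Clock::time_point now) {
        if (!sizes_.empty() && now - first_enqueued_ >= max_delay_) {
            depart();
        }
    }

    // Entry count of every train that has departed so far.
    const std::vector<std::size_t>& departures() const { return departures_; }

private:
    void depart() {
        departures_.push_back(sizes_.size());  // stand-in for "send one SD packet"
        sizes_.clear();
        used_ = 0;
    }

    std::size_t capacity_;
    std::chrono::milliseconds max_delay_;
    std::size_t used_ = 0;
    std::vector<std::size_t> sizes_;
    Clock::time_point first_enqueued_{};
    std::vector<std::size_t> departures_;
};
```

For example, queueing 200 entries of 16 bytes at the same instant with a 1432-byte budget yields two full trains of 89 entries immediately, and the remaining 22 entries leave together once the 5 ms deadline is reached. Measuring the deadline from the first queued entry bounds the added latency, while a burst of thousands of subscribes still leaves in a few dozen packets instead of one packet per entry.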

Reproduction Steps

This behavior is easily reproduced when the system has a *.fidl with thousands of attributes and the *.fdepl puts each one into its own EventGroup.

Subscribe to all ~3500 attributes, then break and re-establish the network connection with ifconfig down; sleep 10; ifconfig up. Capture the traffic with tcpdump and observe the network behavior.

Expected behaviour

The train logic should do a "pretty good job" of aggregating many Subscribe and many SubscribeAck entries into each Service Discovery packet.
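To put "pretty good" into numbers, the arithmetic below estimates how many Subscribe entries fit into one SOME/IP-SD message. It is a back-of-the-envelope sketch under stated assumptions: a 16-byte SOME/IP header, an 8-byte SD header (flags/reserved plus entries-array length), 16-byte eventgroup entries, a 4-byte options-array length, and a single 12-byte IPv4 endpoint option shared by all entries. The 1472-byte UDP payload budget (1500-byte MTU minus 20-byte IPv4 and 8-byte UDP headers) is also an assumption; vsomeip's effective limit may differ depending on configuration. All function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed SOME/IP-SD layout sizes in bytes (see lead-in).
constexpr std::size_t kSomeIpHeader = 16;        // message ID, length, request ID, versions, type, return code
constexpr std::size_t kSdHeader = 4 + 4;         // flags + reserved, length of entries array
constexpr std::size_t kOptionsLength = 4;        // length of options array
constexpr std::size_t kIpv4EndpointOption = 12;  // one option shared by every entry
constexpr std::size_t kEntry = 16;               // one eventgroup entry (Subscribe or SubscribeAck)
constexpr std::size_t kFixed =
    kSomeIpHeader + kSdHeader + kOptionsLength + kIpv4EndpointOption;  // 40 bytes

// UDP payload size of one SD message carrying n entries.
constexpr std::size_t sd_message_bytes(std::size_t n) { return kFixed + n * kEntry; }

// Maximum entries per message under a UDP payload budget.
constexpr std::size_t entries_per_message(std::size_t budget) {
    return budget < kFixed + kEntry ? 0 : (budget - kFixed) / kEntry;
}

// Messages needed to carry `entries` entries (ceiling division).
std::size_t messages_needed(std::size_t entries, std::size_t budget) {
    const std::size_t per = entries_per_message(budget);
    assert(per > 0 && "budget too small for a single entry");
    return (entries + per - 1) / per;
}
```

Under these assumptions one entry gives a 56-byte SD message, which plus 42 bytes of Ethernet, IPv4 and UDP headers is consistent with the ~98-byte frames in the capture, and two entries give 114 bytes. A full message holds 89 entries, so ~3500 Subscribe entries would need about 40 packets rather than ~3500.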

Logs and Screenshots

With the existing code you should see thousands of back-to-back Subscribe messages, like:

5039    9.333908    10.6.0.3    10.6.0.10   SOME/IP-SD  86  SOME/IP Service Discovery Protocol [SubscribeNack]
5040    9.334271    10.6.0.10   10.6.0.3    SOME/IP-SD  104 SOME/IP Service Discovery Protocol [Subscribe]
5041    9.335307    10.6.0.10   10.6.0.3    SOME/IP-SD  98  SOME/IP Service Discovery Protocol [Subscribe]
5042    9.335710    10.6.0.10   10.6.0.3    SOME/IP-SD  114 SOME/IP Service Discovery Protocol [Subscribe]
5043    9.336492    10.6.0.10   10.6.0.3    SOME/IP-SD  98  SOME/IP Service Discovery Protocol [Subscribe]
5044    9.336762    10.6.0.10   10.6.0.3    TCP 66  36651 → 30510 [FIN, ACK] Seq=142 Ack=1 Win=64256 Len=0 TSval=269564273 TSecr=2

Each message is roughly 98 bytes and is sent in its own packet, with nothing or almost nothing aggregated. In this region we also see a SubscribeNack and a socket close, because the entire sequence exceeded the 2 s Service Discovery timeout interval.
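A rough check of why one-message-per-packet cannot finish in time can be done from the capture above. This sketch assumes the mean gap between the four consecutive Subscribe frames (5040 to 5043) is representative of the whole burst; the gaps actually vary between about 0.4 ms and 1 ms, so the result is an order-of-magnitude estimate, not a measurement. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean gap in seconds between consecutive capture timestamps.
double mean_gap(const std::vector<double>& timestamps) {
    assert(timestamps.size() >= 2);
    return (timestamps.back() - timestamps.front()) /
           static_cast<double>(timestamps.size() - 1);
}

// Estimated time to send `count` messages, one per packet, at a steady gap.
double serial_duration(std::size_t count, double gap_seconds) {
    return static_cast<double>(count) * gap_seconds;
}
```

With the timestamps 9.334271, 9.335307, 9.335710 and 9.336492 the mean gap is about 0.74 ms, so ~3500 Subscribe messages alone would take roughly 2.6 s, already longer than the 2 s Service Discovery interval before any SubscribeAck is counted.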

joeyoravec commented 1 month ago

I've opened draft pull requests:

with the code changes I've applied locally to address this issue. I would appreciate any feedback on the approach.