COVESA / vsomeip

An implementation of Scalable service-Oriented MiddlewarE over IP
Mozilla Public License 2.0

[BUG]: Missing TP segments corrupt all following TP messages #737

Open siggie0815 opened 3 months ago

siggie0815 commented 3 months ago

vSomeip Version

v3.4.10

Boost Version

any

Environment

All

Describe the bug

While testing communication with our sensors, we occasionally saw failing E2E CRC checks. Once the problem occurs, it persists until communication is reset. The affected messages are notifications, segmented using TP and protected with E2E.

My understanding of the problem is the following: If a single TP segment gets lost, the vsomeip tp-reassembler cannot finish this message. So far so good.

However, the next message is then received, segment by segment, while the old message is still there waiting to be completed. So for the first few segments we might get a duplicate segment error. As soon as a segment arrives at the offset that was missing from the old message, the old message is regarded as complete and returned. Then the E2E check is run. Because the message was reassembled from segments of two consecutive messages, the CRC check fails and we have garbage data.

From then on, all messages are reassembled from mixed segments, without any duplicate segment error in the log. Hence the CRC fails for every message and the data is garbage.
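The sequence described above can be reproduced with a small toy model. This is only a sketch under simplifying assumptions (C++17, a fixed number of segments per message, slots indexed by segment number instead of byte offset); `ToyReassembler`, `kSegmentsPerMessage` and `run_loss_scenario` are hypothetical names and not vsomeip code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy model, NOT vsomeip's implementation: one reassembly buffer that is
// only cleared when a message completes.
constexpr std::size_t kSegmentsPerMessage = 4;

struct ToyReassembler {
    std::array<std::optional<std::string>, kSegmentsPerMessage> slots{};
    std::size_t duplicates = 0;

    // Stores one segment; returns the joined payload once all slots are filled.
    std::optional<std::string> feed(std::size_t index, const std::string& payload) {
        assert(index < kSegmentsPerMessage);
        if (slots[index]) {  // slot still occupied by a segment of an older message
            ++duplicates;
            return std::nullopt;
        }
        slots[index] = payload;
        for (const auto& s : slots) {
            if (!s) return std::nullopt;  // still incomplete
        }
        std::string out;
        for (auto& s : slots) {
            out += *s;
            s.reset();
        }
        return out;
    }
};

// Sends messages A, B, C, D in order; segment 2 of message A is lost.
std::vector<std::string> run_loss_scenario(ToyReassembler& r) {
    std::vector<std::string> completed;
    const std::string names = "ABCD";
    for (char name : names) {
        for (std::size_t i = 0; i < kSegmentsPerMessage; ++i) {
            if (name == 'A' && i == 2) continue;  // the single lost segment
            auto msg = r.feed(i, std::string(1, name) + std::to_string(i));
            if (msg) completed.push_back(*msg);
        }
    }
    return completed;
}
```

In this model, `run_loss_scenario` returns `A0A1B2A3`, `C0C1C2B3` and `D0D1D2C3`: every delivered message mixes segments of two consecutive messages, and only the first two segments of message B are counted as duplicates.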


Reproduction Steps

It's hard to reproduce on demand. One TP segment has to be removed from the communication somehow.

Expected behaviour

In my opinion a missing segment should not invalidate all upcoming traffic.

The problem could be resolved by various actions:

  1. Lower the message reassembly timeout to less than the message period, so that incomplete messages are deleted before the next message arrives. The timeout is currently hardcoded to 5 seconds and hence not very helpful.
  2. Force the start of a new TP message whenever a segment with offset zero arrives. Any remaining incomplete message is discarded. Segments with other offsets cannot start a new TP message.
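Both options can be sketched in one toy reassembler. This is a minimal illustration under assumptions (C++17, a fixed segment count, slots indexed by segment number, the current time passed in by the caller); `SketchReassembler` and `max_age` are hypothetical names and do not correspond to vsomeip classes or configuration keys.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch, not vsomeip code.
class SketchReassembler {
public:
    SketchReassembler(std::size_t segment_count, std::chrono::milliseconds max_age)
        : slots_(segment_count), max_age_(max_age) {}

    // Stores one segment; returns the joined payload once the message is complete.
    std::optional<std::string> feed(std::size_t index, const std::string& payload,
                                    Clock::time_point now) {
        assert(index < slots_.size());
        // Option 1: discard a partial message that is older than max_age_.
        const bool expired = in_progress_ && (now - started_ > max_age_);
        // Option 2: a segment at offset zero always starts a new message.
        if (expired || index == 0) reset();
        if (!in_progress_) {
            if (index != 0) return std::nullopt;  // only offset zero may start a message
            in_progress_ = true;
            started_ = now;
        }
        if (slots_[index]) return std::nullopt;  // duplicate within the same message
        slots_[index] = payload;
        for (const auto& s : slots_) {
            if (!s) return std::nullopt;  // still incomplete
        }
        std::string out;
        for (const auto& s : slots_) out += *s;
        reset();
        return out;
    }

private:
    void reset() {
        for (auto& s : slots_) s.reset();
        in_progress_ = false;
    }

    std::vector<std::optional<std::string>> slots_;
    std::chrono::milliseconds max_age_;
    Clock::time_point started_{};
    bool in_progress_ = false;
};
```

With this sketch, an incomplete message A is dropped as soon as segment 0 of message B arrives, so B is delivered intact. The cost of option 2 is also visible: a message whose segment 0 does not arrive first loses its earlier segments.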

I think the first solution is the better one, as it places fewer constraints on the order in which segments arrive.

Logs and Screenshots

No response

siggie0815 commented 3 months ago

After re-reading the SOME/IP TP spec, I changed my mind: the spec is quite particular about when message reassembly should be interrupted and a new message should start. In other words, vsomeip does not follow the spec in this regard.

https://www.autosar.org/fileadmin/standards/R20-11/CP/AUTOSAR_SWS_SOMEIPTransportProtocol.pdf

I will try to fix the problem and file a pull request.

duartenfonseca commented 2 weeks ago

Hi @siggie0815, could you test with this PR: https://github.com/COVESA/vsomeip/pull/783 and see whether it fixes the issue? Thanks!