COVESA / vsomeip

An implementation of Scalable service-Oriented MiddlewarE over IP
Mozilla Public License 2.0
1.09k stars 678 forks source link

Possible race condition with selective events #208

Open Almeida-Oco opened 3 years ago

Almeida-Oco commented 3 years ago

Considering the following network architecture with 2 different devices connected to the same network, with the following applications, where the first row represents the central routing managers of each device:

Device 1 Device 2
vsomeipd Application 2
Application 1 Application 3

Application 1 is offering a selective event and Application 2 and 3 will be subscribing to it, so Application 2 will send 2 SOME/IP-SD Subscribe messages to vsomeipd. In their selective options, the first will only contain it's own ID [2], while the second will contain both it's ID and Application 3 ID [2, 3].

When vsomeipd receives the first SUBSCRIBE message, it will create a new remote_subscription whose only client will be App 2 with the PENDING state, lets say it was stored in the 0x1234 memory position, so:

remote_subscription (0x1234) -  clients_{2: PENDING}, parent_: nullptr 

vsomeipd will then proceed to send a VSOMEIP_SUBSCRIBE local message to Application 1, and the PENDING state will only be updated to ACKED once it replies with a VSOMEIP_SUBSCRIBE_ACK message containing client 2. Now, if the second SUBSCRIBE message arrives and is processed before client 2 remote subscription state is changed to ACKED, then in routing_manager_impl::on_remote_subscribe(), in the call to eventgroupinfo::update_remote_subscription(), as both instances are "equal", the newly created remote_subscription will inherit the acknowledgement states from the existing subscription (0x1234) , so we end up with the following instances of remote_subscription:

remote_subscription (0x1234) -  clients_{2: PENDING, 3: PENDING}, parent_: nullptr 
remote_subscription (0x4321) -  clients_{2: PENDING, 3: PENDING}, parent_: 0x4321

Also, since the 0x1234 instance already had client 2, the update_remote_subscription will only insert client 3 in its _changed parameter, meaning that later in the send_subscription call, it will only send a VSOMEIP_SUBSCRIBE message containing client 3 to Application 1. https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/routing/src/routing_manager_impl.cpp#L2742-L2757

After that, vsomeipd will eventually receive the VSOMEIP_SUBSCRIBE_ACK message with client 2 from Application 1 and will update the 0x1234 remote_subscription state of client 2 to ACKED in the routing_manager_impl::on_subscribe_ack() method, but no SUBSCRIBE_ACK Service Discovery message will be sent as the service_discovery_impl::update_remote_subscription() call will only send the ACK if the remote_subscription is not pending, since client 3 state is still PENDING in the 0x1234 subscription that is not the case. When the next VSOMEIP_SUBSCRIBE_ACK message with client 3 is received, the remote_subscription 0x4321 will be updated, client 3 state will then be ACKED, which is also propagated to it's parent 0x1234. Since 0x4321 client 2 state is still PENDING it will not be removed from the subscriptions_ collection of eventgroupinfo in https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/routing/src/routing_manager_impl.cpp#L2844-L2862 Consequently, no SUBSCRIBE_ACK message will be sent either, so we end up with the following state of the eventgroups subscriptions_ collection:

remote_subscription (0x1234) -  clients_{2: ACKED, 3: ACKED}, parent_: nullptr 
remote_subscription (0x4321) -  clients_{2: PENDING, 3: ACKED}, parent_: 0x4321

Once this happens, subsequent SUBSCRIBE messages sent by Application 2 will always contain the same clients in its selective option [2, 3] and no change will happen, which seems would be causing it to either enter the !_subscription->is_pending() branch or the else branch in eventgroupinfo::update_remote_subscription(), depending on which existing remote_subscription is iterated first. https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/routing/src/eventgroupinfo.cpp#L211-L250

In the first case, it will inherit the state of 0x1234 and not be pending anymore, so it should send the SUBSCRIBE_ACK back to Application 2 and resume normal behavior(?), whereas in the second case it will be assigned set_answers(0) and not send any SUBSCRIBE_ACK back as it would enter the do_not_answer branch of service_discovery::send_subscription_ack(). https://github.com/GENIVI/vsomeip/blob/13f9c89ced6ffaeb1faf485152e27e1f40d234cd/implementation/service_discovery/src/service_discovery_impl.cpp#L3557-L3593

If in the send_subscription() call we would use something like _subscription.get_pending_clients() instead of using its_added when receiving new clients for a selective event, this issue would disappear at the expense of some additional local messages.

Did I miss something or is my reasoning correct?

lutzbichler commented 3 years ago

First of all thank you for the thorough analysis! I am currently trying to reconstruct the (possible) issue and found one thing I do not understand regarding the analysis. It is within this part:

"the newly created remote_subscription will inherit the acknowledgement states from the existing subscription (0x1234) , so we end up with the following instances of remote_subscription:

remotesubscription (0x1234) - clients{2: PENDING, 3: PENDING}, parent_: nullptr remotesubscription (0x4321) - clients{2: PENDING, 3: PENDING}, parent_: 0x4321"

If I read "eventgroupinfo::update_remote_subscription" correctly, client 2 is not copied to the second remote subscription that contains client 3. Thus we end up with:

remotesubscription (0x1234) - clients{2: PENDING, 3: PENDING}, parent_: nullptr remotesubscription (0x4321) - clients{3: PENDING}, parent_: 0x1234"

instead and the answer to the second subscription is sent as soon as the application acks it.

Almeida-Oco commented 3 years ago

But because the 2nd SUBSCRIBE message that Application 2 sends to vsomeipd has clients [2, 3] in its selective options, won't the new subscription be:

 remote_subscription (0x4321) - clients_{2: PENDING, 3: PENDING}, parent_: nullptr"

before even entering eventgroupinfo::update_remote_subscription?

lutzbichler commented 3 years ago

Yes, you are right. Probably we should let the remote_subscription objects know their children to keep the update in sync.