badoo / RIBs

Badoo's take on RIBs
Apache License 2.0

Successive transition bug fix [WIP - test missing] #272

Closed SamuPS closed 3 years ago

SamuPS commented 3 years ago

The original bug in Badoo PQW was caused by two factors. The first was a bug in the PQW feature, which was sending an extra configuration operation to the RIB. When changing away from a certain configuration, a rogue event was generated that told the RIB to navigate to the current configuration again. E.g. the current config is B, the feature asks for replace(A), then the feature asks for replace(B) immediately.

The second factor is the RIB bug :bug:. It is triggered when two routing actions are called too quickly: their transitions overlap, for example when one routing action is called and another transition is requested immediately afterwards. This overlap results in both children being detached. The source of the bug is in the RIB Actor. When a transition is started, a handler.post{} is used to start it. The reason for doing so can be found in the code itself: /**

However, this causes the actual start of the transition to be enqueued. Thus, if a new routing action is called immediately after the first one, no ongoing operation is detected yet, and the second transition is started through the same process. This is the transition overlap that causes the unexpected behaviour.
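To make the timing concrete, here is a minimal sketch of the race, not the actual RIBs code. `FakeHandler` and `NaiveActor` are hypothetical names; `FakeHandler` only imitates the enqueue-then-run behaviour of a handler's `post`, so the example runs without Android.

```kotlin
// Hypothetical stand-in for a main-thread handler: post() only enqueues the task.
class FakeHandler {
    private val queue = ArrayDeque<() -> Unit>()
    fun post(task: () -> Unit) { queue.addLast(task) }
    // Runs every queued task in order, as the message loop eventually would.
    fun drain() { while (queue.isNotEmpty()) queue.removeFirst().invoke() }
}

// Hypothetical, heavily simplified actor (an assumption, not the real class).
class NaiveActor(private val handler: FakeHandler) {
    var ongoing: String? = null // transition that has actually started, if any
        private set
    val log = mutableListOf<String>()

    fun accept(target: String) {
        // This check runs at call time, but the start below is deferred,
        // so two rapid calls both observe ongoing == null.
        if (ongoing != null) {
            log += "abandon $ongoing for $target"
            ongoing = null
        }
        handler.post {
            if (ongoing != null) {
                log += "overlap: $target started while $ongoing is running"
            }
            ongoing = target
            log += "start $target"
        }
    }
}

fun main() {
    val handler = FakeHandler()
    val actor = NaiveActor(handler)
    actor.accept("replace(A)")
    actor.accept("replace(B)") // arrives before the first start has run
    handler.drain()
    actor.log.forEach(::println)
}
```

In this sketch neither call sees an ongoing transition, so both starts are queued and the second one runs on top of the first.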

The solution: The proposed solution is to keep track of the transition until it really starts. If a new transition is requested within that time window, the actor reacts and handles that scenario to keep the behaviour consistent.
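The idea of tracking a transition until it really starts can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `FakeHandler`, `TrackingActor` and `Request` are hypothetical names, and dropping the superseded start is just one possible way for the actor to react.

```kotlin
// Hypothetical stand-in for a main-thread handler: post() only enqueues the task.
class FakeHandler {
    private val queue = ArrayDeque<() -> Unit>()
    fun post(task: () -> Unit) { queue.addLast(task) }
    fun drain() { while (queue.isNotEmpty()) queue.removeFirst().invoke() }
}

// Identity token, so two requests for the same target are still distinguishable.
private class Request(val target: String)

// Hypothetical, simplified actor that remembers a requested-but-not-started transition.
class TrackingActor(private val handler: FakeHandler) {
    private var pending: Request? = null // requested, start still queued
    var ongoing: String? = null          // actually started
        private set
    val log = mutableListOf<String>()

    fun accept(target: String) {
        // A queued start counts as "in progress" too, which closes the time window.
        val superseded = pending?.target ?: ongoing
        if (superseded != null) {
            log += "supersede $superseded with $target"
            ongoing = null
        }
        val request = Request(target)
        pending = request
        handler.post {
            // Assumption: only the most recent request is allowed to start.
            if (pending === request) {
                pending = null
                ongoing = target
                log += "start $target"
            }
        }
    }
}

fun main() {
    val handler = FakeHandler()
    val actor = TrackingActor(handler)
    actor.accept("replace(A)")
    actor.accept("replace(B)") // detected as overlapping with the queued replace(A)
    handler.drain()
    actor.log.forEach(::println)
}
```

With the pending request tracked, the second call can see the first one even though it has not started yet, so only one transition ever runs.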