Closed vlast3k closed 1 year ago
@vlast3k I see the point that there might be a delay before the route is deleted, since the actual LRP needs to be removed first, and then it takes the converger to update BBS, remove the actual LRP, and emit an event for the router.
But I think it should be consistent with how we remove the desired LRP when we stop the app. In that case, we don't remove the actual LRP; we only emit an event. The removal of the actual LRP from the BBS database should be triggered by the rep when the actual LRP is removed on the cell, because BBS should rely on data from the rep for the actual state. BBS will try to converge to the desired state if, for example, deletion of the actual LRP fails. Also, the rep will keep sending StartActualLRP requests if it still has the LRP running, so this removal from the database would be unnecessary.
So instead of the removeLRP call in the Claimed and Running cases, we should only emit an actual LRP removal event for the router.
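As a rough illustration of the suggestion above, here is a minimal in-memory sketch. It is not the BBS code: the `store` type, `lrpKey`, `retire`, and `cellReportsRemoved` are hypothetical names invented for this example, and the event is just a string. It only shows the ordering being proposed: retiring a Claimed or Running instance emits a removal event right away, while the record stays in place until the cell reports that the instance is gone.

```go
package main

import "fmt"

// lrpKey identifies one instance (hypothetical type for this sketch).
type lrpKey struct {
	ProcessGuid string
	Index       int
}

// store is a toy stand-in for the BBS database plus its event stream.
type store struct {
	records map[lrpKey]string // instance -> state
	events  []string          // removal events a router-like subscriber would receive
}

// retire emits a removal event for Claimed/Running instances
// but deliberately leaves the record in place.
func (s *store) retire(k lrpKey) {
	switch s.records[k] {
	case "CLAIMED", "RUNNING":
		s.events = append(s.events, fmt.Sprintf("removed %s/%d", k.ProcessGuid, k.Index))
	}
}

// cellReportsRemoved models the cell (rep) reporting that the instance
// is gone; only then is the record deleted.
func (s *store) cellReportsRemoved(k lrpKey) {
	delete(s.records, k)
}

func main() {
	k := lrpKey{ProcessGuid: "app-guid", Index: 0}
	s := &store{records: map[lrpKey]string{k: "RUNNING"}}

	s.retire(k)
	fmt.Println(len(s.events), len(s.records)) // 1 1: event sent, record kept

	s.cellReportsRemoved(k)
	fmt.Println(len(s.records)) // 0: record removed once the cell reports it
}
```

In this model the router-facing event and the database cleanup are decoupled, which is the point of the suggestion: the route can be dropped immediately without BBS deleting state that the rep still owns.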
Hi @mariash, thank you for the prompt feedback! I will change it and update the PR.
Closing the issue since the PR is merged.
Summary
cf restart-app-instance <app> <index> ends up calling bbs/controllers/actual_lrp_lifecycle_controller.go/RetireActualLRP. If the LRP is Claimed or Running, it will not emit events. As a consequence, gorouter does not remove the route to the instance being restarted, and the instance could get requests despite being in graceful shutdown mode.
The expected outcome of cf restart-app-instance is that: gorouter
Steps to Reproduce
Described here https://github.com/vlast3k/dontdie/tree/main
Diego repo
https://github.com/cloudfoundry/bbs/tree/main
Environment Details
diego-release - 2.80.0
Possible Causes or Fixes (optional)
The removeLRP method is only called if the LRP is Unclaimed or Crashed, or in case of errors.
The change in this draft PR fixes the issue (without breaking existing tests): https://github.com/cloudfoundry/bbs/pull/72
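To make the reported behavior easier to follow, here is a simplified model of the state handling described above. It is a sketch, not the actual controller: `State`, `retireOutcome`, and `retireAsReported` are hypothetical names, the error path mentioned above is left out, and it assumes that the removeLRP path both deletes the record and emits the removal event, while the Claimed/Running path only asks the cell to stop the instance.

```go
package main

import "fmt"

// State is a simplified stand-in for the actual LRP states named in this issue.
type State string

const (
	Unclaimed State = "UNCLAIMED"
	Claimed   State = "CLAIMED"
	Running   State = "RUNNING"
	Crashed   State = "CRASHED"
)

// retireOutcome records what a retire call did in this toy model.
type retireOutcome struct {
	RemovedFromDB bool // record deleted (the removeLRP path)
	EventEmitted  bool // removal event sent to subscribers such as gorouter
	StopRequested bool // cell asked to stop the instance (assumption of this sketch)
}

// retireAsReported models the behavior described in the issue.
// Hypothetical helper; the "in case of errors" path is not modeled.
func retireAsReported(s State) retireOutcome {
	switch s {
	case Unclaimed, Crashed:
		// removeLRP path: assumed to delete the record and emit the event.
		return retireOutcome{RemovedFromDB: true, EventEmitted: true}
	case Claimed, Running:
		// No removal event here, so the route stays registered.
		return retireOutcome{StopRequested: true}
	}
	return retireOutcome{}
}

func main() {
	for _, s := range []State{Unclaimed, Claimed, Running, Crashed} {
		fmt.Printf("%-9s %+v\n", s, retireAsReported(s))
	}
}
```

The Claimed and Running rows are the ones that matter for this issue: nothing is emitted, so a subscriber like gorouter has no signal to drop the route while the instance shuts down.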