Closed felixgborrego closed 5 years ago
Hey @felixgborrego thanks for the PR. I have some concerns about this approach and suggest a different one. Here's some general comments. More on the code as well.
The first is that re-using the Tombstone status will make it invisible what is happening when we're triggering this state. The second is that the normal processing loop in Sidecar will override this change within 1 a second or two, because an Alive supercedes a Tombstone and the source of truth is:
Docker API -> healthy (sidecar/healthy package) -> state
As long as DockerDiscovery
is returning the container and it's healthy according to the health checker, it will get re-populated into the state.
What's required here is a new Status, e.g. DRAINING
which is not a Tombstone, and which can be handled as a separate case when updating state. This would also need to be treated like a Tombstone for purposes of gossip messages (they are propagated faster and more widely).
Some further notes:
Should DRAINING
have a lifespan like Alive and Tombstone? Just so you can't get stuck draining forever? If so, something like https://github.com/Nitro/sidecar/blob/master/catalog/services_state.go#L32
As long as the state is not ALIVE
this will prevent it from being exposed to Envoy: https://github.com/Nitro/sidecar/blob/master/sidecarhttp/envoy_api.go#L301
Happy to set up some time to pair on this with you if needed.
@relistan thanks for your feedback, I'll close this PR exploring options and we'll send a PR with your feedback soon
This is an early WIP, sharing here to get some feedback exploring options to implement it.
Issue: Currently, we don't have a way to tell to Sidecar when a container is about to go down before it actually died. This translated into some errors when rolling out services while Sidecar detects the container as unhealthy and spread the new state in the cluster.
Goal: Have a way for the sidecar-executor to notify Sidecar in advance when a Mesos task is going to be killed.
Design considerations: To make it easier, I'm reusing the tombstone state but there may be some edge case I didn't consider here. The other option is to add a new "draining" state, but it'll make the service state more difficult to understand (not very familiar with the finite-state machine here)