NinesStack / sidecar

Gossip-based service discovery. Docker native, but supports non-container discovery, too.
MIT License
69 stars 7 forks source link

Add API to manually mark services as tombstone #44

Closed felixgborrego closed 5 years ago

felixgborrego commented 5 years ago

This is an early WIP, sharing here to get some feedback exploring options to implement it.

Issue: Currently, we don't have a way to tell to Sidecar when a container is about to go down before it actually died. This translated into some errors when rolling out services while Sidecar detects the container as unhealthy and spread the new state in the cluster.

Goal: Have a way for the sidecar-executor to notify Sidecar in advance when a Mesos task is going to be killed.

Design considerations: To make it easier, I'm reusing the tombstone state but there may be some edge case I didn't consider here. The other option is to add a new "draining" state, but it'll make the service state more difficult to understand (not very familiar with the finite-state machine here)

relistan commented 5 years ago

Hey @felixgborrego thanks for the PR. I have some concerns about this approach and suggest a different one. Here's some general comments. More on the code as well.

The first is that re-using the Tombstone status will make it invisible what is happening when we're triggering this state. The second is that the normal processing loop in Sidecar will override this change within 1 a second or two, because an Alive supercedes a Tombstone and the source of truth is:

Docker API -> healthy (sidecar/healthy package) -> state

As long as DockerDiscovery is returning the container and it's healthy according to the health checker, it will get re-populated into the state.

What's required here is a new Status, e.g. DRAINING which is not a Tombstone, and which can be handled as a separate case when updating state. This would also need to be treated like a Tombstone for purposes of gossip messages (they are propagated faster and more widely).

relistan commented 5 years ago

Some further notes:

Should DRAINING have a lifespan like Alive and Tombstone? Just so you can't get stuck draining forever? If so, something like https://github.com/Nitro/sidecar/blob/master/catalog/services_state.go#L32

relistan commented 5 years ago

As long as the state is not ALIVE this will prevent it from being exposed to Envoy: https://github.com/Nitro/sidecar/blob/master/sidecarhttp/envoy_api.go#L301

relistan commented 5 years ago

Happy to set up some time to pair on this with you if needed.

felixgborrego commented 5 years ago

@relistan thanks for your feedback, I'll close this PR exploring options and we'll send a PR with your feedback soon