DataDog / kvexpress

Auto-archived due to inactivity. Go program to move data in and out of Consul's KV store.
Apache License 2.0

Add ability to stop `out` process on a single node. #54

Closed: darron closed 8 years ago

darron commented 8 years ago

touch /tmp/kvexpress-full-stop

Log that to a Datadog metric at least - so it's not completely invisible.

darron commented 8 years ago

We already have the full stop for the system on a per-key basis:

https://github.com/DataDog/kvexpress#stop-command-flags

This is just on an individual node.

darron commented 8 years ago

Speaking with @jhulten - here's likely a better interface:

sudo kvexpress lock -f /etc/datadog/hosts.consul

sudo kvexpress lock --all

That prevents kvexpress from updating that file on this host - there are likely several things we should do / think about:

  1. We need to store the hostname in the KV - so that kvexpress sees the lock is present when it tries to save an update.
  2. We may want to write a file /etc/datadog/hosts.consul.locked that helps people to remember - that the file will NOT be updated. Inside that file we can say "Hey - this file won't be updated - to unlock - do this."
  3. We need to log a metric whenever kvexpress out runs and sees the lock - gives us visibility into all locked files.

Implications:

  1. This data should be stored in the /kvexpress/ KV hierarchy. That's currently write-limited to Consul Server nodes and a little too flat. Need to organize better so that it all fits together.
  2. If we're passing full config file paths - then we need to find a way to replicate that path in /kvexpress - it currently has no concept of the path.
  3. That means we would not be able to have the same file with custom information.
  4. Will need to adjust this against how we're deploying canary code.

darron commented 8 years ago

Added lock:

https://github.com/DataDog/kvexpress/commit/664540fdd6e4c57b9077dd09983fbcc49c202ce3

Added unlock:

https://github.com/DataDog/kvexpress/commit/1ea7c5482d72561d0951fcf6f59669a6bee3dd8d

Will test in staging.

Have ignored the complete node lock (--all) for now - not sure it's needed.

darron commented 8 years ago

Also made sure that a full file path was being passed.

Ready for testing in staging tomorrow.

darron commented 8 years ago

Have added the lock and unlock functionality and have tested in staging - here's a small video demonstrating how it works:

http://screencast.com/t/a0ueKgjmuyd

There are metrics around the lock and unlock operations - also when something is locked - we send kvexpress.locked metrics as well - they're all available on the staging Consul dashboard:

https://dd.datad0g.com/dash/4584/consul

cc @clutchski @miketheman @clofresh @alq666

darron commented 8 years ago

This is rolling out in prod right now.

I didn't add an --all at this time.

We can if we need to.