DataDog / kvexpress

## Auto-archived due to inactivity. ## Go program to move data in and out of Consul's KV store.
Apache License 2.0

Be able to deploy code to a subset of nodes. #53

Closed darron closed 8 years ago

darron commented 8 years ago

For a phased rollout approach - some thoughts on how we might accomplish that:

  1. We tag nodes in Chef with a new tag: tag:canary. These nodes need to be representative of almost all types of nodes in all AZs to ensure coverage. We create a dashboard or series of dashboards that scopes itself to nodes with that tag so we can see how they change.
  2. The LWRP gets extended to look for that tag and add a /canary/ path entry. This moves /kvexpress/features.ini to /kvexpress/canary/features.ini.
  3. The path passed to the kvexpress in command run by Consul Template gets adjusted so that it pushes items only to the canary path.
  4. Once it's reviewed and pushed in consul-config - we monitor the canary nodes.
  5. The last step is some sort of promotion from the canary path to the standard path - therefore deploying to the rest of the nodes. We would likely need to add a kvexpress copy command that moves the data from location to location.
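A minimal sketch of the path change in step 2, to make the idea concrete. The function name canaryKey and the way tags are passed in are assumptions for illustration only; the real logic lives in the Chef LWRP, not in Go.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// canaryTag is the tag named in step 1 of the proposal.
const canaryTag = "tag:canary"

// canaryKey returns the KV path a node should read (hypothetical helper).
// Nodes carrying the canary tag get a "canary" segment inserted under the
// prefix; every other node keeps the standard path.
func canaryKey(prefix, key string, tags []string) string {
	key = strings.TrimPrefix(key, "/")
	for _, t := range tags {
		if t == canaryTag {
			return path.Join("/", prefix, "canary", key)
		}
	}
	return path.Join("/", prefix, key)
}

func main() {
	fmt.Println(canaryKey("kvexpress", "features.ini", []string{canaryTag}))
	fmt.Println(canaryKey("kvexpress", "features.ini", nil))
}
```

With this shape, promotion is just a copy from the first path to the second, which is what step 5 describes.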

Pros:

  1. Gives us a subset of nodes we can inspect - those nodes are well known and can be scaled up and down as needed.
  2. Makes widespread code changes "safer" from an impact standpoint. It will only ever impact a subset of nodes at first.

Cons:

  1. Need to build something that promotes the canary change once it's deemed "good".
  2. Slows down the process a bit - if speed is our highest priority.
  3. Can't cover all possible scenarios - but can help catch changes whose impact differs from what we expected.
  4. Will need to monitor how many nodes and of what types are placed - so that we don't lose coverage as nodes get rolled out.

darron commented 8 years ago

Added the copy command:

https://github.com/DataDog/kvexpress/commit/a49f3da93eb90ac2d59d1f9abe4c85432247e3f9

May create a Consul event that can promote the file by copying to the standard location:

consul event -service consul -name promote-config features.ini

Would copy the file from: /kvexpress/canary/features.ini to /kvexpress/features.ini - here's the actual kvexpress command:

bin/kvexpress copy --keyfrom canary/features.ini --keyto features.ini
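For reference, the same event could be fired through Consul's HTTP API (PUT /v1/event/fire/<name>) instead of the CLI. This is only a sketch: the helper name promoteRequest is made up, the server address localhost:8500 is an assumption, and it only builds the request rather than sending it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// promoteRequest builds the HTTP equivalent of
// "consul event -service consul -name promote-config features.ini".
// The event name goes in the path, the service filter in the query string,
// and the payload (the file to promote) in the request body.
func promoteRequest(server, event, service, payload string) (*http.Request, error) {
	u := url.URL{
		Scheme:   "http",
		Host:     server,
		Path:     "/v1/event/fire/" + event,
		RawQuery: url.Values{"service": {service}}.Encode(),
	}
	return http.NewRequest(http.MethodPut, u.String(), strings.NewReader(payload))
}

func main() {
	req, err := promoteRequest("localhost:8500", "promote-config", "consul", "features.ini")
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	// Sending it would be http.DefaultClient.Do(req) against a running agent.
	fmt.Println(req.Method, req.URL.String())
}
```

Something still has to watch for the promote-config event on the receiving side and run the kvexpress copy; that handler is not shown here.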

darron commented 8 years ago

Would likely require a web interface to promote from. That web interface would:

  1. List files in the kvexpress/canary/ hierarchy that differ from kvexpress/
  2. Display the diffs - optionally link back to consul-config
  3. Clicking promote would run the Consul event - we could kick off the event via HTTP API.
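A rough sketch of what item 1 could look like, comparing the stored checksum keys rather than the data itself. It runs against an in-memory map standing in for Consul's KV store, and pendingPromotions is a hypothetical name; the <file>/checksum key layout is assumed to be what kvexpress writes for each file.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pendingPromotions lists files whose canary checksum differs from the
// checksum at the standard location, or that exist only under canary/.
// store is a stand-in for the KV store: full key -> value.
func pendingPromotions(store map[string]string, prefix string) []string {
	canaryPrefix := prefix + "/canary/"
	var pending []string
	for key, canarySum := range store {
		if !strings.HasPrefix(key, canaryPrefix) || !strings.HasSuffix(key, "/checksum") {
			continue
		}
		name := strings.TrimSuffix(strings.TrimPrefix(key, canaryPrefix), "/checksum")
		// A missing standard key reads as "" and therefore counts as different.
		if store[prefix+"/"+name+"/checksum"] != canarySum {
			pending = append(pending, name)
		}
	}
	sort.Strings(pending)
	return pending
}

func main() {
	store := map[string]string{
		"kvexpress/canary/features.ini/checksum": "new",
		"kvexpress/features.ini/checksum":        "old",
	}
	fmt.Println(pendingPromotions(store, "kvexpress"))
}
```

Comparing checksums keeps the listing cheap; the diff view in item 2 would only need to fetch the data keys for the files this returns.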

There would always be a link to the promotion site from the Datadog Events - for example:

https://app.datadoghq.com/event/event?id=301026997457923231

miketheman commented 8 years ago

I think this is a good start - we discussed the "how to ensure we have 10% canary nodes of X type" in person, and how to possibly automate that to resolve Con #4, as well as provide a utility function to handle it in code.
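One possible shape for that utility function, as a sketch only. The node struct, the underCovered name, and the idea of feeding it a flat node list are all assumptions; where the node inventory actually comes from (Chef, Datadog tags, etc.) is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical inventory record: its type and whether it is
// currently tagged as a canary.
type node struct {
	Type   string
	Canary bool
}

// underCovered returns the node types whose share of canary nodes is below
// minPercent. Integer arithmetic avoids floating-point edge cases at exactly
// the threshold.
func underCovered(nodes []node, minPercent int) []string {
	total := map[string]int{}
	canary := map[string]int{}
	for _, n := range nodes {
		total[n.Type]++
		if n.Canary {
			canary[n.Type]++
		}
	}
	var short []string
	for t, count := range total {
		if canary[t]*100 < count*minPercent {
			short = append(short, t)
		}
	}
	sort.Strings(short)
	return short
}

func main() {
	nodes := []node{{Type: "web", Canary: true}, {Type: "web"}, {Type: "db"}}
	fmt.Println(underCovered(nodes, 10))
}
```

Run periodically, a check like this could flag a node type as soon as scaling or node replacement drops it below the target share.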

darron commented 8 years ago

Adjusted the LWRP so that we can actually start testing this:

https://github.com/DataDog/devops/commit/f6f2863336e779de063ed975fc57e29efdbf4b58

darron commented 8 years ago

Have tagged these 35 nodes in staging - a decent representative cross section of nodes:

https://gist.github.com/darron/c0b7477fd81184d7d32b

darron commented 8 years ago

OK - so the first test of the kvexpress_group worked - here's how it happened:

http://shared.froese.org/2015/oq4fg-11-51.jpg

FYI - the change noted was this: https://github.com/DataDog/devops/commit/fbe940661ed650dd87a21c2cde2e58d3f23f2973

The copy command that essentially "promoted" the config from canary nodes only to all nodes had this output:

[staging]root@i-1dc56ea3:/etc/consul-template/special-config# KVEXPRESS_DEBUG=1 /usr/local/bin/kvexpress copy -C /etc/datadog/kvexpress.yaml --keyfrom canary/kvexpress_canary_test.ini --keyto kvexpress_canary_test.ini --verbose
2015-12-24T18:48:28Z: copy: config: filename='/etc/datadog/kvexpress.yaml'
2015-12-24T18:48:28Z: copy: Checking cli flags.
2015-12-24T18:48:28Z: copy: Enabling Dogstatsd metrics.
2015-12-24T18:48:28Z: copy: Enabling Datadog API.
2015-12-24T18:48:28Z: copy: username='root'
2015-12-24T18:48:28Z: copy: Required cli flags present.
2015-12-24T18:48:28Z: copy: path='data' fullPath='kvexpress/canary/kvexpress_canary_test.ini/data'
2015-12-24T18:48:28Z: copy: path='checksum' fullPath='kvexpress/canary/kvexpress_canary_test.ini/checksum'
2015-12-24T18:48:28Z: copy: server='localhost:8500' token='047d5fc7'
2015-12-24T18:48:28Z: copy: action='get' key='kvexpress/canary/kvexpress_canary_test.ini/data'
2015-12-24T18:48:28Z: copy: action='get' key='kvexpress/canary/kvexpress_canary_test.ini/checksum'
2015-12-24T18:48:28Z: copy: length='13' minLength='10'
2015-12-24T18:48:28Z: copy: longEnough='true'
2015-12-24T18:48:28Z: copy: computedChecksum='077c2baa1718b09b6a3963e5c11d16729001f68ac03650518e8f1cfac338af61'
2015-12-24T18:48:28Z: copy: checksum='077c2baa1718b09b6a3963e5c11d16729001f68ac03650518e8f1cfac338af61' computedChecksum='077c2baa1718b09b6a3963e5c11d16729001f68ac03650518e8f1cfac338af61'
2015-12-24T18:48:28Z: copy: checksumMatch='true'
2015-12-24T18:48:28Z: copy: copy='true' keyFrom='canary/kvexpress_canary_test.ini' keyTo='kvexpress_canary_test.ini'
2015-12-24T18:48:28Z: copy: path='data' fullPath='kvexpress/kvexpress_canary_test.ini/data'
2015-12-24T18:48:28Z: copy: path='checksum' fullPath='kvexpress/kvexpress_canary_test.ini/checksum'
2015-12-24T18:48:28Z: copy: action='set' key='kvexpress/kvexpress_canary_test.ini/data'
2015-12-24T18:48:28Z: copy: consul KeyData='kvexpress/kvexpress_canary_test.ini/data' saved='true' size='274'
2015-12-24T18:48:28Z: copy: action='set' key='kvexpress/kvexpress_canary_test.ini/checksum'
2015-12-24T18:48:28Z: copy: datadog='true' DDCopyDataEvent='true' keyFrom='canary/kvexpress_canary_test.ini' keyTo='kvexpress_canary_test.ini'
2015-12-24T18:48:28Z: copy: dogstatsd='true' key='kvexpress_canary_test.ini' stats='in'
2015-12-24T18:48:28Z: copy: dogstatsd='true' key='' location='complete' msec='65'
2015-12-24T18:48:28Z: copy: location='complete', elapsed='65.283617ms'
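Reading the log above, the copy appears to fetch the source data and checksum, check the length against a minimum, verify the checksum, and only then write both keys to the destination. A sketch of that flow against an in-memory map follows. It is not the actual kvexpress code: the names are made up, SHA-256 is an assumption (the 64-hex-character checksums in the log are consistent with it), and the length is measured in bytes because the log does not make the unit clear.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// kv stands in for Consul's KV store: full key -> value.
type kv map[string]string

// sum returns the hex SHA-256 of data (assumed checksum algorithm).
func sum(data string) string {
	h := sha256.Sum256([]byte(data))
	return hex.EncodeToString(h[:])
}

// copyKey copies <prefix>/<from>/{data,checksum} to <prefix>/<to>/{data,checksum},
// refusing to write anything if the source is too short or its stored
// checksum does not match the computed one.
func copyKey(store kv, prefix, from, to string, minLength int) error {
	data := store[prefix+"/"+from+"/data"]
	checksum := store[prefix+"/"+from+"/checksum"]
	if len(data) < minLength {
		return errors.New("source data is shorter than minLength")
	}
	if sum(data) != checksum {
		return errors.New("stored checksum does not match computed checksum")
	}
	store[prefix+"/"+to+"/data"] = data
	store[prefix+"/"+to+"/checksum"] = checksum
	return nil
}

func main() {
	data := "[features]\nnew_thing = true\n"
	store := kv{
		"kvexpress/canary/features.ini/data":     data,
		"kvexpress/canary/features.ini/checksum": sum(data),
	}
	err := copyKey(store, "kvexpress", "canary/features.ini", "features.ini", 10)
	fmt.Println("copy error:", err)
}
```

Verifying before writing means a corrupted or truncated canary entry never gets promoted to the standard path.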

Will create a wiki page that shows how it works.

darron commented 8 years ago

Have updated the wiki page here:

https://github.com/DataDog/devops/wiki/Adding-a-file-to-kvexpress#i-want-to-have-canary-nodes-that-get-the-config-first

darron commented 8 years ago

This is live and in prod now.

Need to figure out which configs to start with.