bapti / dundee-data-day

MIT License

Project statement #1

Open bapti opened 8 years ago

bapti commented 8 years ago

Talk

Demo usage of monitoring and alerting

bapti commented 8 years ago

@cizer I'm not so sure about Ansible, since it doesn't run on Windows, which makes for a bad developer experience. It would mean developing on Ubuntu or Mac. What are your thoughts?

https://atlas.hashicorp.com is a nice online service we could use, plus we can develop locally with Vagrant.

cizer commented 8 years ago

@bapti The tech stack looks fine, and I'll take your word on Ansible, although I use a Mac at home anyway. We could give Atlas a try if you like, or Chef?

I'd like to get the demo scenario nailed down a bit further before settling on the tech stack. I recall you were talking about a continuous deployment strategy that involves pushing updates to a small portion of the production user base and monitoring for issues, rather than doing exhaustive testing before deployment.
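The canary strategy described above could be sketched like this, assuming each user has a stable ID (a hypothetical input; the thread doesn't say how users are identified). Hashing the ID, rather than choosing randomly on every request, sends roughly `canary_percent`% of users to v2 and keeps each user on the same version between page loads.

```python
import hashlib


def assign_version(user_id: str, canary_percent: int) -> str:
    """Return "v2" for roughly canary_percent% of users and "v1" for the rest.

    user_id and canary_percent are illustrative names, not part of any existing API.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first two bytes of the hash onto a bucket in 0..99.
    bucket = int.from_bytes(digest[:2], "big") % 100
    return "v2" if bucket < canary_percent else "v1"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    on_canary = sum(assign_version(u, 10) == "v2" for u in users)
    print(f"{on_canary} of {len(users)} users routed to v2")
```

Raising `canary_percent` step by step while watching the error metric is one way to widen the rollout once v2 looks healthy.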

So how about this:

Fairly simple, and it gets the point across. I'm not sure if it's exactly what you had in mind, or if it's what Prometheus is best suited for, but we can refine it. What do you think?

bapti commented 8 years ago

@cizer

Awesome, love it

bapti commented 8 years ago

https://www.docker.com/docker-toolbox

cizer commented 8 years ago

@bapti OK, here's v2:

Deploy v1

Button Test

  1. Run an app that records a Prometheus metric on each 'v1' button click.
  2. Show the metrics page and highlight that the button version is passed as a dimension/label and that the metric works.
  3. Demo the Prometheus server graph.
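The v1 steps could be sketched with only the Python standard library, assuming a hypothetical counter named `button_clicks_total` and hypothetical `/click` and `/metrics` endpoints. The point it shows is that the button version is a label on one metric, not a separate metric per version, in the Prometheus text exposition format.

```python
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical metric name; the thread doesn't fix one.
METRIC = "button_clicks_total"
clicks = Counter()  # version label value -> number of clicks


def record_click(version: str) -> None:
    clicks[version] += 1


def escape_label_value(value: str) -> str:
    # The text format requires backslash, double quote and newline to be escaped.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics() -> str:
    # HELP/TYPE lines, then one sample line per label set.
    lines = [
        f"# HELP {METRIC} Button clicks by app version.",
        f"# TYPE {METRIC} counter",
    ]
    for version, count in sorted(clicks.items()):
        lines.append(f'{METRIC}{{version="{escape_label_value(version)}"}} {count}')
    return "\n".join(lines) + "\n"


class DemoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # POST /click?version=v1 is what the button's click handler would call.
        url = urlparse(self.path)
        if url.path != "/click":
            self.send_error(404)
            return
        record_click(parse_qs(url.query).get("version", ["v1"])[0])
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        # GET /metrics is the page the Prometheus server scrapes.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    HTTPServer(("", port), DemoHandler).serve_forever()
```

Calling `serve()` and pointing Prometheus at `http://<host>:8000/metrics` should be enough to graph clicks per version. In a real app, the official `prometheus_client` library (`Counter(...).labels(version="v1").inc()` plus `start_http_server`) would replace the hand-rolled exposition code.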

Deploy v2

The v2 release includes the v1 button, but hidden, like a reverse feature toggle. Can we call it a "rollback toggle" or a "canary toggle"?
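A minimal sketch of the "rollback toggle" idea, assuming the toggle is a simple in-process flag (the class and function names are hypothetical). Both buttons ship in the v2 release; rolling back only flips which one is rendered, so no redeploy is needed.

```python
from dataclasses import dataclass


@dataclass
class RollbackToggle:
    """Hypothetical rollback toggle: v2 still contains the v1 button, hidden."""
    rolled_back: bool = False

    def visible_button(self) -> str:
        # Both buttons are in the release; the flag only picks which one renders.
        return "v1" if self.rolled_back else "v2"

    def roll_back(self) -> None:
        # Flipping the flag brings v1 back without deploying anything.
        self.rolled_back = True


def render_button(toggle: RollbackToggle) -> str:
    version = toggle.visible_button()
    return f'<button data-version="{version}">{version}</button>'
```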

  1. Deploy the next version of the app with a replacement 'v2' button.
  2. Demo that the new button generates a new labelled dimension and metric.
  3. Get the audience to generate some kind of error state (perhaps randomly generated errors) so we can show the metrics in real time: a success metric and an error metric.
    • Alert rules are set to trigger a generic webhook that posts to our API after, let's say, 20 failed clicks; this raises an event that triggers a rollback to v1 and generates an alert.
  4. Users will see their app roll back (can we use a websocket to do this in real time?): the button now displays 'v1'.
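Assuming the "generic webhook" is Prometheus Alertmanager's webhook receiver, the alert rule could use an expression along the lines of `increase(button_errors_total{version="v2"}[5m]) >= 20` (hypothetical metric name and window), and our API only needs to read the JSON that Alertmanager POSTs: a list of `alerts`, each with a `status` and `labels` including `alertname`. The sketch below rolls back on the first firing alert with a hypothetical name; the websocket push to browsers is left as a comment.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical alert name; it must match the alert rule configured in Prometheus.
ROLLBACK_ALERT = "ButtonV2Failing"

state = {"visible_button": "v2"}


def handle_alert_payload(payload: dict) -> bool:
    """Roll back to v1 if the payload holds our alert in the firing state.

    Returns True only when this payload caused the rollback.
    """
    for alert in payload.get("alerts", []):
        if (alert.get("status") == "firing"
                and alert.get("labels", {}).get("alertname") == ROLLBACK_ALERT
                and state["visible_button"] != "v1"):
            state["visible_button"] = "v1"
            return True
    return False


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/rollback":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        if handle_alert_payload(payload):
            # Here the app would notify connected browsers (e.g. over a
            # websocket) so users see the 'v1' button without reloading.
            print("rollback triggered: now serving v1")
        self.send_response(200)
        self.end_headers()


def serve(port: int = 9000) -> None:
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Ignoring `resolved` notifications keeps the rollback one-way: once the error rate drops, the app stays on v1 until someone deliberately re-enables v2.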

Metric analysis

  1. Show the Slack integration.
  2. If we can, let's have a look in PromDash or Grafana.
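Alertmanager has a built-in Slack receiver (`slack_configs`), which is probably the simplest route. If we'd rather relay alerts from our own API, a sketch could look like this, assuming a Slack incoming webhook (which accepts a JSON body with a `text` field) and an optional `summary` annotation on the alert rule; the webhook URL is a placeholder we'd get from Slack.

```python
import json
import urllib.request


def slack_message(payload: dict) -> dict:
    """Turn an Alertmanager-style webhook payload into a Slack message body."""
    lines = []
    for alert in payload.get("alerts", []):
        name = alert.get("labels", {}).get("alertname", "unknown")
        summary = alert.get("annotations", {}).get("summary", "")
        status = alert.get("status", "unknown").upper()
        lines.append(f"[{status}] {name} {summary}".rstrip())
    return {"text": "\n".join(lines) or "No alerts in payload"}


def post_to_slack(webhook_url: str, message: dict) -> int:
    # Incoming webhooks take a JSON POST; the URL itself carries the credentials.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```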