StackStorm / community

Async conversation about ideas, planning, roadmap, issues, RFCs, etc., around StackStorm
https://stackstorm.com/
Apache License 2.0

StackStorm and Kubernetes Use Cases #38

Open theankushjain opened 4 years ago

theankushjain commented 4 years ago

Patching your Kubernetes nodes is one solid use case I've found, but we need to brainstorm others. I believe many use cases are yet to be discovered.

arm4b commented 4 years ago

Examples based on the https://www.devoperandi.com/stackstorm-for-kubernetes-just-took-a-giant-leap-forward/ blog post:

The possibilities are now literally endless.

arm4b commented 4 years ago
arm4b commented 4 years ago
emptywee commented 4 years ago

While we generally use Spinnaker for k8s deployments, we mainly use StackStorm to provide ChatOps aliases for various day-to-day tasks: rolling restarts, regular restarts, maintenance mode on/off (bringing down a certain set of pods depending on the type of maintenance), and Kubernetes and Flatcar/CoreOS version updates. Once our clusters grew to a number of nodes that was unmanageable by hand, I had to sit down and design a series of workflows to automate Kubernetes version updates.

Our clusters are our own self-managed, self-hosted clusters, initially based on CoreOS and now on Flatcar Linux, so the approach is very specific (e.g. it's based on generating Ignition configs, and it uses self-written microservices to generate certs and configs). I'm not sure it's worth sharing the workflows as they are. We also have NGINX Plus in front of the cluster, and iBGP-enabled, Calico-powered connectivity to the clusters, with route reflectors running BIRD. All of these are used during a cluster update to take nodes out of rotation safely (mainly the master nodes).

The high-level idea is pretty simple:

  1. Update the master nodes one at a time: safely take each node out of rotation on the respective load balancer, decrease its BGP routing preference, make sure it is safe to update, run scripts if needed, update configuration files/manifests, optionally reboot it, put it back into rotation once it's back from the reboot, verify that the cluster is stable, and move on to the next node.
  2. Update each worker by cordoning and draining it, running any needed scripts, updating configuration files/manifests, rebooting it if requested, and uncordoning it once it's ready.
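As a rough illustration of the worker step, here is a minimal Python sketch that runs `kubectl cordon`, `kubectl drain` and `kubectl uncordon` one node at a time and stops at the first node that doesn't come back ready. It is not the workflow described above. The `reboot_node` and `node_is_ready` hooks, the node names and the drain timeout are hypothetical placeholders. The master-node steps (load balancer rotation, BGP preference) are left out because they depend entirely on the site-specific setup. Commands go through an injectable `run` function so the sequence can be dry-run.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner receives one argv list per command; it should raise on failure.
Runner = Callable[[Sequence[str]], None]


def kubectl_runner(argv: Sequence[str]) -> None:
    """Run a command for real; raises CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def worker_update_commands(node: str, drain_timeout: str = "10m") -> List[List[str]]:
    """kubectl commands that take a worker out of scheduling before maintenance."""
    return [
        ["kubectl", "cordon", node],
        [
            "kubectl", "drain", node,
            "--ignore-daemonsets",
            "--delete-emptydir-data",  # kubectl >= 1.20; older versions use --delete-local-data
            f"--timeout={drain_timeout}",
        ],
    ]


def update_workers(
    nodes: Sequence[str],
    run: Runner,
    reboot_node: Callable[[str], None],    # hypothetical hook: scripts, config updates, reboot
    node_is_ready: Callable[[str], bool],  # hypothetical hook: e.g. poll the node's Ready condition
) -> List[str]:
    """Update workers one at a time; return the nodes that completed successfully.

    Stops at the first node that is not ready after its reboot, leaving that node
    cordoned so no new pods get scheduled onto it.
    """
    done: List[str] = []
    for node in nodes:
        for argv in worker_update_commands(node):
            run(argv)
        reboot_node(node)
        if not node_is_ready(node):
            break  # leave the node cordoned for a human to inspect
        run(["kubectl", "uncordon", node])
        done.append(node)
    return done
```

For a dry run, pass `run=print` to see the commands without executing anything. Passing `run=kubectl_runner` executes them for real. Processing nodes strictly one at a time and halting on the first failure mirrors the "verify, then move on to the next node" idea from the list above.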

The workflows also write their ongoing status into a Redis instance, which we then visualize using a simple Python Flask backend and an Angular frontend.

Here's what it looks like:

[image: screenshot of the status visualization]
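The Redis status tracking mentioned above could be as simple as one hash per node that each workflow step overwrites and the dashboard backend reads. The sketch below is an assumption about how that might look, not the author's schema. The `k8s-update:<cluster>:<node>` key layout and the field names are hypothetical. It relies only on redis-py's `hset(name, mapping=...)` call (redis-py 3.5+), so any object with that method, such as a small in-memory fake, can stand in for a real server during testing.

```python
import time
from typing import Any, Dict, Optional, Protocol


class HashStore(Protocol):
    """The one redis-py method this sketch uses: Redis.hset(name, mapping=...)."""

    def hset(self, name: str, key: Any = None, value: Any = None,
             mapping: Optional[Dict[str, Any]] = None) -> int: ...


def status_key(cluster: str, node: str) -> str:
    # Hypothetical key layout: one hash per node per cluster.
    return f"k8s-update:{cluster}:{node}"


def publish_status(store: HashStore, cluster: str, node: str, phase: str,
                   message: str = "", now: Optional[float] = None) -> str:
    """Record the current update phase of a node and return the key written."""
    key = status_key(cluster, node)
    store.hset(key, mapping={
        "phase": phase,  # hypothetical values, e.g. "draining", "rebooting", "done"
        "message": message,
        "updated_at": str(now if now is not None else time.time()),
    })
    return key
```

Against a real server this might be called as `publish_status(redis.Redis(host="localhost", port=6379, decode_responses=True), "prod", "worker-1", "draining")`. The Flask backend could then read a node's state back with `hgetall` on the same key.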

mickmcgrath13 commented 4 years ago

All great ideas!
In particular, I like:

Here are a few more I've come across:

...as has already been mentioned, the possibilities are endless. I tend to look for k8s solutions within the k8s ecosystem first, but there are definitely gaps that ST2 can fill.

Also, here's a bonus I came across a while back: