StackStorm and Kubernetes Use Cases

theankushjain commented 4 years ago

Patching your Kubernetes nodes is one solid use case that I found, but we need to brainstorm and find others. I believe there are many use cases yet to be discovered.

arm4b commented 4 years ago

Examples based on the blog post https://www.devoperandi.com/stackstorm-for-kubernetes-just-took-a-giant-leap-forward/:

Imagine being able to configure network policies through an automated StackStorm workflow based on a particular project's needs.
Think about how RBAC could be managed using our Kubernetes Authz Webhook through StackStorm.
Or how about kicking off Kubernetes Jobs to administer some cluster-level cleanup activity, but handing that off to your NOC.
Or allowing your Operations team to patch a HorizontalPodAutoscaler through a UI.
We could build a metadata framework derived from the Kubernetes API annotations/labels for governance.

The possibilities are now literally endless.
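
To make the network-policy idea concrete, here is a minimal sketch of the manifest a workflow action could generate from a project's needs. The function name, the `app` label key, and the parameters are assumptions for illustration, not part of any StackStorm pack; only the `networking.k8s.io/v1` NetworkPolicy layout comes from Kubernetes itself.

```python
from __future__ import annotations

from typing import Any


def build_network_policy(
    namespace: str, app: str, allowed_apps: list[str], port: int
) -> dict[str, Any]:
    """Build a NetworkPolicy that lets only `allowed_apps` reach `app` on `port`.

    Hypothetical helper: the `app` label key is an assumed labelling convention.
    """
    if allowed_apps:
        ingress = [
            {
                "from": [
                    {"podSelector": {"matchLabels": {"app": peer}}}
                    for peer in allowed_apps
                ],
                "ports": [{"protocol": "TCP", "port": port}],
            }
        ]
    else:
        # An empty `from` list would allow every source, so emit no ingress
        # rules at all: with policyTypes=["Ingress"] that denies all ingress.
        ingress = []

    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{app}-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": ingress,
        },
    }
```

The resulting dict could then be handed to whatever action applies manifests to the cluster in your setup.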

From the same author, Automating Deployment of RDS Database with StackStorm: http://www.devoperandi.com/kubernetes-stackstorm-and-third-party-resources-part-2/

arm4b commented 4 years ago

K8s app Blue/Green Deployments. While K8s provides rolling upgrades for Pods, the Kubernetes engine lacks built-in Blue/Green deployments for the apps it runs. Following the https://www.ianlewis.org/en/bluegreen-deployments-kubernetes example, StackStorm Workflows can handle that kind of automation in a more maintainable way.
Adding ChatOps to K8s. How about triggering a K8s deployment of the new version of your app from chat?
Updating K8s cluster version. One of the first Google results about upgrading a K8s cluster version: https://medium.com/retailmenot-engineering/zero-downtime-kubernetes-cluster-upgrades-aab4cac943d2 The HOWTO provides a series of steps and suggests setting up a new node pool and migrating K8s workloads from the old nodes to the new ones. All of this could be organized into a StackStorm workflow and automated instead of repeated manually. As you gain new operational knowledge, the workflow could be improved with more logic and edge-case handling.
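
For the Blue/Green item above, the cutover step in the linked example amounts to repointing the Service selector at the new version once its Deployment is ready. A minimal sketch of that decision as a workflow step might look like this; the function name and the `app`/`version` label keys are assumptions, and the readiness numbers would come from the new Deployment's status.

```python
from __future__ import annotations

from typing import Any, Optional


def cutover_patch(
    app: str, new_version: str, ready_replicas: int, desired_replicas: int
) -> Optional[dict[str, Any]]:
    """Return a Service patch that switches traffic to `new_version`.

    Returns None while the new Deployment is not fully ready, so the
    workflow can wait or abort instead of cutting over too early.
    Hypothetical helper; label keys are an assumed convention.
    """
    if desired_replicas <= 0 or ready_replicas < desired_replicas:
        return None
    # Merge-patch body: only the selector changes, the Service itself stays.
    return {"spec": {"selector": {"app": app, "version": new_version}}}
```

Rolling back is the same patch with the previous version, which is one reason this step fits well in a workflow with an explicit failure branch.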

arm4b commented 4 years ago

Security compliance checks & remediation for every new K8s Deployment. StackStorm can listen for new K8s deployments and run a series of security checks and rules each time one appears. If a security check fails: open a ticket with a report, create an alert, page a human, etc. Potential remediation actions: block the deployment, restrict network access, etc.
Security response automation based on K8s cluster audit events. StackStorm can consume webhook requests from the K8s cluster audit service https://kubernetes.io/docs/tasks/debug-application-cluster/audit/ and react to these events.
Security key rotation for the K8s cluster/apps. Rotating any security item is rarely easy and frequently involves repetitive steps (ex: https://github.com/kubernetes/kubernetes/issues/20165). StackStorm can not only automate that as an external orchestrator, but also keep this important operational workflow knowledge documented and maintained as code. This could be applied to K8s cluster administration in general or even to individual apps running in K8s.
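
As an illustration of the compliance-check item above, here is a minimal sketch of a check an action could run against a new Deployment manifest. The three rules (privileged containers, missing or `latest` image tags, missing resource limits) are example rules chosen for this sketch, not a complete or official policy, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Any


def check_deployment(manifest: dict[str, Any]) -> list[str]:
    """Return human-readable findings for a Deployment manifest (empty = pass)."""
    findings: list[str] = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")

        if (container.get("securityContext") or {}).get("privileged"):
            findings.append(f"{name}: runs privileged")

        # Only look for a tag in the last path segment, so a registry port
        # such as "registry:5000/app" is not mistaken for a tag.
        last_segment = image.rsplit("/", 1)[-1]
        tag = image.rsplit(":", 1)[1] if ":" in last_segment else ""
        if tag in ("", "latest"):
            findings.append(f"{name}: image tag is missing or 'latest'")

        if not (container.get("resources") or {}).get("limits"):
            findings.append(f"{name}: no resource limits")

    return findings
```

A rule could branch on the result: an empty list lets the deployment through, anything else opens a ticket or pages someone with the findings attached.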

emptywee commented 4 years ago

While we generally leverage Spinnaker for k8s deployments, we mainly use StackStorm to provide ChatOps aliases for various day-to-day tasks like rolling restarts, regular restarts, and maintenance mode on/off (bringing down a certain set of pods based on the type of maintenance), as well as Kubernetes and Flatcar/CoreOS version updates. Once our clusters grew to an unmanageable number of nodes (unmanageable by hand, I mean), I had to sit down and design a series of workflows to automate Kubernetes version updates. Since our clusters are self-managed and self-hosted, initially based on CoreOS and now on Flatcar Linux, the approach is very specific (e.g. it's based on generating Ignition configs; it also uses self-written microservices to generate certs and configs, etc.), so I am not sure it's worth sharing the workflows as they are. We also have NGINX Plus in front of the cluster and iBGP-enabled, Calico-powered connectivity with the clusters, with route reflectors running on BIRD. All of these are used during a cluster update to take nodes out of rotation safely (mainly master nodes).

The high-level idea is pretty simple:

Update master nodes one at a time: safely take the node out of rotation on the respective load balancer, decrease its BGP preference for routing, ensure a safe update, execute scripts if need be, update configuration files/manifests, optionally reboot it, put it back into rotation once it's back from the reboot, verify that the cluster is stable, and move on to the next node.
Update each worker by cordoning and draining it, executing any scripts needed, updating configuration files/manifests, rebooting if requested, and uncordoning it once it's ready.
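
The worker loop described above can be sketched roughly as follows. This is not the actual workflow from this comment: the step functions are injected placeholders standing in for whatever actions cordon, drain, update, and uncordon a node in your environment, and the stop-on-failure behaviour is an assumption.

```python
from __future__ import annotations

from typing import Callable, Iterable

NodeStep = Callable[[str], None]


def update_workers(
    nodes: Iterable[str],
    cordon: NodeStep,
    drain: NodeStep,
    update: NodeStep,
    uncordon: NodeStep,
    is_ready: Callable[[str], bool],
) -> list[str]:
    """Update workers one at a time and return the nodes that completed.

    If a node does not come back ready, it is left cordoned and the rollout
    stops, so a bad update cannot spread to the rest of the cluster.
    """
    done: list[str] = []
    for node in nodes:
        cordon(node)    # stop new pods from being scheduled here
        drain(node)     # evict the pods currently running on the node
        update(node)    # scripts, config/manifest changes, optional reboot
        if not is_ready(node):
            raise RuntimeError(f"{node} did not become ready; stopping rollout")
        uncordon(node)  # allow scheduling again
        done.append(node)
    return done
```

Keeping the steps as separate callables mirrors how each one would be its own task in a workflow, with its own retries and failure handling.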

The workflows also write their ongoing status to a Redis instance, which we later visualize using a simple Python Flask backend and an Angular frontend.

Here's what it looks like: [screenshot not included]

mickmcgrath13 commented 4 years ago

All great ideas!
In particular, I like:

K8S based RBAC
utilizing K8S secrets
chatops for non config-based tasks (restarts, etc)
node patching/updating

Here are a few more I've come across:

K8S cleanup - sensor to clean up old pods from jobs (or any otherwise "dangling" pods)
Helm support - Helm doesn't always work (a failed deployment might get you into a "can't deploy because there are no deployed releases" scenario, for example). A StackStorm sensor could detect that state and clean it up (if deployment logs live somewhere ST2 can reach them), or a webhook in the deployment script could say "if can't deploy [...], call ST2".
Dynamic environment cleanup - If you create dynamic environments (per pull request, for example) and don't have a direct way to clean them up, maybe give namespaces with a specific label a TTL via StackStorm
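
For the namespace TTL idea above, a minimal sketch of the selection logic could look like this. The `ttl-hours` label name and the function are assumptions made up for illustration; the input is assumed to be namespace objects as returned by the Kubernetes API, where `creationTimestamp` is a UTC timestamp such as `2020-05-01T12:00:00Z`.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Any


def expired_namespaces(
    namespaces: list[dict[str, Any]], now: datetime, ttl_label: str = "ttl-hours"
) -> list[str]:
    """Return names of namespaces whose TTL label (in hours) has elapsed.

    Namespaces without the label are never selected, so only environments
    that explicitly opted in can be cleaned up.
    """
    expired: list[str] = []
    for ns in namespaces:
        meta = ns["metadata"]
        ttl = (meta.get("labels") or {}).get(ttl_label)
        if ttl is None:
            continue
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created > timedelta(hours=int(ttl)):
            expired.append(meta["name"])
    return expired
```

A sensor or a scheduled rule could run this periodically and pass the result to a delete action, ideally behind an approval step at first.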

...as has already been mentioned, possibilities are endless. I tend to try to find k8s solutions via the k8s ecosystem, but there are definitely gaps that ST2 can fill.

Also, here's a bonus I came across a while back:

https://github.com/flant/shell-operator (ya know.. in case your st2 instance(s) aren't allowed to talk to your k8s instance directly).
shell-operator -> script that calls an st2 webhook -> st2
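
The middle hop of that chain, a script that calls an st2 webhook, could be sketched like this in Python. The host, hook name, and payload shape are placeholders; the sketch assumes a webhook registered under `/api/v1/webhooks/<name>` and authentication with an API key in the `St2-Api-Key` header. It only builds the request, so sending it is left to the caller.

```python
import json
import urllib.request
from typing import Any


def build_st2_webhook_request(
    base_url: str, hook: str, api_key: str, payload: dict[str, Any]
) -> urllib.request.Request:
    """Build a POST request that forwards a cluster event to an ST2 webhook.

    Hypothetical helper: `hook` must match a webhook trigger that a rule
    on the ST2 side is listening on.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/webhooks/{hook}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "St2-Api-Key": api_key},
        method="POST",
    )
```

A hook script would call `urllib.request.urlopen(build_st2_webhook_request(...))` with the event it received from shell-operator as the payload.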

StackStorm / community

StackStorm and Kubernetes Use Cases #38