aws-samples / aws-workshop-for-kubernetes

AWS Workshop for Kubernetes
Apache License 2.0

Create SRE Guide for Kubernetes #387

Open christopherhein opened 6 years ago

christopherhein commented 6 years ago

Add a module discussing tactics for debugging, handling outages, and dealing with failures in distributed systems.

arun-gupta commented 6 years ago

EC2 instance type: https://kubernetes.io/docs/admin/cluster-large/

StevenACoffman commented 6 years ago

Related: How to Recover a Broken Kubernetes Cluster

christopherhein commented 6 years ago

https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server

StevenACoffman commented 6 years ago

Also, avoid t2 (burstable) instance types. Kubernetes overschedules them while they have spare capacity, and once their burst credits are used up they become so slow that they stop responding. This leads to cascading failures: workloads get rescheduled onto the nodes that are still responsive, which then use up their own credits in turn.
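One way to act on this advice is to audit an existing cluster for t2 nodes. The following is a minimal sketch, not something from this thread: it assumes the node list has already been exported as JSON (for example with `kubectl get nodes -o json`) and that nodes carry one of the well-known instance-type labels. The function name `find_burstable_nodes` and the `prefixes` parameter are hypothetical names chosen for illustration.

```python
# Well-known node labels that carry the cloud instance type
# (current name first, then the legacy beta name).
INSTANCE_TYPE_LABELS = (
    "node.kubernetes.io/instance-type",
    "beta.kubernetes.io/instance-type",
)


def find_burstable_nodes(node_list, prefixes=("t2.",)):
    """Return (node name, instance type) pairs whose type starts with a prefix.

    node_list is the parsed JSON of a Kubernetes NodeList object.
    prefixes defaults to the t2 family discussed above; passing other
    families is left to the caller.
    """
    flagged = []
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        labels = meta.get("labels") or {}
        # Take the first instance-type label that is present, if any.
        instance_type = next(
            (labels[key] for key in INSTANCE_TYPE_LABELS if key in labels),
            None,
        )
        if instance_type and instance_type.startswith(tuple(prefixes)):
            flagged.append((meta.get("name", "<unnamed>"), instance_type))
    return flagged
```

To use it, load the exported file with `json.load` and pass the result to `find_burstable_nodes`. Nodes without an instance-type label are skipped rather than flagged, so an empty result does not prove the cluster is free of burstable instances.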