kube-monkey is an implementation of Netflix's Chaos Monkey for Kubernetes clusters. It randomly deletes Kubernetes (k8s) pods in the cluster, encouraging and validating the development of failure-resilient services.
Join us at #kube-monkey on Kubernetes Slack.
kube-monkey runs at a pre-configured hour (run_hour, defaults to 8 am) on weekdays, and builds a schedule of deployments that will face a random pod death sometime during the same day. The time range during the day when the random pod death might occur is configurable and defaults to 10 am to 4 pm.
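As a rough illustration of the time window described above, the sketch below picks a kill time for a given day. It is not kube-monkey's code: the function names are hypothetical, and drawing the time uniformly within the window is an assumption (the text above only says the time is random).

```python
import random
from datetime import date, datetime, time, timedelta


def is_weekday(day):
    """Monday to Friday; scheduling only happens on weekdays."""
    return day.weekday() < 5


def random_kill_time(day, start_hour=10, end_hour=16, rng=random):
    """Pick a moment on `day` in [start_hour, end_hour).

    Assumption: the moment is drawn uniformly, with one-second resolution.
    The defaults mirror the documented 10 am to 4 pm window.
    """
    if not 0 <= start_hour < end_hour <= 24:
        raise ValueError("expected 0 <= start_hour < end_hour <= 24")
    window_start = datetime.combine(day, time(hour=start_hour))
    window_seconds = (end_hour - start_hour) * 3600
    return window_start + timedelta(seconds=rng.randrange(window_seconds))
```

For example, `random_kill_time(date(2024, 1, 8))` returns a `datetime` on that Monday between 10:00:00 and 15:59:59.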
kube-monkey can be configured with a list of namespaces to blacklist; apps in a blacklisted namespace are not touched. To disable the blacklist, provide [""] in the blacklisted_namespaces config.param.
kube-monkey works on an opt-in model and will only schedule terminations for Kubernetes (k8s) apps that have explicitly agreed to have their pods terminated by kube-monkey.
Opt-in is done by setting the following labels on a k8s app:
- kube-monkey/enabled: Set to "enabled" to opt in to kube-monkey.
- kube-monkey/mtbf: Mean time between failure (in days). For example, if set to "3", the k8s app can expect to have a pod killed approximately every third weekday.
- kube-monkey/identifier: A unique identifier for the k8s app. This is used to identify the pods that belong to a k8s app, as pods inherit labels from their k8s app. So, if kube-monkey detects that app foo has enrolled to be a victim, kube-monkey will look for all pods that have the label kube-monkey/identifier: foo to determine which pods are candidates for killing. The recommendation is to set this value to be the same as the app's name.
- kube-monkey/kill-mode: Default behavior is for kube-monkey to kill only ONE pod of your app. You can override this behavior by setting the value to:
  - kill-all if you want kube-monkey to kill ALL of your pods regardless of status (including not ready and not running pods). Does not require kill-value. Use this label carefully.
  - fixed if you want to kill a specific number of running pods with kill-value. If you overspecify, it will kill all running pods and issue a warning.
  - random-max-percent to specify a maximum % with kill-value that can be killed. At the scheduled time, a uniformly random %, up to the specified maximum, of the running pods will be terminated.
  - fixed-percent to specify a fixed % with kill-value that can be killed. At the scheduled time, the specified fixed % of the running pods will be terminated.
- kube-monkey/kill-value: Specify the value for kill-mode:
  - if fixed, provide an integer number of pods to kill
  - if random-max-percent, provide a number from 0-100 to specify the max % of pods kube-monkey can kill
  - if fixed-percent, provide a number from 0-100 to specify the % of pods to kill

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monkey-victim
  namespace: app-namespace
spec:
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim
        kube-monkey/mtbf: '2'
        kube-monkey/kill-mode: "fixed"
        kube-monkey/kill-value: '1'
[... omitted ...]
For newer versions of Kubernetes you may need to add the labels to the k8s app metadata as well.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monkey-victim
  namespace: app-namespace
  labels:
    kube-monkey/enabled: enabled
    kube-monkey/identifier: monkey-victim
    kube-monkey/mtbf: '2'
    kube-monkey/kill-mode: "fixed"
    kube-monkey/kill-value: '1'
spec:
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim
[... omitted ...]
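To make the label semantics concrete, here is a minimal sketch of how kube-monkey/mtbf, kube-monkey/kill-mode and kube-monkey/kill-value could translate into a decision. It is an illustration under stated assumptions, not kube-monkey's implementation: the function names are hypothetical, the daily kill probability is assumed to be 1/mtbf, percentages are assumed to round down, and the default mode is assumed to pick among running pods.

```python
import math
import random


def should_kill_today(mtbf_days, rng=random):
    """Biased coin flip. Assumption: probability 1/mtbf per weekday."""
    mtbf = int(mtbf_days)
    if mtbf < 1:
        raise ValueError("mtbf must be at least 1 day")
    return rng.random() < 1.0 / mtbf


def _percent(kill_value):
    pct = float(kill_value)
    if not 0 <= pct <= 100:
        raise ValueError("kill-value must be between 0 and 100")
    return pct


def pods_to_kill(kill_mode, kill_value, running, total, rng=random):
    """Number of pods to terminate.

    running: pods currently running; total: pods in any state.
    """
    if kill_mode is None:
        return min(1, running)  # default: only one pod
    if kill_mode == "kill-all":
        return total  # regardless of status; kill-value not needed
    if kill_mode == "fixed":
        return min(int(kill_value), running)  # overspecified: all running pods
    if kill_mode == "fixed-percent":
        # Assumption: fractional pods round down.
        return math.floor(running * _percent(kill_value) / 100)
    if kill_mode == "random-max-percent":
        pct = rng.uniform(0, _percent(kill_value))  # uniform up to the max
        return math.floor(running * pct / 100)
    raise ValueError(f"unknown kill-mode: {kill_mode!r}")
```

With the labels from the example above (kill-mode "fixed", kill-value '1'), `pods_to_kill("fixed", "1", running=3, total=3)` gives 1.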
Because cluster DNS is not used yet (see the "// TODO: switch to using cluster DNS." note in the code), you may need to override the apiserver. To do so, specify the host in the config file:
[kubernetes]
host="https://your-apiserver-url.com:apiport"
Scheduling happens once a day on weekdays; this is when the schedule of terminations for the current day is generated. During scheduling, kube-monkey will:
- For each opted-in k8s app, use its mean time between failure (kube-monkey/mtbf) to determine if a pod for that k8s app should be killed today

The termination time is the randomly generated time during the day when a victim k8s app will have a pod killed. At termination time, kube-monkey will:
Docker images for kube-monkey can be found at DockerHub
Clone the repository and build the container.
go get github.com/asobti/kube-monkey
cd $GOPATH/src/github.com/asobti/kube-monkey
make build
make container
kube-monkey is configured by environment variables or a TOML file placed at /etc/kube-monkey/config.toml, and expects the configmap to exist before the kube-monkey deployment.
Configuration keys and descriptions can be found in config/param/param.go
[kubemonkey]
dry_run = true # Terminations are only logged
run_hour = 8 # Run scheduling at 8am on weekdays
start_hour = 10 # Don't schedule any pod deaths before 10am
end_hour = 16 # Don't schedule any pod deaths after 4pm
blacklisted_namespaces = ["kube-system"] # Critical apps live here
time_zone = "America/New_York" # Set tzdata timezone example. Note the field is time_zone not timezone
KUBEMONKEY_DRY_RUN=true
KUBEMONKEY_RUN_HOUR=8
KUBEMONKEY_START_HOUR=10
KUBEMONKEY_END_HOUR=16
KUBEMONKEY_BLACKLISTED_NAMESPACES=kube-system
KUBEMONKEY_TIME_ZONE=America/New_York
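The two examples above suggest a simple naming pattern: the TOML section and key, joined by an underscore and upper-cased, give the environment variable name. The sketch below applies that pattern. It is inferred only from the [kubemonkey] examples shown here; whether other sections follow the same pattern, and how multi-value lists are encoded (comma-separated is assumed), should be checked against config/param/param.go. The function names are hypothetical.

```python
def env_var_name(section, key):
    """Map a TOML section/key pair to the matching environment variable name."""
    return f"{section}_{key}".upper()


def to_env(config):
    """Flatten {"section": {"key": value}} into {"SECTION_KEY": "value"}."""
    env = {}
    for section, settings in config.items():
        for key, value in settings.items():
            if isinstance(value, bool):  # check bool before other types
                text = "true" if value else "false"
            elif isinstance(value, (list, tuple)):
                # Assumption: lists are passed as comma-separated strings.
                text = ",".join(str(item) for item in value)
            else:
                text = str(value)
            env[env_var_name(section, key)] = text
    return env
```

For instance, `env_var_name("kubemonkey", "dry_run")` yields `KUBEMONKEY_DRY_RUN`, matching the example above.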
Note: with schedule_immediate_kill enabled in the debug configuration below, kube-monkey will keep attacking pods every 60s regardless of what you configured for start_hour and end_hour.
[debug]
enabled = true
schedule_immediate_kill = true
Kube-monkey supports notifications and can notify an endpoint of your choice after an attack. It can be a Slack webhook or a custom API.
[notifications]
enabled = true
reportSchedule = true
[notifications.attacks]
endpoint = "http://url1"
message = "message1"
headers = ["header1Key:header1Value","header2Key:header2/Value"]
The message supports the following placeholders:
- {$name}: victim's name
- {$kind}: victim's kind
- {$namespace}: victim's namespace
- {$timestamp}: attack's time from Unix epoch in milliseconds
- {$time}: attack's time
- {$date}: attack's date
- {$error}: result's error, if any
- {$kubemonkeyid}: kube-monkey id (set using KUBE_MONKEY_ID env variable, otherwise empty)

message: '{
  "what": "Kube-monkey({$kubemonkeyid}) attack of {$name} in {$namespace}",
  "who": "{$name}",
  "when": {$timestamp}
}'
The header supports a special placeholder to retrieve the value of an environment variable. This is useful when calling an API that has a protected endpoint. A typical scenario is passing an API token to the kube-monkey container: the token is stored in a Kubernetes Secret and exposed to the container as an environment variable.
headers = ["api-key:{$env:API_TOKEN}", "Content-Type:application/json"]
{$env:API_TOKEN} will be replaced by the value of the environment variable API_TOKEN.
Note: if the environment variable does not exist, the notification call will NOT be cancelled. The value will resolve to an empty string, and a warning will show up in the logs.
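The placeholder behavior described above can be sketched as follows. This is an illustrative approximation, not kube-monkey's code: the function names are hypothetical, splitting a header on its first colon is an assumption based on the "key:value" examples, and leaving unknown message placeholders untouched is also an assumption.

```python
import logging
import os
import re

_MESSAGE_PLACEHOLDER = re.compile(r"\{\$([a-z]+)\}")
_ENV_PLACEHOLDER = re.compile(r"\{\$env:([A-Za-z_][A-Za-z0-9_]*)\}")


def render_message(template, values):
    """Replace each {$key} with values[key]; unknown keys are left as-is."""
    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return _MESSAGE_PLACEHOLDER.sub(substitute, template)


def render_header(header, environ=os.environ):
    """Split "Key:Value" on the first colon and expand {$env:VAR} in the value."""
    key, _, value = header.partition(":")

    def lookup(match):
        name = match.group(1)
        if name not in environ:
            # Mirrors the documented behavior: warn, resolve to "", keep going.
            logging.warning("environment variable %s is not set", name)
            return ""
        return environ[name]

    return key, _ENV_PLACEHOLDER.sub(lookup, value)
```

For example, `render_header("api-key:{$env:API_TOKEN}", {"API_TOKEN": "s3cr3t"})` returns `("api-key", "s3cr3t")`, while an empty environment yields `("api-key", "")` and logs a warning.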
Manually
1. Create the kube-monkey-config-map configmap in the namespace you intend to run kube-monkey in (for example, the kube-system namespace). Make sure to define the keyname as config.toml. For example:
   kubectl create configmap km-config --from-file=config.toml=km-config.toml
   or
   kubectl apply -f km-config.yaml
2. Run kube-monkey in that namespace (for example, kube-system). See dir examples/ for example Kubernetes yaml files.
3. Check the logs:
   kubectl logs -f deployment.apps/kube-monkey --namespace=kube-system
   Here, deployment.apps/kube-monkey is the k8s deployment for kube-monkey.

Helm Chart
See How to install kube-monkey with Helm.
kube-monkey uses glog and supports all command-line features for glog. To specify a custom v level or a custom log directory on the pod, see args: ["-v=5", "-log_dir=/path/to/custom/log"] in the example deployment file.
Standardized glog levels (`grep -r V\([0-9]\) *`):
L0: None
L1: Highest Level current status info and Errors with Terminations
L2: Successful terminations
L3: More detailed schedule status info
L4: Debugging verbose schedule and config info
L5: Auto-resolved inconsequential issues
More resources: See the k8s logging page suggesting community conventions for logging severity
git clone https://github.com/asobti/kube-monkey.git
cd kube-monkey/examples
oc login http://someserver/ -u system:admin
oc project kube-system
oc create -f configmap.yaml
oc -n kube-system adm policy add-role-to-user -z deployer system:deployer
oc -n kube-system adm policy add-role-to-user -z builder system:image-builder
oc -n kube-system adm policy add-role-to-group system:image-puller system:serviceaccounts:kube-system
oc run kube-monkey --image=docker.io/ayushsobti/kube-monkey:v0.4.0 --command -- /kube-monkey -v=5 -log_dir=/var/log/kube-monkey
oc volume dc/kube-monkey --add --name=kubeconfigmap -m /etc/kube-monkey -t configmap --configmap-name=kube-monkey-config-map
git clone https://github.com/asobti/kube-monkey.git
cd kube-monkey/examples
oc login http://someserver/ -u system:admin
oc project kube-system
oc create -f configmap.yaml
oc -n kube-system adm policy add-cluster-role-to-user edit -z default --rolebinding-name kube-monkey-edit
oc run kube-monkey --image=docker.io/ayushsobti/kube-monkey:v0.3.0 --command -- /kube-monkey -v=5 -log_dir=/var/log/kube-monkey
oc set volume dc/kube-monkey --add --name=kubeconfigmap -m /etc/kube-monkey -t configmap --configmap-name=kube-monkey-config-map
This project is licensed under the Apache License v2.0 - see the LICENSE file for details.