30x / khaos-monkey

Apache License 2.0

other khaotic events? #6

Open noahdietz opened 8 years ago

noahdietz commented 8 years ago

We have the killing of a random pod...

... other ideas for other khaos events:

noahdietz commented 8 years ago

Started by reorganizing khaotic events into a separate pkg and putting placeholders for some of the possible events in this commit...let me know what y'all think of this structure.

Going to have to work some git-fu after #4 closes because the aforementioned commit is in a branch off of this PR...couldn't wait for it to be merged 🙈

jbowen93 commented 7 years ago

I think we should decide on what the scope of Khaos Monkey should be. For now I think it should be limited to things that would commonly occur in a cluster. Pod failures are the obvious one, and node draining is a good one. I'm unsure if we want to be killing controllers (replication controllers, daemon sets, stateful sets, etc.), as those are the primary means of recovering and it's unclear to me how one would automatically recover from such an event.

noahdietz commented 7 years ago

Agreed, killing controllers would be non-recoverable. I do think, however, that targeting pods that are managed by a daemonset could be interesting, e.g. killing several of the routers, then making sure that traffic into/within the cluster isn't hindered and that the routers come back up correctly.

This might be a specific use case though.
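
To make the daemonset-targeting idea concrete, here is a rough sketch of how candidate pods could be picked. It is not code from this repo: it assumes pods arrive as plain dicts shaped like the JSON the Kubernetes API returns (where `metadata.ownerReferences[].kind` names the owning controller), and `daemonset_pods` and `pick_victims` are hypothetical helper names. Actually deleting the pods is left out.

```python
import random


def daemonset_pods(pods):
    """Return the pods whose ownerReferences include a DaemonSet."""
    selected = []
    for pod in pods:
        owners = pod.get("metadata", {}).get("ownerReferences", [])
        if any(owner.get("kind") == "DaemonSet" for owner in owners):
            selected.append(pod)
    return selected


def pick_victims(pods, count, rng=random):
    """Choose up to `count` DaemonSet-managed pods at random (hypothetical helper)."""
    candidates = daemonset_pods(pods)
    # Never ask for more pods than exist, so sample() cannot raise.
    return rng.sample(candidates, min(count, len(candidates)))
```

Passing a seeded `random.Random` as `rng` makes a run repeatable, which might help when trying to reproduce a failure a particular event uncovered.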

jbowen93 commented 7 years ago

I feel daemonsets would be targeted appropriately when we kill random pods and drain random nodes.

AdamMagaluk commented 7 years ago

Maybe we could adopt the same model as Netflix's Chaos Monkey, with an opt-in or opt-out mode that would help scope what you want to start destroying.

It would be cool to add network failure modes in time: network latency, complete cutoff, etc.

noahdietz commented 7 years ago

@AdamMagaluk totally agree! I implemented an opt-in/out type env var, essentially just a list of strings that map to the events you would be OK with happening. Then it randomly picks one and runs the event.
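
For anyone skimming, the opt-in list described above could look roughly like this. This is a minimal sketch, not the actual implementation: the variable name `KHAOS_EVENTS`, the comma-separated format, the event names, and the stub handlers are all assumptions made for illustration.

```python
import os
import random


# Hypothetical stand-ins for real events; a real handler would call the k8s API.
def kill_random_pod():
    return "killed a random pod"


def drain_random_node():
    return "drained a random node"


# Registry mapping opt-in names to handlers (names are assumptions).
EVENTS = {
    "kill-pod": kill_random_pod,
    "drain-node": drain_random_node,
}


def enabled_events(raw):
    """Parse a comma-separated opt-in list, dropping blanks and unknown names."""
    names = [name.strip() for name in raw.split(",")]
    return [name for name in names if name in EVENTS]


def run_random_event(raw, rng=random):
    """Run one opted-in event chosen at random; return None if none are enabled."""
    names = enabled_events(raw)
    if not names:
        return None
    name = rng.choice(names)
    return name, EVENTS[name]()


if __name__ == "__main__":
    # KHAOS_EVENTS is an assumed variable name, e.g. "kill-pod,drain-node".
    print(run_random_event(os.environ.get("KHAOS_EVENTS", "")))
```

An empty or unset list results in no event running at all, which seems like the safer default for an opt-in model.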

I like the network fail modes, any idea on how to do this? Right now we are just doing damage with the k8s API, but if you've got an idea for how to do it without the API in a generic way, let's do it.

First thing that comes to mind would be deleting ingress rules (for those that use them, we don't), or deleting network policies (for us these are integral).

AdamMagaluk commented 7 years ago

I don't have an idea offhand. Will need to dig a little into how the low-level networking for Kubernetes works. I'll start thinking about it.

noahdietz commented 7 years ago

Another thought would be jacking up the kube-dns pods in some way...not sure...need to plot some more