chaos-mesh / chaos-mesh

A Chaos Engineering Platform for Kubernetes.
https://chaos-mesh.org
Apache License 2.0
6.54k stars, 810 forks

Emergency Stop button on dashboard #4291

Open jhmartin opened 6 months ago

jhmartin commented 6 months ago

Feature Request

Describe the feature you'd like: A button on the chaos-dashboard that immediately stops all running chaos tests to which the user has permissions.

With multiple teams running tests in the same k8s environment, there is a risk of an unexpected interaction causing impact. Understanding this impact is part of chaos engineering, but there are scenarios where the time needed to untangle exactly what is causing the impact is unacceptable, and the operators just want to stop all ongoing chaos tests, i.e. an 'Emergency Stop'. The cause might not even be a chaos test, but removing chaos testing as a potential factor is highly valuable.

Describe alternatives you've considered: A custom API script that finds all outstanding tests and deletes them. Having the tools available to run such a script is more complicated than a web button.

A log entry should be emitted when this occurs for later correlation with application metrics.

STRRL commented 6 months ago

I think it makes sense. Would you like to help us implement this? @jhmartin :heart:

As for the implementation, maybe we could annotate all the experiments with experiment.chaos-mesh.org/pause=true instead of terminating them by deleting them. What do you think?

ref: https://chaos-mesh.org/docs/next/run-a-chaos-experiment/#pause-chaos-experiments

If you need any kind of help, feel free to comment.

jhmartin commented 6 months ago

Ah yes, pausing makes more sense than deleting.
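
For anyone following along, here is a rough sketch of what the pause-based approach discussed above might look like as a script, until a dashboard button exists. It is not how the dashboard implements (or would implement) the feature. The annotation experiment.chaos-mesh.org/pause=true comes from the thread and the linked docs; the Experiment type, the function names, the logger name, and the example experiment are all hypothetical. Discovering which experiments are currently running is deliberately left out, and permissions are assumed to be enforced by whatever credentials kubectl is using.

```python
import logging
import subprocess
from typing import Iterable, List, NamedTuple

# Annotation taken from the discussion above; setting it pauses an experiment.
PAUSE_ANNOTATION = "experiment.chaos-mesh.org/pause=true"

# Hypothetical logger name; the log line is what lets operators correlate
# the emergency stop with application metrics later.
log = logging.getLogger("emergency_stop")


class Experiment(NamedTuple):
    """Hypothetical record identifying one chaos experiment."""
    kind: str        # resource kind, e.g. "networkchaos" or "podchaos"
    name: str
    namespace: str


def pause_command(exp: Experiment) -> List[str]:
    """Build the kubectl invocation that annotates one experiment as paused."""
    return [
        "kubectl", "annotate", exp.kind, exp.name,
        "-n", exp.namespace,
        PAUSE_ANNOTATION, "--overwrite",
    ]


def emergency_stop(experiments: Iterable[Experiment],
                   dry_run: bool = True) -> List[List[str]]:
    """Pause every given experiment and log each action.

    With dry_run=True (the default) the commands are only built and logged,
    not executed. Returns the list of commands for inspection.
    """
    commands = []
    for exp in experiments:
        cmd = pause_command(exp)
        log.warning("emergency stop: pausing %s/%s in namespace %s",
                    exp.kind, exp.name, exp.namespace)
        if not dry_run:
            # Raises CalledProcessError if kubectl fails, e.g. on missing RBAC.
            subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Hypothetical experiment, for illustration only.
    for cmd in emergency_stop([Experiment("networkchaos", "burst-network", "team-a")]):
        print(" ".join(cmd))
```

Pausing rather than deleting means the experiment definitions survive, so teams can resume them once the incident is understood. Running with dry_run=True first shows which commands would be issued without touching the cluster.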