chaosblade-io / chaosblade

An easy-to-use and powerful chaos engineering experiment toolkit. (An easy-to-use, powerful chaos experiment injection tool open-sourced by Alibaba.)
https://chaosblade.io
Apache License 2.0
5.92k stars 943 forks

Expose ChaosBlade event into SkyWalking & Prometheus #499

Open tiny-x opened 3 years ago

tiny-x commented 3 years ago

Many events occur while a system is running, such as process anomalies, restarts, and chaos experiments, and any of them may affect the system's stability. We therefore need ChaosBlade to output its chaos experiment events so that they can be imported into SkyWalking and Prometheus.

Expose ChaosBlade event into SkyWalking

SkyWalking provides several friendly ways for other systems to integrate with it, such as:

  1. CLI event report, https://github.com/apache/skywalking-cli#event. You could call it from a shell.
  2. Through the k8s event channel, https://github.com/apache/skywalking-kubernetes-event-exporter. But this would limit you to k8s environments. Even though the project is in the CNCF, I feel we should not limit its scope, as chaos engineering is clearly not only about k8s environments.
  3. Use the gRPC protocol directly, https://github.com/apache/skywalking-data-collect-protocol/blob/master/event/Event.proto. This is easy to adopt and doesn't limit the language or environment; you only need to write a little code.
  4. If you are using (3) in Go (this project is written in Go), this repo would be easier to integrate than the proto itself: https://github.com/apache/skywalking-goapi

For more information, see https://github.com/chaosblade-io/chaosblade/issues/495

Expose ChaosBlade event into Prometheus

For more information, see https://prometheus.io/docs/instrumenting/writing_exporters/

Summary

The event types you can export include but are not limited to:

breakertt commented 3 years ago

For SkyWalking, I posted my latest ideas in issue #495. For Prometheus, my idea is to implement a 'metrics' HTTP GET path on chaosblade-box. An example of one metric would be network_experiment{target="192.168.18.70",k8s="false"} 0. We can generate the metrics directly from the records of previous experiments in the database.