Open wu-sheng opened 3 years ago
This is really a good idea. Let me read the documents you provided first. We will open-source Java and Golang SDKs for application chaos in the future. I think ChaosBlade can integrate with SkyWalking well.
Also, this doesn't have to be an application-level event. We have VM and pod monitoring from the k8s and service mesh perspectives.
So, there are going to be various ways to integrate. We don't have to wait for the chaos SDK.
@wu-sheng I discussed the details of exposing chaosblade with @xcaspar today, especially exposing JVM tracing to SkyWalking in the first stage.
Externally, the communication and reporting will be between SkyWalking and chaosblade-box instead of chaosblade directly. There are two main advantages of using chaosblade-box: 1. It is non-invasive to chaosblade itself. 2. It offers great compatibility: the exposure is not limited to K8s or even to chaosblade. Exposing chaosmesh to SkyWalking can also be supported if chaosblade-box supports chaosmesh in the future. The protocol used will be gRPC. I assume you are also mainly concentrating on runtime tracing reports; thus, I will try to implement exposing JVM tracing to SkyWalking and ignore the experiments on CPU and network.
Internally, the endpoint parameter shared between chaosblade, chaosblade-box, chaosblade-operator, and chaosblade-exec-jvm will be reused to report JVM tracing and events.
Lastly, support for tracing inspection in chaosblade-exec-jvm will be implemented. I believe there are already some great examples of this; do you have any suggestions? I would appreciate it!
> especially exposing JVM tracing to SkyWalking in the first stage.
I think JVM level is fine, but it seems we don't have a real relationship with tracing core, right? The relationship should rely on timestamp, right?
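To make the timestamp idea concrete, here is a minimal sketch, assuming a chaos experiment is known only by its start and end time and that traces are matched to it purely by time overlap. The `ChaosWindow`, `TraceSummary`, and `TimestampCorrelation` types are hypothetical names invented for illustration; they are not SkyWalking or ChaosBlade APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical time window of a chaos experiment, in epoch milliseconds. */
final class ChaosWindow {
    final String experimentName;
    final long startMillis;
    final long endMillis;

    ChaosWindow(String experimentName, long startMillis, long endMillis) {
        this.experimentName = experimentName;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    /** True when the range [otherStart, otherEnd] overlaps this window. */
    boolean overlaps(long otherStart, long otherEnd) {
        return otherStart <= endMillis && otherEnd >= startMillis;
    }
}

/** Hypothetical, simplified view of a trace: an ID plus start and end times. */
final class TraceSummary {
    final String traceId;
    final long startMillis;
    final long endMillis;

    TraceSummary(String traceId, long startMillis, long endMillis) {
        this.traceId = traceId;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }
}

final class TimestampCorrelation {
    /** Returns the IDs of traces whose time range overlaps the experiment window. */
    static List<String> affectedTraces(ChaosWindow window, List<TraceSummary> traces) {
        List<String> ids = new ArrayList<>();
        for (TraceSummary trace : traces) {
            if (window.overlaps(trace.startMillis, trace.endMillis)) {
                ids.add(trace.traceId);
            }
        }
        return ids;
    }
}
```

With this approach there is no explicit link between an event and a trace; a trace is considered related only because it ran while the injection was active on the same service.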
> I will try to implement exposing JVM tracing to SkyWalking and ignore the experiments on CPU and network.
At stage 1, I am fine with ignoring CPU and network. Eventually, we should support these too. SkyWalking already supports VM monitoring (through Prometheus node exporter or Zabbix agent), so it would be great to have a VM service-level event.
> Externally, the communication and reporting will be between SkyWalking and chaosblade-box instead of chaosblade directly.
As long as this is recommended by your community, we are totally fine with it.
> The protocol used will be gRPC
Does this mean we are going to use https://github.com/apache/skywalking-data-collect-protocol/blob/master/event/Event.proto to report the event (or the goapi repo)?
@wu-sheng
> I think JVM level is fine, but it seems we don't have a real relationship with tracing core, right? The relationship should rely on timestamp, right?
Sorry, I don't really understand your idea. What I have imagined is: once the related JVM receives a function call from the user, e.g. HTTP GET, the whole trace, like https://skywalking.apache.org/screenshots/8.4.0/trace.jpg, will be reported. Would you mind explaining this further?
> At stage 1, I am fine with ignoring CPU and network. Eventually, we should support these too. SkyWalking already supports VM monitoring (through Prometheus node exporter or Zabbix agent), so it would be great to have a VM service-level event.
Yes, I can understand that. I also discussed exposing the VM status via node exporter or something else.
> Does this mean we are going to use https://github.com/apache/skywalking-data-collect-protocol/blob/master/event/Event.proto to report the event (or the goapi repo)?
Yes. As for the goapi repo, no: we are going to implement this in chaosblade-box, which is written in Java.
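As a rough illustration of what chaosblade-box might assemble before handing an event to a gRPC client, here is a minimal sketch in plain Java. `ChaosEvent` is a hypothetical class, not the code generated from Event.proto. Its field names are assumptions modeled on that proto and should be checked against the linked file. The convention that a start report and an end report share one UUID, with the end time left at 0 while the experiment is running, is also an assumption here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical plain-Java mirror of an event; NOT the generated gRPC class.
 * Field names are assumptions modeled on the linked Event.proto.
 */
final class ChaosEvent {
    enum Type { NORMAL, ERROR }

    final String uuid;
    final String service;
    final String serviceInstance;
    final String name;
    final Type type;
    final String message;
    final Map<String, String> parameters;
    final long startTime; // epoch millis
    final long endTime;   // epoch millis; 0 while the experiment is running (assumption)

    private ChaosEvent(String uuid, String service, String serviceInstance, String name,
                       Type type, String message, Map<String, String> parameters,
                       long startTime, long endTime) {
        this.uuid = uuid;
        this.service = service;
        this.serviceInstance = serviceInstance;
        this.name = name;
        this.type = type;
        this.message = message;
        // Defensive copy so later changes by the caller do not leak into the event.
        this.parameters = Collections.unmodifiableMap(new LinkedHashMap<>(parameters));
        this.startTime = startTime;
        this.endTime = endTime;
    }

    /** Builds the report sent when an injection starts; a fresh UUID is generated. */
    static ChaosEvent started(String service, String serviceInstance, String name, Type type,
                              String message, Map<String, String> parameters, long startMillis) {
        return new ChaosEvent(UUID.randomUUID().toString(), service, serviceInstance, name,
                type, message, parameters, startMillis, 0L);
    }

    /** Builds the report sent when the injection ends, reusing the same UUID. */
    ChaosEvent finished(long endMillis) {
        if (endMillis < startTime) {
            throw new IllegalArgumentException("end time precedes start time");
        }
        return new ChaosEvent(uuid, service, serviceInstance, name, type, message,
                parameters, startTime, endMillis);
    }
}
```

A caller would create one event with `ChaosEvent.started(...)` when the experiment is injected and call `finished(...)` on it when the experiment is destroyed, then map both objects onto the real generated message before sending.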
@xcaspar After our quick talk, I want to submit this integration to the ChaosBlade community officially.
Background
Chaos Engineering is a method for testing a project's robustness; it can be done in test or even production environments. During testing, the owner of the system expects to check how the system reacts to the failure/load injected by ChaosBlade. SkyWalking, as an APM system, aims to collect, analyze, and visualize system health from different angles; traces, metrics, and logs are the widely known ones. Recently, as SkyWalking has expanded, we introduced the Event concept.
Failure/load injections are clearly events.
Read SkyWalking's doc for more details, https://skywalking.apache.org/docs/main/latest/en/concepts-and-designs/event/
Solution
SkyWalking provides friendly ways for other systems to integrate with us. Such as
@kezhenxu94 and I are willing to help if you face any issues during the integration process.