chaosblade-io / chaosblade

An easy-to-use and powerful chaos engineering experiment toolkit. (A simple, easy-to-use, and powerful chaos experiment injection tool open-sourced by Alibaba.)
https://chaosblade.io
Apache License 2.0
5.86k stars, 934 forks

Unable to Execute In-Container JVM/Kafka Experiments in Latest Versions of ChaosBlade #1052

Open ramgiteng opened 1 week ago

ramgiteng commented 1 week ago

Issue Description

With the latest versions of ChaosBlade, I am unable to execute any chaos experiments on a Java process running inside another Docker container. This affects both Kafka and JVM experiments.

Type: Bug report

Describe what happened (or what feature you want)

Setup 1 (Container 1): ChaosBlade executable in a running Docker container
Setup 2 (Container 2): Java process running in a separate Docker container

Running the following command in Container 1 fails:

./blade create cri jvm --chaosblade-override true --chaosblade-release <pathToChaosBladeTarGZ> cpufullload --process java --container-name javaappcontainer --timeout 30s

Error:

{"code":63067,"success":false,"error":"execContainer: container exec failed, err: exit status 1"}

Logs:

time="2024-07-07 07:37:25.976539673 EDT" level=info msg="exec container cmd: /home/<user>/chaosblade/chaosblade-1.7.4/bin/nsexec -t 3565 -p -m -n -- /bin/sh -c [ -e /opt/chaosblade/blade ] && echo True || echo False" location="/go/pkg/mod/github.com/chaosblade-io/chaosblade-exec-cri@v1.7.4/exec/container/container_linux.go:93" uid=d0d348dda8a5c0ec

time="2024-07-07 07:37:25.97919496 EDT" level=info msg="run copy cmd: /home/<user>/chaosblade/chaosblade-1.7.4/bin/nsexec -t 3565 -p -m -- /bin/sh -c cat > /opt/chaosblade-1.7.4-linux-amd64.tar.gz" location="/go/pkg/mod/github.com/chaosblade-io/chaosblade-exec-cri@v1.7.4/exec/container/container_linux.go:40" uid=d0d348dda8a5c0ec

time="2024-07-07 07:37:25.980804861 EDT" level=error msg="DeployChaosBlade err: exit status 2" location="/go/pkg/mod/github.com/chaosblade-io/chaosblade-exec-cri@v1.7.4/exec/executor_execin.go:104" uid=d0d348dda8a5c0ec
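
For triage, the log lines above can be split into fields to find the first failing step. The following is a minimal sketch, not part of ChaosBlade: it assumes the log keeps the key=value layout shown in this issue, and the names parse_log_line, first_error, and SAMPLE are hypothetical helpers introduced only for illustration.

```python
import re

# Matches key=value pairs where the value is either double-quoted or a bare token.
# Assumption: the log uses the key=value layout seen in this issue.
PAIR = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S+))')


def parse_log_line(line):
    """Return the key/value fields of one log line as a dict."""
    fields = {}
    for key, quoted, bare in PAIR.findall(line):
        fields[key] = quoted if quoted else bare
    return fields


def first_error(lines):
    """Return the parsed fields of the first line with level=error, or None."""
    for line in lines:
        fields = parse_log_line(line)
        if fields.get("level") == "error":
            return fields
    return None


# Two of the log lines from this issue, copied verbatim.
SAMPLE = [
    'time="2024-07-07 07:37:25.97919496 EDT" level=info msg="run copy cmd: /home/<user>/chaosblade/chaosblade-1.7.4/bin/nsexec -t 3565 -p -m -- /bin/sh -c cat > /opt/chaosblade-1.7.4-linux-amd64.tar.gz" location="/go/pkg/mod/github.com/chaosblade-io/chaosblade-exec-cri@v1.7.4/exec/container/container_linux.go:40" uid=d0d348dda8a5c0ec',
    'time="2024-07-07 07:37:25.980804861 EDT" level=error msg="DeployChaosBlade err: exit status 2" location="/go/pkg/mod/github.com/chaosblade-io/chaosblade-exec-cri@v1.7.4/exec/executor_execin.go:104" uid=d0d348dda8a5c0ec',
]

err = first_error(SAMPLE)
print(err["msg"])       # DeployChaosBlade err: exit status 2
print(err["location"])  # source location reported by the failing step
```

In this log, the first error comes directly after the copy command that writes the release tarball into /opt in the target container, so that copy step looks like a reasonable place to start investigating.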

This issue has been observed in versions after v1.5.0. I tested with the newly released v1.7.4 and still faced the same issue.

With ChaosBlade v1.5.0, I can execute in-container experiments on JVM and Kafka successfully.

Describe what you expected to happen

CRI (Docker) experiments, including JVM and Kafka experiments, should run successfully against JVMs inside another Docker container, and the command should return a response with "success" set to true.

Example: {"code":200,"success":true,"result":"55e49e9a472103b5"}
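
A wrapper script could tell the two outcomes apart by reading the JSON that blade prints. This is a minimal sketch, assuming the command prints a single JSON object shaped like the two responses quoted in this issue; experiment_succeeded is a hypothetical helper, not a ChaosBlade API.

```python
import json


def experiment_succeeded(raw):
    """Return (ok, detail) for one JSON response string.

    detail is the "result" value on success, otherwise the "error" text.
    Assumption: the response has the fields seen in this issue.
    """
    resp = json.loads(raw)
    if resp.get("success") is True:
        return True, resp.get("result")
    return False, resp.get("error")


# The two responses quoted in this issue.
expected = '{"code":200,"success":true,"result":"55e49e9a472103b5"}'
actual = '{"code":63067,"success":false,"error":"execContainer: container exec failed, err: exit status 1"}'

print(experiment_succeeded(expected))  # (True, '55e49e9a472103b5')
print(experiment_succeeded(actual))    # (False, 'execContainer: container exec failed, err: exit status 1')
```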

How to reproduce it (as minimally and precisely as possible)

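  1. Run a Java app (any JDK type/version) in a Docker container.
  2. Run ChaosBlade v1.7.2, v1.7.3, or v1.7.4 in a separate Docker container.
  3. From the ChaosBlade container, execute the blade create cri jvm command shown above against the Java app container.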

Tell us your environment

  1. Java app (Any JDK type/version)
  2. Docker
  3. ChaosBlade versions: v1.7.2/v1.7.3/v1.7.4

Anything else we need to know?

Could you please help troubleshoot this issue?

Please let me know if you need any additional details. I will be happy to provide them.

Thank you for the support.