chaosblade-io / chaosblade-operator

chaosblade operator for kubernetes experiments
Apache License 2.0

Experiments created from a YAML manifest cannot be deleted #33

Closed Mountaincnc closed 4 years ago

Mountaincnc commented 4 years ago

Issue Description

Type: bug report or feature request

Describe what happened (or what feature you want)

An experiment created from a YAML manifest cannot be deleted.

Describe what you expected to happen

How to reproduce it (as minimally and precisely as possible)

  1. Use the example provided in the official documentation:
    apiVersion: chaosblade.io/v1alpha1
    kind: ChaosBlade
    metadata:
      name: cpu-load
    spec:
      experiments:
      - scope: node
        target: cpu
        action: fullload
        desc: "increase node cpu load by names"
        matchers:
        - name: names
          value:
          - "xxx-xxx-xxx-xxx"
        - name: cpu-percent
          value:
          - "80"

    After deploying it, the node's CPU usage rises to 80%.

  2. Delete the experiment:
    
    kubectl delete chaosblades.chaosblade.io cpu-load
    chaosblade.chaosblade.io "cpu-load" deleted

The command hangs indefinitely after printing `chaosblade.chaosblade.io "cpu-load" deleted`.

Check the resource status:
```shell
$ kubectl get chaosblades.chaosblade.io cpu-load -o yaml
...
status:
  expStatuses:
  - action: fullload
    error: see resStatus for the error details
    resStatuses:
    - error: |-
        sh: invalid number '454410380578/opt/chaosblade/bin/chaos_burncpu'
         exit status 1 exit status 1
      id: e0938f35b77d523c
      kind: node
      name: xxxxx-xxxx-xxx-xxx
      nodeName: xxxxx-xxxx-xxx-xxx
      state: Error
      success: false
    scope: node
    state: Error
    success: false
    target: cpu
  phase: Destroying
...
# The chaosblade-operator logs show that the destroy step failed
$ kubectl -n chaosblade logs -f --tail 100 chaosblade-operator-6cc8d5484-ckj7s
...
time="2020-08-12T12:57:29Z" level=error msg="Invoke exec command error" command="/opt/chaosblade/blade destroy e0938f35b77d523c" container=chaosblade-tool error="command terminated with exit code 1" podName=chaosblade-tool-t6kjg podNamespace=chaosblade
time="2020-08-12T12:57:29Z" level=info msg="err: {\"code\":604,\"success\":false,\"error\":\"sh: invalid number '454410380578/opt/chaosblade/bin/chaos_burncpu'\\n exit status 1 exit status 1\"}; out: " command="/opt/chaosblade/blade destroy e0938f35b77d523c" container=chaosblade-tool podName=chaosblade-tool-t6kjg podNamespace=chaosblade
time="2020-08-12T12:57:29Z" level=error msg="finalize chaosblade failed at destroying phase" Request.Name=cpu-load error="failed to destory, please see the experiment status"
...
```
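
To spot resources stuck in this state across a cluster, one option is to list every ChaosBlade together with its phase and deletion timestamp. The following is a minimal sketch and not part of the original report: the function names `list_blade_phases` and `stuck_blades` are hypothetical, and it assumes the `blade` short name used elsewhere in this thread resolves to the ChaosBlade CRD. Any extra arguments (for example `--kubeconfig`) are passed straight through to `kubectl`.

```shell
#!/usr/bin/env bash

# Print "<name> TAB <phase> TAB <deletionTimestamp>" for every ChaosBlade.
# Extra arguments (e.g. --kubeconfig <path>) are forwarded to kubectl.
list_blade_phases() {
  kubectl "$@" get blade \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.metadata.deletionTimestamp}{"\n"}{end}'
}

# Keep only resources whose deletion has been requested (third column is
# non-empty) but which still exist, i.e. a finalizer is holding them back.
stuck_blades() {
  list_blade_phases "$@" | awk -F'\t' '$3 != "" { print $1 "\t" $2 }'
}
```

Calling `stuck_blades` with no arguments uses the current kubectl context. A resource that is listed with phase `Destroying` corresponds to the situation shown in the status output above.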

Tell us your environment

Kubernetes environment:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.13", GitCommit:"39a145ca3413079bcb9c80846488786fed5fe1cb", GitTreeState:"clean", BuildDate:"2020-07-15T16:18:19Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.13", GitCommit:"39a145ca3413079bcb9c80846488786fed5fe1cb", GitTreeState:"clean", BuildDate:"2020-07-15T16:10:14Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Docker environment:

$ docker info
Containers: 23
 Running: 14
 Paused: 0
 Stopped: 9
Images: 23
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 188.5GiB
Name: xxx-xxx-xxx-xxx
ID: Q3QP:N2RJ:DE4D:XCDZ:5NQ6:IZND:CTGN:TJJZ:WMO4:GTJ4:O7X6:XFZW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Registry Mirrors:
 https://xxxxxxx.mirror.aliyuncs.com/
Live Restore Enabled: false
Product License: Community Engine

chaosblade version: 0.6.0

Anything else we need to know?

Mountaincnc commented 4 years ago

I used the command provided with the release:

$ blades=($(kubectl get blade | grep -v NAME | awk '{print $1}' | tr '\n' ' ')) && kubectl patch blade $blades --type merge -p '{"metadata":{"finalizers":[]}}'

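This deleted the resource. However, when the `metadata.finalizers` field is patched this way before the resource is deleted, the experiment running on the node does not stop.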

Mountaincnc commented 4 years ago

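I tested on another cluster and could not reproduce the problem there. It is probably caused by the cluster environment; I will keep investigating.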

manutdym commented 3 years ago

Hello, I have run into the same problem. How did you solve it?

xcaspar commented 3 years ago

> Hello, I have run into the same problem. How did you solve it?

Please execute the following command with --kubeconfig flag used to specify the cluster.

blades=($(kubectl get blade | grep -v NAME | awk '{print $1}' | tr '\n' ' ')) && kubectl patch blade $blades --type merge -p '{"metadata":{"finalizers":[]}}'
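
One caveat about the one-liner: in bash, `$blades` expands only to the first element of the array, so with several experiments only one would be patched (zsh expands all elements). Below is a minimal sketch of a variant that patches each resource in turn and forwards `--kubeconfig` to `kubectl`. The function name `clear_blade_finalizers` is hypothetical, and the kubeconfig path in the usage comment is a placeholder. As reported earlier in this thread, removing the finalizers only lets the Kubernetes resource be deleted; it does not stop an experiment that is already running on a node.

```shell
#!/usr/bin/env bash

# Remove the finalizers from every ChaosBlade resource so that pending
# deletions can complete. Extra arguments are forwarded to kubectl.
clear_blade_finalizers() {
  local name
  # "-o name" prints one "<resource>/<name>" per line without a header row,
  # so no grep/awk post-processing is needed.
  kubectl "$@" get blade -o name | while read -r name; do
    kubectl "$@" patch "$name" --type merge \
      -p '{"metadata":{"finalizers":[]}}' </dev/null
  done
}

# Usage (the path is a placeholder):
#   clear_blade_finalizers --kubeconfig /path/to/kubeconfig
```

Iterating line by line avoids relying on shell-specific array expansion, at the cost of one `kubectl patch` call per resource.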