Deleting a pod-network-delay rule fails after the pod restarts

bmbbms commented 3 years ago

Issue Description

bug report

Describe what happened (or what feature you want)

When I set a network delay rule for a pod, the pod's liveness probe fails and the pod is restarted. If I then try to delete the network delay rule, the deletion fails because the containerId changes when the pod restarts, while the rule still uses the original containerId to remove the pod's network delay.

Describe what you expected to happen

So the containerId is not a reliable way to identify the target of the rule. When deletion fails, we should check whether the containerId in the Identifier has changed.
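
The check suggested here could look roughly like the following Python sketch. It assumes the Identifier has the layout namespace/nodeIP/podName/containerName/containerId, as in the status output in this report, and that the caller can look up the pod's current container ID (for example from the pod's container statuses). `short_id` and `refresh_identifier` are hypothetical names for illustration, not part of chaosblade-operator.

```python
from typing import Tuple


def short_id(container_id: str, length: int = 12) -> str:
    """Strip a runtime prefix such as 'docker://' and truncate to the short form."""
    return container_id.split("://", 1)[-1][:length]


def refresh_identifier(identifier: str, current_container_id: str) -> Tuple[str, bool]:
    """Return (identifier, changed) with the trailing container ID brought up to date.

    Assumed layout: namespace/nodeIP/podName/containerName/containerId.
    """
    parts = identifier.split("/")
    if len(parts) != 5:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    # Compare using the same length as the recorded (short) container ID.
    current = short_id(current_container_id, len(parts[4]) or 12)
    if parts[4] == current:
        return identifier, False
    parts[4] = current
    return "/".join(parts), True
```

With something like this, a failed delete could be retried against the refreshed Identifier instead of the stale one.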

How to reproduce it (as minimally and precisely as possible)

First, deploy a network delay for a pod:

    Status:
      Exp Statuses:
        Action:  delay
        Res Statuses:
          Id:          b42b0ee218262ce9
          Identifier:  test-testing-dc-k2030/172.20.35.51/reliable-msg-route-5fdc8cc757-hwvdt/reliable-msg-route/18f0b9d032ce
          Kind:        pod
          State:       Success
          Success:     true
        Scope:         pod
        State:         Success
        Success:       true
        Target:        network
      Phase:           Running
    Events:            <none>

Make sure the delay causes the pod's liveness probe to fail so that the pod restarts:

test-testing-dc-k2030         reliable-msg-route-5fdc8cc757-hwvdt               1/1     Running            4          3d      192.168.137.81    172.20.35.51   <none>           <none>

Delete the rule:

    Status:
      Exp Statuses:
        Action:  delay
        Error:   see resStatus for the error details
        Res Statuses:
          Error:       Error response from daemon: No such container: 18f0b9d032ce
          Id:          b42b0ee218262ce9
          Identifier:  test-testing-dc-k2030/172.20.35.51/reliable-msg-route-5fdc8cc757-hwvdt/reliable-msg-route/18f0b9d032ce
          Kind:        pod
          State:       Error
          Success:     false
        Scope:         pod
        State:         Success
        Success:       false
        Target:        network
      Phase:           Destroying
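
The "No such container" error is a recognizable signal that the recorded container no longer exists. Below is a minimal Python sketch of how that case might be detected before retrying with the current container ID. The function name is hypothetical, and the pattern is taken from the Docker daemon message shown in this report, so other container runtimes would need their own patterns.

```python
import re
from typing import Optional

# Pattern based on the Docker daemon error text quoted in this report.
_NO_SUCH_CONTAINER = re.compile(r"No such container: (\S+)")


def stale_container_id(error: str, identifier: str) -> Optional[str]:
    """Return the missing container ID if the error refers to the
    container recorded at the end of the identifier, else None."""
    match = _NO_SUCH_CONTAINER.search(error)
    if match is None:
        return None
    missing = match.group(1)
    recorded = identifier.rsplit("/", 1)[-1]
    return missing if missing == recorded else None
```

A non-None result means the delete failed only because the Identifier is out of date, not because the rule was already removed.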


If I force-delete the rule, the delay rules actually remain in the pod.

Tell us your environment

k8s v1.16.15
chaosblade-operator-v0.9.0

Anything else we need to know?

xcaspar commented 3 years ago

You can set the --daemonset-enable=false flag to turn off the sidecar mode when deploying chaosblade-operator to solve the problem.

bmbbms commented 3 years ago

I see that the default value of this parameter is false.

xcaspar commented 3 years ago

You can delete the pod to recover it. I will solve this problem later.

bmbbms commented 3 years ago

Actually, it works when I apply the rule again using --force, and I can then delete the rule successfully before the pod's next restart. But I don't think that is a good way of doing it, so I reported the bug.

yzhang559 commented 3 years ago

@xcaspar I am using chaosblade-operator-v1.3.0 and k8s v1.21.4 and still face this issue. Will there be a fix in the next release, or is there a workaround for this issue? Thanks.
