chaos-mesh / chaos-tproxy


After the HttpChaos experiment, the container is still in a disconnected state #61

Open yaimready opened 1 year ago

yaimready commented 1 year ago

The chaos-tproxy controller does not properly handle the end of an HTTPChaos experiment: the container remains in a disconnected state after the experiment is completed.

Expected Behavior

After the experiment is over, the container network should return to normal.

Current Behavior

After the HttpChaos experiment, the container is still in a disconnected state.

Steps to Reproduce

  1. Set up a Chaos Mesh environment on AWS EKS (basic setup).

  2. Deploy nginx in the namespace chaos-test.

$ kubectl -n chaos-test get deploy
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   3/3     3            3           2d19h
  3. Expose nginx and test it.
$ kubectl -n chaos-test port-forward deploy/nginx 8000:80
Forwarding from 127.0.0.1:8000 -> 80
Forwarding from [::1]:8000 -> 80
Handling connection for 8000

(keep this bash session running and open a new bash session...)

$ curl -v 127.0.0.1:8000 >/dev/null
* Rebuilt URL to: 127.0.0.1:8000/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:8000
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.23.3
< Date: Mon, 20 Feb 2023 06:15:27 GMT
< Content-Type: text/html
< Content-Length: 42
< Last-Modified: Fri, 17 Feb 2023 10:27:40 GMT
< Connection: keep-alive
< ETag: "63ef569c-2a"
< Accept-Ranges: bytes
<
{ [42 bytes data]
100    42  100    42    0     0    258      0 --:--:-- --:--:-- --:--:--   259
* Connection #0 to host 127.0.0.1 left intact
  4. Create an HTTPChaos for nginx.
$ cat chaos.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: HTTPChaos
metadata:
  name: test-http-chaos
  namespace: chaos-test
spec:
  mode: all
  selector:
    labelSelectors:
      app: nginx
  target: Request
  port: 80
  method: GET
  path: /api
  abort: true
  duration: 1s

$ kubectl apply -f chaos.yaml
  5. Wait 1 minute, delete the HTTPChaos, and wait 1 minute again.

$ kubectl delete -f chaos.yaml
  6. Test the exposed nginx again; the connection hangs.
$ kubectl -n chaos-test port-forward deploy/nginx 8000:80

(exposed in step 3)

$ curl -v 127.0.0.1:8000

(after waiting several minutes, got an HTTP 502 from the Kubernetes apiserver)

Context (Environment)

Kubernetes: AWS EKS with Kubernetes version v1.24.8-eks-ffeb93d
Chaos Mesh: chaos-mesh v2.5.1
Chaos Daemon: chaos-daemon v2.5.1

Detailed Description

I entered the nginx container after the experiment and found that the ARP table was not normal. I used "arp -s gateway eth0" to restore the ARP table, and the nginx container went back to normal.

As far as I know, the chaos-tproxy controller is what executes "change arp table" and "restore arp table", so this may be a chaos-tproxy bug.
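
For anyone who wants to confirm the same symptom, the ARP check described above can be sketched as a small shell helper. This is only an illustration, not part of chaos-tproxy: the function names are hypothetical, it assumes iproute2's `ip` is usable in the container's network namespace, and it assumes "not normal" means the default gateway's neighbor entry is missing, FAILED, or INCOMPLETE (the report does not say exactly what the table looked like).

```shell
#!/bin/sh
# Sketch: decide whether the default gateway has a usable ARP (neighbor) entry.
# The functions only parse text, so they can be fed saved `ip` output as well.

# Read `ip -4 route` output on stdin and print the default gateway IP.
gateway_from_routes() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Read `ip neigh` output on stdin and print the state (last field) for IP $1.
neigh_state() {
  awk -v ip="$1" '$1 == ip { print $NF; exit }'
}

# $1 = output of `ip -4 route`, $2 = output of `ip neigh`.
# Returns 0 if the gateway entry looks usable, 1 if it is missing or
# unresolved, 2 if no default gateway is found.
check_gateway_arp() {
  routes=$1
  neigh=$2
  gw=$(printf '%s\n' "$routes" | gateway_from_routes)
  if [ -z "$gw" ]; then
    echo "no default gateway"
    return 2
  fi
  state=$(printf '%s\n' "$neigh" | neigh_state "$gw")
  case "$state" in
    REACHABLE|STALE|DELAY|PROBE|PERMANENT)
      echo "gateway $gw: $state"
      return 0
      ;;
    *)
      echo "gateway $gw: ${state:-missing}"
      return 1
      ;;
  esac
}
```

Inside the affected container (or its network namespace) it could be called as `check_gateway_arp "$(ip -4 route)" "$(ip neigh)"`. A non-zero result after the HTTPChaos has been deleted would be consistent with the ARP entry not having been restored.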

regardfs commented 1 year ago

Same issue here!

wujunwei commented 1 year ago

Same issue!

fuxp3 commented 1 year ago

I encountered the same problem.