aws / aws-network-policy-agent


Intermittent connection reset and delay running time #245

Open · younsl opened this issue 3 months ago

younsl commented 3 months ago

What happened:

Background

After migrating the network policy provider from Calico v3.25.1 (with the Tigera Operator) to VPC CNI v1.18.0-eksbuild.1, the following two network policy issues occurred on an EKS v1.26 cluster:

Note: The two network issues described here occur only in pods that have a NetworkPolicy applied via VPC CNI.

Cluster environment

$ kubectl describe daemonset aws-node -n kube-system | grep Image | cut -d "/" -f 2-3
amazon-k8s-cni-init:v1.18.0-eksbuild.1
amazon-k8s-cni:v1.18.0-eksbuild.1
amazon/aws-network-policy-agent:v1.1.0-eksbuild.1

Network policy issues

1. Intermittent connection reset by peer

This occurs during Pod-to-Pod or Pod-to-EC2 communication.

Similar issues: https://github.com/aws/aws-network-policy-agent/issues/204, https://github.com/aws/aws-network-policy-agent/issues/210, https://github.com/aws/aws-network-policy-agent/issues/236

2. Delayed Running time

There is a delay in the time it takes for the pod to become Running and Ready.


For pods with a Network Policy applied, the time it takes for the readiness probe to pass is up to 3 times longer.

| Category | 1st time | 2nd time |
| --- | --- | --- |
| egress netpol applied | 86s | 91s |
| egress netpol not applied | 30s | 32s |

Similar issues: https://github.com/aws/aws-network-policy-agent/issues/189, https://github.com/aws/aws-network-policy-agent/issues/186
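For reference, one way to measure the time-to-ready values in the table above (a sketch; the pod name and namespace are placeholders, and this is not necessarily how the figures in this issue were collected):

```bash
# Print the pod creation time and the time the Ready condition last became true;
# the difference approximates time-to-readiness.
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.metadata.creationTimestamp}{"\n"}{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}'
```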

Attach logs

1. Intermittent connection reset by peer

[tcpdump] From workload pod to EC2 instance

$ tcpdump -i any '(dst host xx.xxx.xxx.247 and dst port 8080) or (tcp[tcpflags] & tcp-rst != 0)'
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

07:40:05.607233 eth0  Out IP <REDACTED>-5bc847767c-56l5j.47664 > ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080: Flags [S], seq 4216205005, win 62727, options [mss 8961,sackOK,TS val 896125324 ecr 0,nop,wscale 7], length 0
07:40:05.607607 eth0  In  IP ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080 > <REDACTED>-5bc847767c-56l5j.47664: Flags [R.], seq 0, ack 4216205006, win 0, length 0

... repeated ...

07:42:25.807039 eth0  Out IP <REDACTED>-5bc847767c-56l5j.39080 > ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080: Flags [S], seq 1628133200, win 62727, options [mss 8961,sackOK,TS val 896265524 ecr 0,nop,wscale 7], length 0
07:42:25.807463 eth0  In  IP ip-xx-xxx-xxx-247.ap-northeast-2.compute.internal.8080 > <REDACTED>-5bc847767c-56l5j.39080: Flags [R.], seq 0, ack 1628133201, win 0, length 0

Intermittently, the workload pod receives an RST (reset) packet in response from the EC2 instance.

[kubectl sniff] From workload pod to EC2 instance

# The source IP ending in .221 is the workload pod.
# The destination IP ending in .247 is the EC2 instance.
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - port 8080]' on container: '<REDACTED>', pod: '<REDACTED>-5bc847767c-56l5j', namespace: 'b<REDACTED>'
    1   0.000000 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 53456 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897677821 TSecr=0 WS=128
    2   0.000433 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 53456 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    3  70.178815 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 33610 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897748000 TSecr=0 WS=128
    4  70.179236 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 33610 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    5 140.194787 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 51794 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897818016 TSecr=0 WS=128
    6 140.195225 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 51794 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    7 210.383682 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 32906 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897888204 TSecr=0 WS=128
    8 210.384104 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 32906 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
    9 280.399946 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 55182 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=897958221 TSecr=0 WS=128
   10 280.400365 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 55182 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   11 351.055956 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 42620 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898028877 TSecr=0 WS=128
   12 351.056403 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 42620 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   13 421.076549 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 49318 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898098897 TSecr=0 WS=128
   14 421.076943 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 49318 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   15 495.679313 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 45716 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898173500 TSecr=0 WS=128
   16 495.679744 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 45716 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
   17 565.699845 xx.xxx.xxx.221 → yy.yyy.yyy.247 TCP 76 46792 → 8080 [SYN] Seq=0 Win=62727 Len=0 MSS=8961 SACK_PERM TSval=898243521 TSecr=0 WS=128
   18 565.700281 yy.yyy.yyy.247 → xx.xxx.xxx.221 TCP 56 8080 → 46792 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

Again, the workload pod intermittently receives an RST (reset) packet in response from the EC2 instance.

When the issue occurs in the workload pod, the Slack notification below is sent.

[Screenshot: Slack notification reporting the connection reset error]

2. Delayed Running time

ebpf-sdk log captured on the worker node immediately after a pod restart:

$ tail -f /var/log/aws-routed-eni/ebpf-sdk.log | grep 'xx.xxx.xx.117'
{"level":"info","ts":"2024-04-11T02:25:29.435Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.437Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.584Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.586Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.750Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.752Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.888Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:29.890Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.012Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.014Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.073Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:30.074Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.596Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.598Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.665Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.667Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.831Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.832Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.929Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:48.931Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.009Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.010Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.182Z","caller":"ebpf/bpf_client.go:724","msg":"Converted string to bytearray [32 0 0 0 xx xxx xx 117]"}
{"level":"info","ts":"2024-04-11T02:25:49.185Z","caller":"maps/loader.go:491","msg":"Adding to key list [32 0 0 0 xx xxx xx 117]"}

DENY logs appear for traffic from the destination Service's cluster IP, 172.20.67.165:

$ tail -f /var/log/aws-routed-eni/network-policy-agent.log | egrep 'xx.xxx.xxx.198'
{"level":"info","ts":"2024-04-11T02:18:47.938Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:18:49.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":45868,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:18:51.394Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:18:58.054Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:18:59.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":41568,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:09.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":49904,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:09.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":49908,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:10.642Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:10.758Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:10.926Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.011Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.250Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.358Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xxx.198/32"}
{"level":"info","ts":"2024-04-11T02:19:11.362Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-04-11T02:19:19.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":56462,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:29.430Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":42600,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.242Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.20.67.165","Src Port":80,"Dest IP":"xx.xxx.xxx.198","Dest Port":49032,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.247Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.198","Src Port":42681,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.249Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.198","Src Port":47922,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:38.249Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.198","Src Port":57360,"Dest IP":"172.20.67.165","Dest Port":80,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:39.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":37192,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-11T02:19:39.431Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xxx.179","Src Port":37196,"Dest IP":"xx.xxx.xxx.198","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

The same network issues occurred with VPC CNI v1.16.0, v1.16.1, and v1.18.0.

Environment:

jayanthvn commented 3 months ago

We will look into this and get back. Btw is this easily reproducible?

younsl commented 3 months ago

@jayanthvn Yes, it is reproducible, but since it is intermittent it may take time to recur.

jayanthvn commented 3 months ago

Thanks. We will review the logs and get back to you.

younsl commented 3 months ago

Our team found workarounds for the two network policy issues through internal testing today.

These may not be fundamental solutions.

Note: Sensitive information such as IP addresses and application names has been redacted.

TLDR

Symptom 1. Intermittent connection reset by peer is resolved.

Symptom 2. Delayed readiness time is resolved.

Workarounds

1. Intermittent connection reset by peer

1-1. Workaround

As mentioned in the official Amazon EKS documentation, the intermittent connection reset does not occur when bpffs is mounted on the EC2 worker node.

1-2. Record for issue resolution

Check the Kernel version and AMI version of the worker node.

The workload (source) pod that was experiencing the intermittent connection reset symptom was scheduled on the following worker node:

$ kubectl get node -o wide ip-xx-xxx-xx-98.ap-northeast-2.compute.internal
NAME                                              STATUS   ROLES    AGE   VERSION                INTERNAL-IP    EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-xx-xxx-xx-98.ap-northeast-2.compute.internal   Ready    <none>   69d   v1.26.12-eks-5e0fdde   xx.xxx.xx.98   <none>        Amazon Linux 2   5.10.205-195.804.amzn2.x86_64   containerd://1.7.2

 

Connect to the worker node and manually mount bpffs:

# Mount bpf filesystem in worker node
sudo mount -t bpf bpffs /sys/fs/bpf
$ mount -l | grep bpf
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
none on /sys/fs/bpf type bpf (rw,relatime)
none on /sys/fs/bpf type bpf (rw,relatime)
none on /sys/fs/bpf type bpf (rw,relatime)
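To keep this workaround in place across node reboots, one option (not from the original report; verify against your AMI and bootstrap process first) is an fstab entry:

```bash
# Persist the bpffs mount across reboots; path and options mirror the manual mount above.
grep -q '^bpffs /sys/fs/bpf' /etc/fstab || \
  echo 'bpffs /sys/fs/bpf bpf defaults 0 0' | sudo tee -a /etc/fstab
sudo mount -a
mount -l | grep bpf   # confirm the mount is present
```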

Question: Even though the prerequisites are met, why do I have to mount the BPF filesystem (bpffs) on the worker node to resolve the symptom?

Prerequisite for network policy of VPC CNI:

> For all other cluster versions, if you upgrade the Amazon EKS optimized Amazon Linux to version v20230703 or later or you upgrade the Bottlerocket AMI to version v1.0.2 or later, you can skip this step.

Reference: EKS User guide

 

Since mounting bpffs before working hours on April 11, the read ECONNRESET error has not occurred.

[Screenshot: no read ECONNRESET errors since mounting bpffs on April 11]

 

2. Delayed readiness time

2-1. Workaround

Add an Ingress netpol to the workload pod (source). This Ingress netpol explicitly allows ingress from 172.20.0.0/16, which is the Kubernetes service IP range.


2-2. Record for issue resolution

The network policy enforcing mode is set to standard, which is the default.

$ kubectl get ds -n kube-system aws-node -o yaml
      containers:
      - env:
        - name: NETWORK_POLICY_ENFORCING_MODE
          value: standard
        name: aws-node

Create a new ingress netpol that ‘explicitly’ allows ingress from the Kubernetes Service IP range.

$ kubectl get netpol -n <REDACTED> ingress-service -o yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  ...
spec:
  ingress:
  - from:
    - ipBlock:
        cidr: 172.20.0.0/16
    ports:
    - endPort: 65535
      port: 1
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/networkpolicy-ingress-service: apply
  policyTypes:
  - Ingress
status: {}
$ kubectl get pod -n <REDACTED> t<REDACTED> -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/restartedAt: "2024-04-12T12:30:49+09:00"
  creationTimestamp: "2024-04-12T06:14:02Z"
  generateName: t<REDACTED>-cc878cb69-
  labels:
    ...
    app.kubernetes.io/networkpolicy-ingress-service: apply
    app.kubernetes.io/networkpolicy-ingress-t<REDACTED>: apply
    pod-template-hash: cc878cb69

To explicitly allow the Kubernetes service IP range, the app.kubernetes.io/networkpolicy-ingress-service: apply label was added to the pod.
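For reference, a sketch of adding that label at the Deployment level so it survives pod restarts; the deployment and namespace names are placeholders:

```bash
# Merge the label that the ingress netpol selects on into the pod template.
kubectl patch deployment <deployment> -n <namespace> --type merge -p \
  '{"spec":{"template":{"metadata":{"labels":{"app.kubernetes.io/networkpolicy-ingress-service":"apply"}}}}}'
```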

 

The delayed readiness time issue was resolved after explicitly attaching the ingress netpol to the workload (source) pod, as shown in this comment.

$ tail -f /var/log/aws-routed-eni/network-policy-agent.log | egrep 'xx.xxx.xx.159'
{"level":"info","ts":"2024-04-12T06:09:36.079Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":43798,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:36.080Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":53898,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:36.081Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":55208,"Dest IP":"172.20.67.165","Dest Port":80,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:41.037Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":56070,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:09:41.037Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"xx.xxx.xx.159","Src Port":46937,"Dest IP":"172.20.0.10","Dest Port":53,"Proto":"UDP","Verdict":"ACCEPT"}

... No DENY logs from the Kubernetes service IP 172.20.67.165 ...

{"level":"info","ts":"2024-04-12T06:10:03.016Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.294Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.480Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.528Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.678Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:03.716Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:680","msg":"Updating Map with ","IP Key:":"xx.xxx.xx.159/32"}
{"level":"info","ts":"2024-04-12T06:10:11.661Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.34.98","Src Port":38850,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:10:13.554Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.10.163","Src Port":19310,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:10:13.606Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.11.99","Src Port":22244,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}
{"level":"info","ts":"2024-04-12T06:10:14.004Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"ss.sss.29.161","Src Port":52804,"Dest IP":"xx.xxx.xx.159","Dest Port":3000,"Proto":"TCP","Verdict":"ACCEPT"}

 

Conntrack entries captured between the workload pod (source, ending with 35.242) and the destination Service's cluster IP (172.20.67.165):

# Run on the worker node where the source pod is scheduled
$ conntrack -L --src ss.sss.35.242 --dst 172.20.67.165
tcp      6 118 TIME_WAIT src=ss.sss.35.242 dst=172.20.67.165 sport=55938 dport=80 src=ss.sss.19.208 dst=ss.sss.35.242 sport=8080 dport=55938 [ASSURED] mark=0 use=1
tcp      6 431998 ESTABLISHED src=ss.sss.35.242 dst=172.20.67.165 sport=55944 dport=80 src=ss.sss.21.1 dst=ss.sss.35.242 sport=8080 dport=55944 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 2 flow entries have been shown.

Using the conntrack -L command, I observed that connections between the workload (source) pod and the destination service IP reached the ESTABLISHED state quickly.

# Run on the worker node where the source pod is scheduled
$ conntrack -L --src ss.sss.35.242 --dst 172.20.67.165
tcp      6 98 TIME_WAIT src=ss.sss.35.242 dst=172.20.67.165 sport=55938 dport=80 src=ss.sss.19.208 dst=ss.sss.35.242 sport=8080 dport=55938 [ASSURED] mark=0 use=1
tcp      6 102 TIME_WAIT src=ss.sss.35.242 dst=172.20.67.165 sport=55944 dport=80 src=ss.sss.21.1 dst=ss.sss.35.242 sport=8080 dport=55944 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 2 flow entries have been shown.

 

The pod's time to readiness was dramatically reduced, from 92 seconds to 32 seconds.

$ kubectl get pod -n <REDACTED> -l app.kubernetes.io/name=t<REDACTED>
NAME                          READY   STATUS    RESTARTS   AGE
t<REDACTED>-cc878cb69-8tmt9   1/1     Running   0          33s

At this point, the readiness time for the workload (source) pod is back to normal.

The readiness time comparison is:

| Source Pod | Netpol configuration | Time to readiness | Status |
| --- | --- | --- | --- |
| Workload Pod | egress netpol, without ingress netpol | 92s | Delayed |
| Workload Pod | egress netpol, with the newly attached ingress netpol | 32s | Normal |

younsl commented 3 months ago

@jayanthvn @achevuru I submitted a node-level support bundle to k8s-awscni-triage@amazon.com.

node-level support-bundle collection

```bash
# Collect node level tech-support bundle in the affected worker node
$ /opt/cni/bin/aws-cni-support.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    56  100    56    0     0  34825      0 --:--:-- --:--:-- --:--:-- 56000
This is version 0.7.6. New versions can be found at https://github.com/awslabs/amazon-eks-ami/blob/master/log-collector-script/
Trying to collect common operating system logs...
Trying to collect kernel logs...
Trying to collect modinfo...
Trying to collect mount points and volume information...
Trying to collect SELinux status...
Trying to collect iptables information...
Trying to collect iptables-legacy information...
Trying to collect installed packages...
Trying to collect active system services...
Trying to Collect Containerd daemon information...
Trying to Collect Containerd running information...
Trying to Collect Docker daemon information...
Warning: The Docker daemon is not running.
Trying to collect kubelet information...
Trying to collect L-IPAMD introspection information...
Trying to collect L-IPAMD prometheus metrics...
Trying to collect L-IPAMD checkpoint...
Trying to collect Multus logs if they exist...
Trying to collect sysctls information...
Trying to collect networking infomation...
conntrack v1.4.4 (conntrack-tools): 5437 flow entries have been shown.
Trying to collect CNI configuration information...
Trying to collect CNI Configuration Variables from Docker...
Warning: The Docker daemon is not running.
Trying to collect CNI Configuration Variables from Containerd...
Trying to collect network policy ebpf loaded data...
Trying to collect Docker daemon logs...
Trying to Collect sandbox-image daemon information...
Trying to Collect CPU Throttled Process Information...
Trying to Collect IO Throttled Process Information...
Trying to archive gathered information...
Done... your bundled logs are located in /var/log/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz
```

```bash
$ ls -lh /var/log/eks*
-rw-r--r-- 1 root root 54M Apr 12 09:44 /var/log/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz
```

```bash
# Download tech-support bundle from node filesystem to local
$ kubectl cp default/nsenter-xxxxxx:/var/log/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz $HOME/eks_i-01827ce93254660f5_2024-04-12_0944-UTC_0.7.6.tar.gz
tar: Removing leading `/' from member names
```

Reference: [troubleshooting](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md#collecting-node-level-tech-support-bundle-for-offline-troubleshooting)
achevuru commented 3 months ago

@younsl Thanks for sharing your findings with us.

younsl commented 2 months ago

@achevuru

> So, this shouldn't fix anything and considering your issue is intermittent it is probably pure chance that you haven't run into it in the last 1 day.

[updated] Yes, you're right.

Even though the bpffs file system was mounted on all worker nodes, the intermittent connection reset by peer symptom recurred 3 days after the workaround was applied.

younsl commented 2 months ago

Is anything in progress for the next release? I found updated progress in https://github.com/aws/aws-network-policy-agent/issues/204#issuecomment-2085894403 and https://github.com/aws/aws-network-policy-agent/pull/256.

jayanthvn commented 1 month ago

Fix is released with network policy agent v1.1.2. - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.18.2. Please test and let us know if there are any issues.
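If the CNI is managed as an EKS addon, one way to pick up the release (a sketch; the cluster name, region, and exact addon version string are placeholders to check against your environment):

```bash
# Upgrade the VPC CNI managed addon, which bundles the network policy agent.
aws eks update-addon \
  --cluster-name <cluster-name> \
  --region <region> \
  --addon-name vpc-cni \
  --addon-version v1.18.2-eksbuild.1
```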

younsl commented 1 month ago

Hi, @jayanthvn. I'm still experiencing intermittent connection resets in network-policy-agent:v1.1.2-eksbuild.1.

Environment

Mitigation measures

I tested the two changes below to resolve the read ECONNRESET error that occurs intermittently in some pods. The behavior has been observed since the change on Jun 7, 17:46 (KST).

  1. Upgrade the VPC CNI addon from v1.18.1 to v1.18.2-eksbuild.1 and upgrade the network-policy-agent container version from v1.1.1 to v1.1.2 as well.
  2. Change the conntrack-cache-cleanup-period argument from the default value of 300 (5m) to 21600 (6h).

Detailed changes

VPC CNI version info:

$ kubectl describe daemonset aws-node -n kube-system | grep Image | cut -d "/" -f 2-3
amazon-k8s-cni-init:v1.18.2-eksbuild.1
amazon-k8s-cni:v1.18.2-eksbuild.1
amazon/aws-network-policy-agent:v1.1.2-eksbuild.1

The NETWORK_POLICY_ENFORCING_MODE setting is still at its default of standard, not strict.

Diff for conntrack-cache-cleanup-period argument:

      - args:
-       - --conntrack-cache-cleanup-period=300   # 5m (default)
+       - --conntrack-cache-cleanup-period=21600 # 6h
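For completeness, a hedged sketch of how such an args change could be applied by patching the DaemonSet directly. It assumes the network policy agent is the second container in the aws-node DaemonSet and that the full args list shown elsewhere in this thread is replaced; a managed addon update may overwrite manual edits like this.

```bash
# Replace the agent's args, changing only the conntrack cache cleanup period to 6h.
# The container index (1) and the other flags are assumptions; check your DaemonSet first:
#   kubectl -n kube-system get ds aws-node -o jsonpath='{.spec.template.spec.containers[*].name}'
kubectl -n kube-system patch daemonset aws-node --type=json -p='[
  {"op": "replace",
   "path": "/spec/template/spec/containers/1/args",
   "value": [
     "--enable-ipv6=false",
     "--enable-network-policy=true",
     "--enable-cloudwatch-logs=false",
     "--enable-policy-event-logs=false",
     "--conntrack-cache-cleanup-period=21600"
   ]}
]'
```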

Problem

However, even after upgrading to VPC CNI v1.18.2-eksbuild.1, which includes network-policy-agent v1.1.2, some application pods are still experiencing intermittent packet drops, surfacing as read ECONNRESET (err: server not responding (read ECONNRESET)) errors.

After checking internally with the developers, we found that the application container affected by the network issues does not include retry logic for connection failures.

Timeline of read ECONNRESET errors occurring in some pods (starting after upgrading to network-policy-agent v1.1.2):

> [!NOTE]
> - Values in parentheses indicate the time elapsed since the previous occurrence.
> - All times listed in the timeline below are in KST (UTC+09:00).
6/10 08:00 (6h)
6/10 02:00 (7h 39m)
6/9 18:21 (2m)
6/9 18:19 (6m)
6/9 18:13 (1h 36m)
6/9 16:37 (1h 25m)
6/9 15:12 (7h 57m)
6/9 07:15 (2h 38m)
6/9 04:37 (12h 26m)
6/8 16:11 (4h 23m)
6/8 11:48 (5h 34m)
6/8 06:14
jayanthvn commented 1 month ago

@younsl - as shared internally in the service ticket, the timeouts do not line up with the conntrack cleanup, since the cleanup runs every 6 hours while the timeouts happen at varied times. At those times, are you noticing a spike in the network policy agent conntrack cache? One suspicion is that the cache is getting full, leading to certain entries being evicted and causing timeouts.
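For anyone who wants to watch for such a spike, a rough sketch using bpftool on the worker node; the pinned map path and name are assumptions and may differ by agent version:

```bash
# List pinned eBPF maps created by the node agent (path is an assumption).
sudo ls /sys/fs/bpf/globals/aws/maps/ 2>/dev/null

# Count entries in the agent's conntrack map to watch for growth over time
# (map name is an assumption; substitute the one listed above).
sudo bpftool map dump pinned /sys/fs/bpf/globals/aws/maps/global_aws_conntrack_map | grep -c 'key:'
```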

younsl commented 3 weeks ago

@jayanthvn I'm waiting to test PR #280 on my affected clusters.

younsl commented 2 days ago

Wrap-up after meeting with EKS team

I enabled the network-policy-agent policy event logs on my dev EKS v1.28 cluster.

VPC CNI yaml:

      - args:
        - --enable-ipv6=false
        - --enable-network-policy=true
        - --enable-cloudwatch-logs=false
-       - --enable-policy-event-logs=false
+       - --enable-policy-event-logs=true
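With the flag enabled, flow verdicts should show up in the agent log already used earlier in this thread:

```bash
# Tail the network policy agent log on the worker node and watch for flow verdict entries.
tail -f /var/log/aws-routed-eni/network-policy-agent.log | grep 'Flow Info'
```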

I will come back and submit the conntrack cleanup logs for network-policy-agent ASAP.

Testing environment

$ kubectl describe daemonset aws-node -n kube-system | grep Image | cut -d "/" -f 2-3
amazon-k8s-cni-init:v1.18.2-eksbuild.1
amazon-k8s-cni:v1.18.2-eksbuild.1
aws-network-policy-agent:v1.1.1-13-gda05900-dirty