aws / aws-network-policy-agent

Flaky network policy enforcement, especially around the kubeapi #248

Closed: corang closed this issue 1 month ago

corang commented 2 months ago

What happened: In an EKS cluster with many default-deny network policies plus policies that allow specific IPs and namespaces, vpc-cni/aws-network-policy-agent seems to "forget" about some network policies for some pods. This mostly seems to affect traffic to the kubeapi.

What you expected to happen: Network Policy enforcement is consistent and reliable.

How to reproduce it (as minimally and precisely as possible): Deploy resources that require access to the kubeapi into many namespaces in the cluster. In each namespace, create a default-deny network policy and an allow-kubeapi policy. Eventually some pods will no longer be able to talk to the kubeapi (SSL connect timeout). This can take anywhere from hours to weeks.
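The issue does not include the actual manifests, so the following is only a rough sketch of what the per-namespace policy pair described above might look like. It uses Python to generate standard `networking.k8s.io/v1` NetworkPolicy objects as JSON. The policy names, the namespace names, the `10.0.0.10/32` CIDR, and port 443 are all placeholders assumed for illustration, not values taken from the report.

```python
import json


def default_deny(namespace):
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", "namespace": namespace},  # name is hypothetical
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_kubeapi(namespace, api_cidrs, port=443):
    """Allow egress from every pod in the namespace to the API server CIDRs."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-kubeapi", "namespace": namespace},  # name is hypothetical
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": cidr}} for cidr in api_cidrs],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


def manifests(namespaces, api_cidrs):
    """Bundle one default-deny and one allow-kubeapi policy per namespace."""
    items = []
    for ns in namespaces:
        items.append(default_deny(ns))
        items.append(allow_kubeapi(ns, api_cidrs))
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Placeholder namespaces and API server address; substitute real values.
    print(json.dumps(manifests(["team-a", "team-b"], ["10.0.0.10/32"]), indent=2))
```

The JSON output could be saved to a file and applied with `kubectl apply -f`, assuming the namespaces already exist. Since the report says the failure can take hours to weeks to appear, applying these policies does not by itself reproduce the problem.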

Anything else we need to know?:

Environment:

kervrosales commented 2 months ago

Are there any updates on this ticket? I am also encountering this issue, experiencing sporadic denied traffic in the network agent log for Kubernetes API and CoreDNS, although initial connections appear to work when tested manually. I'm unable to implement network policies on our production cluster while this issue remains unresolved. Thank you!

jayanthvn commented 1 month ago

This looks similar to https://github.com/aws/aws-network-policy-agent/issues/204. We have merged the fix to the release branch, and it is currently going through the release pipeline. We should have the release out this week.

jayanthvn commented 1 month ago

The fix is released with network policy agent v1.1.2: https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.18.2. Please test and let us know if there are any issues.