@alemuro is the problem persistent, i.e. the eBPF program never gets attached? We do have one known issue that was just fixed by https://github.com/aws/aws-network-policy-agent/pull/179. The short story is that if there are multiple replicas of the same pod on a node, there is a race condition in which deleting one replica can also delete the eBPF program of the other replica.

If this is a staging environment, you can try the v1.0.8-rc1 release candidate image that we just built. The official v1.0.8 image will be released in the coming weeks.
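For the record, swapping in the release candidate can look roughly like the sketch below. It assumes the node agent runs as the `aws-eks-nodeagent` container inside the `aws-node` DaemonSet, and the ECR registry shown (us-west-2 account) is an assumption that varies by region:

```sh
# Sketch: point the node agent at the v1.0.8-rc1 image
# (registry/account below is the us-west-2 default and varies per region)
kubectl -n kube-system set image daemonset/aws-node \
  aws-eks-nodeagent=602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-network-policy-agent:v1.0.8-rc1

# Watch the rollout across nodes
kubectl -n kube-system rollout status daemonset/aws-node
```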
> @alemuro is the problem persistent, i.e. the eBPF program never gets attached?
It is never attached. The only way of fixing it is by removing the pod and letting K8s create a new one.
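In case it helps anyone else hitting this, the workaround is simply deleting the affected pod so its controller recreates it (pod name and namespace below are placeholders):

```sh
# Workaround sketch: recreate the affected pod
kubectl -n my-namespace delete pod my-app-5d9c7b6f4-abcde
```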
Will try the v1.0.8-rc1 version, and I will give you some feedback! Many thanks.
Got it. If v1.0.8-rc1 does not resolve the issue, you can send an email with the network policy agent logs to k8s-awscni-triage@amazon.com, and we can dig further. Before sending the logs, enable network policy event logs (https://github.com/aws/aws-network-policy-agent?tab=readme-ov-file#enable-policy-event-logs) so the policy decisions are logged as well.
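Per the linked README section, that roughly means adding the flag to the node agent container; treat the following as a sketch (the container name, flag, and log path are taken from the docs and this thread, not verified here):

```sh
# Sketch: enable policy event logs on the node agent
kubectl -n kube-system edit daemonset aws-node
# ...then under the aws-eks-nodeagent container, add:
#   args:
#     - --enable-policy-event-logs=true

# The agent logs land on the node, e.g.:
#   /var/log/aws-routed-eni/network-policy-agent.log
```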
@alemuro - I reviewed your logs and the pin paths are getting cleaned up. #179 will likely fix your issue. Do let us know once you are on v1.0.8-rc1. Thanks!!
Hello, we've been testing this for the whole day and it seems to be fixed.
The v1.0.8 release is available, bundled with the VPC CNI v1.16.3 release - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.16.3
What happened:
Sometimes, when starting new pods, they are not reachable by other pods. After some debugging I realised that:

- Affected pods don't have any mapping in the output of `/opt/cni/bin/aws-eks-na-cli ebpf loaded-ebpfdata | grep Pod`, but they do have a mapping when everything works fine.
- `grep "Target Pod doesn't belong to the current pod Identifier:" network-policy-agent.log | sed -e "s/.*Pod ID\: //" | awk -F "\"" '{print $3}' | sort -n | uniq` returns the list of all pods that are hosted on the current instance and are not reachable from other pods (because they don't have a map).

Our network policies are composed of:

- deny all ingress traffic by default
- allow all egress traffic going to the internet EXCEPT for a specific IP (a sketch of this policy shape is below). <-- This is not filtered on the affected pods!

If we take a look at the `PolicyEndpoint` resources, they look fine. It seems to be a problem between the controller and the eBPF layer.

Attached logs.
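For reference, the policy shape described above looks roughly like the following (namespace, policy name, and IP are placeholders, not our actual manifest):

```sh
# Sketch: default-deny ingress plus allow-all egress with one excluded IP,
# using the standard NetworkPolicy ipBlock/except construct
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-ingress-allow-egress-except
  namespace: my-namespace
spec:
  podSelector: {}
  policyTypes:
    - Ingress   # no ingress rules => all ingress denied
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 203.0.113.10/32   # the one IP that must stay blocked
EOF
```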
What you expected to happen:
New pods should be reachable by other pods, and their egress traffic should be filtered according to the `except` parameter.

How to reproduce it (as minimally and precisely as possible):
It is random in our setup; we haven't figured out yet how to reproduce it.
Anything else we need to know?:
Environment:

- Kubernetes version (use `kubectl version`): v1.27
- OS (e.g. `cat /etc/os-release`): Amazon Linux 2
- Kernel (e.g. `uname -a`): 5.10.201-191.748.amzn2.x86_64