aws / aws-network-policy-agent

Apache License 2.0

Pods with no eBPF maps attached #183

Closed: alemuro closed this issue 6 months ago

alemuro commented 8 months ago

What happened:

Sometimes, when starting new pods they are not reachable by other pods. After some debugging I realised that:

Our network policies are composed of:

If we look at the PolicyEndpoint resources, they look fine. It seems to be a problem between the controller and eBPF.

Attach logs

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

It happens randomly in our setup; we haven't yet figured out how to reproduce it.

Anything else we need to know?:

Environment:

jdn5126 commented 8 months ago

@alemuro is the problem persistent, i.e. the eBPF program never gets attached? We do have one known issue that was just fixed by https://github.com/aws/aws-network-policy-agent/pull/179. The short story is that if there are multiple replicas of the same pod on a node, there is a race condition where when one replica is deleted, the eBPF program for the other replica can also be deleted.
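As a rough illustration of this kind of race, here is a minimal Go sketch. It is not the agent's actual code and not the change made in the pull request above. The type `progTracker`, its methods, and the identifier strings are all hypothetical. The idea it shows is that cleanup of a shared eBPF program should be decided under the same lock as the bookkeeping, and only once no replica on the node still references it.

```go
package main

import (
	"fmt"
	"sync"
)

// progTracker is a hypothetical stand-in for per-node bookkeeping: it maps a
// shared pod identifier (for example, namespace plus workload name) to the
// set of replica pods on this node that rely on the same eBPF program.
type progTracker struct {
	mu       sync.Mutex
	replicas map[string]map[string]struct{}
}

func newProgTracker() *progTracker {
	return &progTracker{replicas: make(map[string]map[string]struct{})}
}

// attach records that podName uses the program shared under podIdentifier.
func (t *progTracker) attach(podIdentifier, podName string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.replicas[podIdentifier] == nil {
		t.replicas[podIdentifier] = make(map[string]struct{})
	}
	t.replicas[podIdentifier][podName] = struct{}{}
}

// detach removes podName and reports whether the shared program may be
// cleaned up. That is only the case once no replica references it any more.
// The check and the removal happen under one lock, so a concurrent attach or
// detach cannot observe a half-updated state.
func (t *progTracker) detach(podIdentifier, podName string) (cleanup bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	pods := t.replicas[podIdentifier]
	delete(pods, podName) // deleting from a nil map is a no-op in Go
	if len(pods) > 0 {
		return false
	}
	delete(t.replicas, podIdentifier)
	return true
}

func main() {
	t := newProgTracker()
	t.attach("default/web", "web-abc")
	t.attach("default/web", "web-def")
	fmt.Println(t.detach("default/web", "web-abc")) // false: web-def still needs the program
	fmt.Println(t.detach("default/web", "web-def")) // true: the last replica is gone
}
```

In this sketch, deleting one replica leaves the program in place for the other, which is the behaviour the race described above breaks.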

If this is a staging environment, you can try the v1.0.8-rc1 release candidate image that we just built. The official v1.0.8 image will be released in the coming weeks.

alemuro commented 8 months ago

> @alemuro is the problem persistent, i.e. the eBPF program never gets attached?

It is never attached. The only way to fix it is to delete the pod and let Kubernetes create a new one.

Will try the v1.0.8-rc1 version, and I will give you some feedback!

Many thanks

jdn5126 commented 8 months ago

Got it. If v1.0.8-rc1 does not resolve the issue, you can send an email with the network policy agent logs to k8s-awscni-triage@amazon.com, and we can dig further. Before sending the logs, enable network policy event logs (https://github.com/aws/aws-network-policy-agent?tab=readme-ov-file#enable-policy-event-logs) so that the policy decisions are logged as well.

jayanthvn commented 8 months ago

@alemuro - I reviewed your logs and pin paths are getting cleaned up. #179 will likely fix your issue. Do let us know once you are on v1.0.8-rc1. Thanks!!

alemuro commented 8 months ago

Hello, we've been testing this for the whole day and it seems to be fixed.

jayanthvn commented 6 months ago

v1.0.8 release is available - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.16.3