aws / aws-network-policy-agent

Apache License 2.0

Response traffic from allowed egress denied on short lived pods #189

Open luk2038649 opened 10 months ago

luk2038649 commented 10 months ago

What happened: We recently switched from using the Calico Tigera operator for network policies to the network policy handling built into the AWS VPC CNI.

We are seeing intermittent errors with connections hanging for applications that are short-lived and reach out to external services such as databases immediately after startup. This is noted primarily in CronJobs and Airflow pods.

We experienced this same issue reaching out to external Google services, and also to AWS Aurora instances in a peered VPC.

Our NetworkPolicy is set up with an explicit allow-all egress rule and a more restrictive ingress policy.

spec:
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
  ingress:
    - from:
        - podSelector: {}
    - from:
        - namespaceSelector:
            matchLabels:
              toolkit.fluxcd.io/owner: redacted
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: redacted
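For context, the spec excerpt above would sit inside a complete NetworkPolicy object roughly like the following (the metadata names and pod selector are illustrative placeholders, not taken from the original report):

```yaml
# Hypothetical complete manifest around the spec excerpt above;
# metadata values are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-policy   # placeholder name
  namespace: redacted
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0   # allow-all egress
  ingress:
    # ... the three ingress rules shown above
```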


It's my understanding, based on the documentation here, that we should be allowed to receive response traffic from anywhere.

Example pod logs: requests that normally complete quickly hang and never finish.

$ kubectl -n redactedNS logs redactedTask
INFO       2024-01-23 16:44:26 redacted.db read_from_views                      298 : Querying view redacted_data for 2024-01-23 16:57:04

Note that if you exec into this pod and run the same command some minutes after startup, it completes. It only fails to complete right after startup.

Attached VPC CNI logs. An instance of return traffic from a Google service being denied:

"level":"info","ts":"2024-01-23T15:49:26.013Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.253.63.95","Src Port":443,"Dest IP":"x.x.x.x(peered vpc IP)","Dest Port":59486,"Proto":"TCP","Verdict":"DENY"}

An instance of return traffic from an Aurora instance in a peered VPC being denied:

{"level":"info","ts":"2024-01-23T17:50:34.327Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"x.x.x.x(peered VPC IP)","Src Port":5432,"Dest IP":"x.x.x.x(Pod IP),"Dest Port":36806,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-01-23T17:52:37.207Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"x.x.x.x(peered VPC IP)","Src Port":5432,"Dest IP":"x.x.x.x(Pod IP)","Dest Port":36806,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-01-23T17:54:40.087Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"x.x.x.x(peered VPC IP)","Src Port":5432,"Dest IP":"x.x.x.x(Pod IP),"Dest Port":36806,"Proto":"TCP","Verdict":"DENY"}

What you expected to happen: Response traffic should not be denied if egress was allowed.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: We did not experience this issue when using the Calico Tigera operator to handle the same NetworkPolicy.

To be clear, Calico has been completely removed and all nodes have been restarted.

Seems to be possibly the same as https://github.com/aws/aws-network-policy-agent/issues/83

We have found workarounds by doing two main things:

  1. Explicitly allowing ingress from the CIDR block of the peered VPC where the DB lives.
  2. Sleeping jobs/pods for 5s before making connections.
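The first workaround corresponds to an extra ingress rule along these lines (the CIDR below is a placeholder for the peered VPC's range, not the reporter's actual value):

```yaml
# Hypothetical extra ingress rule allowing return traffic from the
# peered VPC where the database lives; 10.1.0.0/16 is a placeholder.
ingress:
  - from:
      - ipBlock:
          cidr: 10.1.0.0/16
```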

Environment:

brettstewart commented 10 months ago

+1

achevuru commented 10 months ago

@luk2038649 Based on the issue description, this is expected behavior in a way. Right now, all traffic is allowed to/from a pod until newly launched pods are reconciled against the network policies configured on the cluster. It can take up to a few seconds for the reconciliation to complete and for policies to be enforced against a new pod.

We track reverse flows (response traffic) via our own internal conntrack. In the above example, when the pod initiates a connection to AWS Aurora, the traffic should be allowed (as egress is configured to be allow-all), and once the egress probe allows the traffic it creates a conntrack entry, which the ingress probe then relies on to allow the return traffic.

However, I think the egress connection in this case happens right after pod startup and before the policies are enforced, i.e., before the relevant eBPF probes are attached, so the required conntrack entry is never created. The probes are then attached with the relevant rules before the return traffic arrives at the pod, so there is no match in the conntrack table, the ingress rules in the configured policy do not allow traffic from this endpoint, and the packet is dropped. This explains why introducing a few seconds of delay resolved the issue (explicitly adding the rules under the ingress section also works around this race condition). To get around this, a few seconds of delay at pod startup before it initiates a connection (or a retry of the failed connection) should help as well.
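The suggested startup delay can be sketched in a pod/CronJob template like this (the container name, image, and command are illustrative, not from this thread):

```yaml
# Illustrative container spec that sleeps briefly before starting the
# workload, giving the node agent time to attach the eBPF probes.
containers:
  - name: task                 # placeholder name
    image: my-task:latest      # placeholder image
    command: ["sh", "-c", "sleep 5 && exec python run_task.py"]
```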

We plan to introduce a Strict mode option in the near future which will gate pod launch until either relevant policies are configured against a new pod replica (or) block all ingress/egress connections until the policies are reconciled against a new pod.

luk2038649 commented 10 months ago

@achevuru Thanks for the quick response!

Is there any approximate timeline for the general release of the "strict mode" option? Is there an issue or PR we can track?

explicitly adding the rules under ingress section will also work in the above race condition

This is what we have done for known hosts like databases, but we have many applications, and opening up ingress for all possible responses is not ideal.

ariary commented 10 months ago

Hi all! We've noticed the same behaviour for pod-to-pod traffic: a connection allowed by the NetworkPolicy is denied at startup but then allowed (once the newly launched pods are reconciled against the network policies configured on the cluster).

Right now, all traffic will be allowed to/from a Pod until the newly launched pods are reconciled against configured network policies on the cluster. It can take up to few seconds for the reconciliation to complete and policies are enforced against a new pod.

@achevuru I think this is the opposite: all traffic is denied from a newly created pod until the newly launched pods are reconciled against the network policies configured on the cluster.

What I was describing is in fact:

Allow rules will be applied eventually after the isolation rules (or may be applied at the same time). In the worst case, a newly created pod may have no network connectivity at all when it is first started, if isolation rules were already applied, but no allow rules were applied yet.

cf netpol doc/pod lifecycle

achevuru commented 10 months ago

@luk2038649 We're targeting it for early Q2/late Q1 release time frame. Will update once we're closer to the release.

allamand commented 9 months ago

Would it be possible to delay the readiness of the pod until all the network policies have been correctly applied? Something like the pod readiness gate used with the load balancer integration?
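For reference, the load balancer pattern being alluded to registers a readiness gate on the pod spec; the gate below is a sketch of how the AWS Load Balancer Controller's condition looks, and a comparable gate for network policy reconciliation would need new agent support (hypothetical):

```yaml
# Sketch of the pod readiness gate pattern. The condition type shown is
# an example in the style used by the AWS Load Balancer Controller; a
# network-policy equivalent does not exist today (hypothetical).
spec:
  readinessGates:
    - conditionType: target-health.elbv2.k8s.aws/my-target-group  # example
```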

ariary commented 8 months ago

@achevuru The Strict mode option has not solved the issue: what this issue describes is in fact the behaviour of the Standard mode, which is (still) blocking some traffic.

Right now, all traffic will be allowed to/from a Pod until the newly launched pods are reconciled against configured network policies on the cluster. It can take up to few seconds for the reconciliation to complete and policies are enforced against a new pod

This statement is therefore not true.

achevuru commented 8 months ago

@ariary Can you expand on what was not solved with Strict mode? What exactly did you try with Strict mode?

Regarding Standard mode, the above statement is true, i.e., pods will not have any firewall rules enforced until the new pod is reconciled against active policies, so all traffic is allowed. However, once the firewall rules take effect, they will block any return traffic that isn't tracked by the probes. Please refer here. Strict mode should address this.

ariary commented 8 months ago

@achevuru My issue is more closely related to other issues; you can ignore my comment.

Pavani-Panakanti commented 1 month ago

@luk2038649 Did the issue resolve for you with the strict mode ?

dmarkhas commented 1 day ago

@luk2038649 Did the issue resolve for you with the strict mode ?

Why would strict mode help? It simply blocks all outbound traffic from the pod until the policy reconciles (thus dropping 100% of outbound traffic during that window), instead of randomly dropping connections.