aws / aws-network-policy-agent


Network policy blocks established connections to RDS #236


Mohilpalav commented 4 months ago

What happened:

We have a workload running in an EKS cluster that makes a request to an RDS cluster on startup. This request is blocked by the network policy despite an egress rule from that workload to the RDS cluster subnet. We suspect that the outbound connection is made before the network policy node agent starts tracking connections, so when the response arrives the agent has no matching allowed connection and the traffic is denied.
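For illustration, the egress rule in question has roughly this shape (the names, labels, and RDS subnet CIDR below are placeholders, not the exact values from our cluster):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-rds        # placeholder name
  namespace: my-app                # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: my-workload             # placeholder label selecting the pod that talks to RDS
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.47.53.0/24    # placeholder CIDR for the RDS cluster subnet
      ports:
        - protocol: TCP
          port: 5432               # Postgres port seen in the flow logs below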

This is what we can see in the network policy flow logs:

Node: ip-10-51-21-121.us-east-1.compute.internal;SIP: 10.47.53.151;SPORT: 5432;DIP: 10.27.36.181;DPORT: 45182;PROTOCOL: TCP;PolicyVerdict: DENY
Node: ip-10-51-21-121.us-east-1.compute.internal;SIP: 10.47.53.151;SPORT: 5432;DIP: 10.27.36.181;DPORT: 45174;PROTOCOL: TCP;PolicyVerdict: DENY

10.47.53.151:5432 (RDS) -> 10.27.36.181 (EKS workload)

Unfortunately, the node agent logs currently only show the following (see https://github.com/aws/aws-network-policy-agent/issues/103):

2024-03-19 21:31:19.049604118 +0000 UTC Logger.check error: failed to get caller
2024-03-19 21:31:19.858783024 +0000 UTC Logger.check error: failed to get caller
2024-03-19 21:31:19.923276681 +0000 UTC Logger.check error: failed to get caller

What you expected to happen: The connection to RDS should be allowed.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: Similar issues: https://github.com/aws/aws-network-policy-agent/issues/73 https://github.com/aws/aws-network-policy-agent/issues/186

Environment:

jayanthvn commented 2 months ago

Here the pod attempted to start a connection before network policy enforcement was in place, so the response packet is dropped. Please refer to https://github.com/aws/aws-network-policy-agent/issues/189#issuecomment-1907586763 for a detailed explanation.

Our recommended solution for this is strict mode, which gates pod launch until policies are configured for the newly launched pod: https://github.com/aws/amazon-vpc-cni-k8s?tab=readme-ov-file#network_policy_enforcing_mode-v1171
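For reference, a minimal sketch of enabling it, assuming VPC CNI v1.17.1+ where the NETWORK_POLICY_ENFORCING_MODE setting described in the linked README is available, applied as an env entry on the aws-node container of the aws-node DaemonSet in kube-system:

# Sketch of a patch to the aws-node DaemonSet in kube-system (VPC CNI v1.17.1+).
# NETWORK_POLICY_ENFORCING_MODE=strict gates pod launch until policies are
# reconciled for the new pod; the default is "standard".
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: NETWORK_POLICY_ENFORCING_MODE
              value: "strict"

The same change can be applied with kubectl set env on the DaemonSet or through the add-on / Helm configuration, depending on how the CNI is managed.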

Another option, if you don't want to enable this mode, is to allow the cluster Service CIDRs in your egress rules. Given that your pods communicate via Service VIPs, this will allow return traffic.
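A minimal sketch of that alternative, assuming a cluster Service CIDR of 172.20.0.0/16 (substitute your cluster's actual value):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-service-cidr    # placeholder name
  namespace: my-app                     # placeholder namespace
spec:
  podSelector: {}                       # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 172.20.0.0/16         # assumed cluster Service CIDR; replace with yours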

achevuru commented 2 months ago

@Mohilpalav Did Strict mode help with your use case/issue?

FabrizioCafolla commented 1 month ago

@Mohilpalav Is there any solution for this issue?

Monska85 commented 1 month ago

Hello there,

we have the same problem when connecting to RDS from a pod, and also when contacting the S3 service. We have tried to reproduce the error, but it is not predictable. We see failures when we deploy many pods at the same time that connect to RDS or S3, but not consistently.

Did you find any solution to this problem?

Monska85 commented 3 days ago

Hello there,

we found a workaround here.

Using the ANNOTATE_POD_IP environment variable speeds up discovery of the pod IP, and for now the pod startup issues are no longer present.
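For reference, a minimal sketch of the change, assuming the variable is set on the aws-node container of the aws-node DaemonSet in kube-system like any other VPC CNI setting:

# Sketch of the env entry added to the aws-node container of the aws-node
# DaemonSet in kube-system. With ANNOTATE_POD_IP=true the CNI annotates each
# pod with its IP, so the pod IP can be discovered without waiting for the
# pod status to be updated through the API server.
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: ANNOTATE_POD_IP
              value: "true"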