kervrosales opened this issue 1 month ago
@kervrosales Are you saying that deleting and recreating the NP solved it, or are you also restarting pods? Pods will be in default-allow mode until the network policies are reconciled against them, and this can take up to 2-3s depending on cluster load. Please check whether this situation applies to you; if it does, please try Strict mode and let us know if it helps.
@achevuru Thanks for your reply!
That is correct; deleting and recreating the NetworkPolicy resolved the issue. I had to do this for each namespace that requires a NetworkPolicy. However, restarting the pods did not resolve the issue, which leads me to believe that the problem I am experiencing might be different from what was previously mentioned.
Can you please confirm that enabling "strict" mode for the AWS CNI is done by updating the DaemonSet in the kube-system namespace? I have done this, but for some reason it does not seem to work: none of my new pods get the default-deny policy.
I confirm that the container has the correct environment variables:

```yaml
containers:
- env:
  - name: NETWORK_POLICY_ENFORCING_MODE
    value: strict
  - name: ADDITIONAL_ENI_TAGS
    value: '{}'
  - name: ANNOTATE_POD_IP
    value: "false"
  - name: AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER
    value: "false"
  - name: AWS_VPC_CNI_NODE_PORT_SUPPORT
    value: "true"
  - name: AWS_VPC_ENI_MTU
```
Thanks!
This is expected, although I do not know why the AWS implementation deviates from the official K8s spec for network policies. The AWS docs state:

> [...] The Amazon VPC CNI plugin for Kubernetes configures network policies for pods in parallel with the pod provisioning. Until all of the policies are configured for the new pod, containers in the new pod will start with a default allow policy. [...] (https://docs.aws.amazon.com/eks/latest/userguide/cni-network-policy.html#cni-network-policy-considerations)

whereas the Kubernetes documentation requires:

> All newly created pods affected by a given NetworkPolicy will be isolated before they are started. Implementations of NetworkPolicy must ensure that filtering is effective throughout the Pod lifecycle, even from the very first instant that any container in that Pod is started. (https://kubernetes.io/docs/concepts/services-networking/network-policies/#pod-lifecycle)

This choice has severe security implications: pods might not be isolated for the first few seconds after they start, which gives a malicious actor a window to perform network actions during that short time frame.
@achevuru, imo this should be addressed immediately: a breach of network allow rules during pod startup poses a high security risk, especially for workloads that cannot be scanned for security issues.
@samox73 The above doc calls out a Strict mode option under the same section, which starts with either default deny or the current set of policies configured against the pod.

Also, in Standard mode, any network connections opened prior to Network Policy reconciliation (in the first 1-2 seconds) will be terminated once enforcement takes effect.
@achevuru This is not exactly the desired behavior either, though. Strict mode blocks all traffic by default, so one would have to write policies for every workload in the cluster that needs ingress or egress connections, which can be a time-consuming task taking multiple days for big clusters.
It would be very useful if pods that are associated with a network policy had traffic blocked from the start, while pods that match no network policy kept default allow.
@samox73 So, I'll summarize the options available right now:

**Standard mode** → Pod creation and Network Policy reconciliation are parallel workflows, with reconciliation kicking in after the pod is assigned an IP address; reconciliation can't start until an IP is assigned and the pod interface is up and running. The downside is that this can (potentially) leave the pod's interface in default-allow mode for the first few seconds (usually 1-3 seconds) of the pod's lifecycle. However, as I called out above, any unexpected (ongoing) connection will be blocked once the NP comes into effect.

**Strict mode** → Pod creation is blocked until either a default deny (or) the current active set of policies is configured against the pod. Usually the initial pod of a deployment/daemonset etc. will come up with default deny, and subsequent replicas of the same deployment will come up with the current set of active firewall rules. This ensures no unauthorized network access is allowed from any pod on the cluster until reconciliation completes against the newly launched pod. As you called out, this mode requires a user to have a Network Policy defined for all the pods in the cluster; we landed at this requirement mainly from the customer feedback we received. I understand your concern that it takes some work to create an NP for every pod in the cluster, but you can just create a simple allow-all Network Policy per namespace instead, and pods in a particular namespace with no specific Network Policy will inherit this allow-all policy. So, one per namespace would suffice. The downside of this approach is that some pods can (potentially) stay in default-deny mode until NP reconciliation completes against the newly launched pod (usually 1-3 seconds). We're considering providing an option where users can exclude some namespaces from Strict mode enforcement.
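An allow-all NetworkPolicy of the kind described above might look like the following sketch (the namespace name is illustrative; one such policy per namespace would be needed):

```yaml
# Illustrative allow-all NetworkPolicy for a single namespace.
# Under Strict mode, pods in this namespace that match no other
# policy would inherit this policy instead of staying default-deny.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
  namespace: my-namespace   # replace with the target namespace
spec:
  podSelector: {}           # empty selector: matches every pod in the namespace
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - {}                      # empty rule: allow all ingress
  egress:
  - {}                      # empty rule: allow all egress
```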
Also, if you want to reduce the initial time period (1-3 seconds, based on load) during which a pod is either in default-allow or default-deny mode (depending on the NP enforcement mode selected), you can use the ANNOTATE_POD_IP feature of VPC CNI. We introduced this feature precisely for this scenario, originally for the Calico Network Policy solution, and we extended it to VPC CNI's Network Policy implementation as well. With this feature, the initial 1-3s period should drop to under a second (or even lower in most cases).
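Enabling that feature amounts to flipping the ANNOTATE_POD_IP environment variable on the aws-node DaemonSet in kube-system (the same env block shown earlier in this thread); a minimal sketch, with surrounding DaemonSet fields omitted:

```yaml
# Sketch of the relevant aws-node DaemonSet container env, assuming
# Strict mode is also desired; all other fields left unchanged.
containers:
- name: aws-node
  env:
  - name: ANNOTATE_POD_IP
    value: "true"           # annotate pods with their IP to speed up NP reconciliation
  - name: NETWORK_POLICY_ENFORCING_MODE
    value: strict
```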
I am experiencing an issue with network policies not being enforced upon their initial creation in my Kubernetes cluster using the AWS Network Policy Agent. The policies only take effect after being deleted and re-created. Below are the details of my configuration.
Kubernetes YAML resources
What happened: When the network policy is initially created, it does not enforce the ingress rules as expected. I am still able to access the demo-app service from the client-one pod. However, after deleting and re-creating the network policy, the policy is enforced correctly, and access is denied as expected.
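The actual manifests were attached to the issue rather than inlined, so the following is only an illustrative sketch of a policy of the shape described above, restricting ingress to demo-app (the pod labels and allowed-client selector are assumptions, not taken from the report):

```yaml
# Hypothetical NetworkPolicy matching the reported scenario: only pods
# carrying an assumed "access: allowed" label may reach demo-app, so
# client-one should be denied once the policy is enforced.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: demo-app-ingress
spec:
  podSelector:
    matchLabels:
      app: demo-app         # assumed label on the demo-app pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          access: allowed   # assumed label absent from client-one
```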
Attach logs
- Initial Network Policy creation logs from the Network Policy agent: not-working-infra.json
- Deletion of the Network Policy: delete-log-infra.json
- Recreation of the Network Policy: working-infra.json
What you expected to happen: The network policy should enforce the ingress rules upon initial creation without requiring deletion and re-creation.
How to reproduce it (as minimally and precisely as possible):
1. Deploy the Kubernetes resources.
2. Test connectivity.
3. Delete and re-create the network policy.
4. Test connectivity again.
Anything else we need to know?: I have verified the CNI plugin configuration and ensured that it supports network policies. This issue seems to be related to the timing or synchronization of the policy application.
Environment:
- EKS version: 1.27
- CNI plugin: v1.18.1-eksbuild.1