Closed khayong closed 6 months ago
Moving to Network Policy agent repo
@khayong the network policy implementation is stateless. What does the policy endpoint object show for this policy? You can see the output with kubectl get policyendpoint <policy_name>
Yes you are right, there is no need to explicitly define rules for return traffic. Can you check the number of the entries in the network policy agent's conntrack table when the issue starts to happen? When the issue happens is there any pod churn or just the established connections fail after a while?
Steps to check -

1. cd /opt/cni/bin/
2. ./aws-eks-na-cli ebpf maps
3. Find the conntrack map in the output - it is the one with Keysize 20 Valuesize 1 MaxEntries 65536. For example, here the ID is 5 ->

./aws-eks-na-cli ebpf maps
Maps currently loaded :
Type : 2 ID : 3
Keysize 4 Valuesize 98 MaxEntries 1
========================================================================================
Type : 9 ID : 5
Keysize 20 Valuesize 1 MaxEntries 65536
========================================================================================
Type : 27 ID : 6
Keysize 0 Valuesize 0 MaxEntries 262144
========================================================================================
Type : 11 ID : 16
Keysize 8 Valuesize 288 MaxEntries 65536
========================================================================================

4. ./aws-eks-na-cli ebpf dump-maps 5
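Picking the conntrack map ID out of the listing can also be scripted. The sketch below embeds sample output mirroring the listing above for illustration; in practice you would pipe the real `./aws-eks-na-cli ebpf maps` output into the awk filter.

```shell
# Sample listing as printed by "./aws-eks-na-cli ebpf maps" (illustrative)
maps_output='Type : 2 ID : 3
Keysize 4 Valuesize 98 MaxEntries 1
Type : 9 ID : 5
Keysize 20 Valuesize 1 MaxEntries 65536
Type : 27 ID : 6
Keysize 0 Valuesize 0 MaxEntries 262144'

# Take the ID from the "Type : 9" line, confirmed by the following
# "Keysize 20" line, which identifies the conntrack map.
conntrack_id=$(printf '%s\n' "$maps_output" | awk '
  /^Type : 9 /                { id = $6 }
  /^Keysize 20 / && id != ""  { print id; exit }
')
echo "conntrack map ID: $conntrack_id"

# Then dump it (requires the node agent CLI on the node):
# ./aws-eks-na-cli ebpf dump-maps "$conntrack_id"
```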
(Note: replace 5 with the ID you got from step 3.)

I have also encountered this issue, and it seems to relate to long-lived connections being removed from the conntrack table prematurely. There are other issues in this repository relating to this, and the latest version (CNI v1.16.0-eksbuild.1 / policy agent 1.0.7) does not fix the issue.
If you enable policy logging using the below configuration on the VPC CNI (if deployed through the UI, else use the appropriate args in Helm/CLI), you'll see that there's an ACCEPT for the connection, then sometime later it's removed from the conntrack table, followed by a DENY in your logs.
{
  "enableNetworkPolicy": "true",
  "nodeAgent": {
    "enableCloudWatchLogs": "true",
    "enablePolicyEventLogs": "true"
  }
}
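If the CNI is managed as an EKS add-on, the same configuration can be applied from the CLI; a sketch (the cluster name is a placeholder, and the aws call is shown commented out because it needs cluster credentials):

```shell
# Policy-event-logging configuration for the vpc-cni add-on (from above)
config='{
  "enableNetworkPolicy": "true",
  "nodeAgent": {
    "enableCloudWatchLogs": "true",
    "enablePolicyEventLogs": "true"
  }
}'

# Sanity-check that the JSON parses before applying it
printf '%s' "$config" | python3 -m json.tool > /dev/null && echo "config OK"

# To apply (requires credentials; "my-cluster" is a placeholder):
# aws eks update-addon --cluster-name my-cluster \
#   --addon-name vpc-cni --configuration-values "$config"
```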
@khayong @stroebs it is possible this is the same error as https://github.com/aws/aws-network-policy-agent/pull/179. Do you have multiple replicas of these pods scheduled on the same node? If so, the symptoms would line up.
Do you have multiple replicas of these pods scheduled on the same node?
I think this could very well be the case, as we bin-pack onto a small number of nodes to keep costs low. This would explain why we did not witness this issue in our development environment, which does not have more than 1 replica per deployment.
We will have a release candidate image soon if you are willing to try it out to see if it resolves the issue. The official release image containing #179 is targeting mid-January.
@khayong @stroebs - We have the v1.0.8-rc1 tag available if you would like to try.
Thanks jayanthvn, it works. With v1.0.8-rc1, there's no need for me to explicitly define rules for return traffic.
I observed some denied connections in the log today. It appears that there might be a delay in creating entries in the conntrack table. The initial two logs show a denial, presumably because the conntrack table had not yet been updated; however, after a delay of 3 seconds, the third log shows the traffic being allowed, which suggests the conntrack entry had been successfully created by that time.
On the conntrack table, I can see the presence of the corresponding entry.
Is it considered normal for there to be a delay in the creation of conntrack entries?
I observed some denied connections in the log today.
I have observed the same behaviour. This is with a single pod in a ReplicaSet, so I think it is unrelated to the race condition.
I have another sample here, but there is no allow log at all.
It appears that the conntrack entry was created in the incorrect direction. Should the source and destination be swapped?
@khayong - There will be a delay of a few seconds (1-2s) for the controller to reconcile and attach probes to the new pods. Traffic will be allowed until the probes are attached, and then policy enforcement will take effect based on the config. In this case, the probe was probably missing when the ingress traffic came in, so no conntrack entry was created.
Regarding the 2nd issue, do you have active policy on .54 pod? if yes can you share the PE?
Regarding the 2nd issue, do you have active policy on .54 pod? if yes can you share the PE?
yes, here it is
apiVersion: networking.k8s.aws/v1alpha1
kind: PolicyEndpoint
metadata:
  creationTimestamp: "2024-01-11T16:17:37Z"
  generateName: live2-gateway-
  generation: 1
  name: live2-gateway-855lp
  namespace: live2
  ownerReferences:
  - apiVersion: networking.k8s.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: NetworkPolicy
    name: live2-gateway
    uid: e2fec936-f1d0-4f9a-bd8c-07d5967ba9e8
  resourceVersion: "24471548"
  uid: 2734c483-dc7b-412f-983b-6f2d2b2ca463
spec:
  egress:
  - cidr: 0.0.0.0/0
    ports:
    - port: 53
      protocol: UDP
  - cidr: ::/0
    ports:
    - port: 53
      protocol: UDP
  ingress:
  - cidr: 10.0.64.172
    ports:
    - port: 8080
      protocol: TCP
    - port: 8080
      protocol: TCP
  - cidr: 10.0.78.236
    ports:
    - port: 8080
      protocol: TCP
    - port: 8080
      protocol: TCP
  podIsolation:
  - Ingress
  - Egress
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: live2
      app.kubernetes.io/name: gateway
  podSelectorEndpoints:
  - hostIP: 10.0.54.248
    name: live2-gateway-b575dcf44-w6sfc
    namespace: live2
    podIP: 10.0.59.54
  - hostIP: 10.0.54.248
    name: live2-gateway-b575dcf44-ktrzz
    namespace: live2
    podIP: 10.0.60.190
  policyRef:
    name: live2-gateway
    namespace: live2
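For orientation, the NetworkPolicy that this PolicyEndpoint was generated from is not shown in the thread; based on the spec above it would look roughly like the following sketch. The ingress peer selector is an assumption - whatever selector the policy actually uses is resolved by the controller into the per-pod CIDRs 10.0.64.172 and 10.0.78.236 seen above.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: live2-gateway
  namespace: live2
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: live2
      app.kubernetes.io/name: gateway
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector: {}     # assumption: the real peer selector is not shown
    ports:
    - port: 8080
      protocol: TCP
  egress:
  - ports:                # DNS egress to anywhere (0.0.0.0/0 and ::/0 above)
    - port: 53
      protocol: UDP
```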
@khayong we are unable to repro this. Can we get on a call? Are you on the Kubernetes Slack? If so, we can connect in #aws-vpc-cni.
Can you please try with the latest v1.0.8 release? - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.16.3
Closing as v1.0.8 has been released. Please reopen if your issue is not resolved.
What happened:
I have created an egress network policy allowing the web pod to establish connections with the backend server pod at port 4000.
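A policy of the kind described would look roughly like the sketch below (names and labels are placeholders, not taken from the issue):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-egress          # placeholder name
spec:
  podSelector:
    matchLabels:
      app: web              # placeholder label for the web pod
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: backend      # placeholder label for the backend server pod
    ports:
    - port: 4000
      protocol: TCP
```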
While initially operating as intended, after some time, the packet log occasionally registers a DENY entry for certain return traffic.
where 10.0.68.172 is the backend server, 10.0.74.123 is the web server.
To mitigate this issue, I have to define an ephemeral port range for the ingress of the return traffic, similar to the VPC ACL configuration.
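The workaround described - explicitly allowing return traffic on an ephemeral port range, as one would in a VPC network ACL - would look roughly like this additional rule (a sketch with placeholder labels; the exact port range is an assumption):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-allow-return-traffic   # placeholder name
spec:
  podSelector:
    matchLabels:
      app: web                     # placeholder label for the web pod
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend             # placeholder label for the backend pod
    ports:
    - port: 1024                   # assumed ephemeral range
      endPort: 65535
      protocol: TCP
```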
Attach logs
What you expected to happen: Kubernetes Network Policies are stateful, which means there should be no need to explicitly define rules for return traffic.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): Server Version: v1.28.4-eks-8cb36c9
- CNI Version: v1.16.0-eksbuild.1
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a): Linux ip-10-0-64-172.ap-southeast-1.compute.internal 5.10.199-190.747.amzn2.x86_64 #1 SMP Sat Nov 4 16:55:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux