Azure / azure-container-networking

Azure Container Networking Solutions for Linux and Windows Containers

[NPM] Network Policy does not Block Pods accessing Worker Node Ports in which Pods are scheduled #579

Closed dhananjaya94 closed 2 years ago

dhananjaya94 commented 4 years ago

Is this a request for help?: No

Is this an ISSUE or FEATURE REQUEST? (choose one): ISSUE

Which release version?: v1.1.2

Which component (CNI/IPAM/CNM/CNS): NPM

Which Operating System (Linux/Windows):

v1.16.8   Ubuntu 16.04.6 LTS   4.15.0-1083-azure   docker://3.0.10+azure

Which Orchestrator and version (e.g. Kubernetes, Docker): Kubernetes AKS

What happened:

What you expected to happen:


Network Policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-egress-network-policy
  namespace: test-netpol
spec:
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        # Private IP CIDRs
        - 10.0.0.0/8
        - 192.168.0.0/16
        - 172.16.0.0/12
        # Azure VM metadata URL
        - 169.254.169.254/32
  - ports:
    - port: 53
      protocol: UDP
    to:
    - namespaceSelector:
        matchLabels:
          addonmanager.kubernetes.io/mode: Reconcile
      podSelector:
        matchLabels:
          k8s-app: kube-dns
  podSelector: {}
  policyTypes:
  - Egress
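
For reference, the `ipBlock` rule above can be evaluated by hand. The sketch below is not how NPM implements the policy; it only uses Python's standard `ipaddress` module to check whether a destination falls inside `cidr` and outside every `except` range. The helper name `ipblock_allows` is hypothetical. The two node IPs come from the `kubectl get nodes -o wide` output in this report, and `8.8.8.8` is just an arbitrary public address for contrast.

```python
import ipaddress

# Values copied from the NetworkPolicy in this report.
CIDR = "0.0.0.0/0"
EXCEPT = ["10.0.0.0/8", "192.168.0.0/16", "172.16.0.0/12", "169.254.169.254/32"]


def ipblock_allows(dest, cidr=CIDR, excepted=EXCEPT):
    """Return True if dest is inside cidr and outside every except range.

    Hypothetical helper for illustration only, not NPM code.
    """
    addr = ipaddress.ip_address(dest)
    if addr not in ipaddress.ip_network(cidr):
        return False
    return not any(addr in ipaddress.ip_network(net) for net in excepted)


if __name__ == "__main__":
    for dest in ["172.20.0.6", "172.20.0.67", "8.8.8.8"]:
        print(dest, "allowed" if ipblock_allows(dest) else "not allowed")
```

Both node IPs fall inside 172.16.0.0/12, so the first egress rule does not allow them, and the second rule only allows UDP port 53 to the kube-dns pods. Under that reading, TCP port 22 on either node should be unreachable from the pod.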

Debug tool used: https://github.com/Mirage20/k8s-debug-tools

❯ kubectl run debug-tools --image=mirage20/k8s-debug-tools --restart=Never -n test-netpol
❯ kubectl get nodes -o wide
NAME                                STATUS   ROLES   AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-defaultnp-49312751-vmss000006   Ready    agent   16d    v1.16.8   172.20.0.6    <none>        Ubuntu 16.04.6 LTS   4.15.0-1083-azure   docker://3.0.10+azure
aks-defaultnp-49312751-vmss00000e   Ready    agent   2d3h   v1.16.8   172.20.0.67   <none>        Ubuntu 16.04.6 LTS   4.15.0-1083-azure   docker://3.0.10+azure
❯ kubectl get po -o wide -n test-netpol
NAME          READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
debug-tools   1/1     Running   0          65s   172.20.0.78   aks-defaultnp-49312751-vmss00000e   <none>           <none>
❯ kubectl exec -it debug-tools -n test-netpol -- bash
root@debug-tools:/# telnet 172.20.0.6 22
Trying 172.20.0.6...

^C
root@debug-tools:/# telnet 172.20.0.67 22
Trying 172.20.0.67...
Connected to 172.20.0.67.
Escape character is '^]'.
SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.8
^C^C
Connection closed by foreign host.
root@debug-tools:/#
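
The manual telnet check above could also be scripted so it can be repeated after each NPM upgrade. This is a minimal sketch, assuming Python 3 is available wherever the check is run (this report does not confirm that for the debug image); the function name `tcp_port_open` is hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it mirrors what `telnet host port`
    shows, without the interactive session.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from inside the pod, `tcp_port_open("172.20.0.67", 22)` would be expected to return `False` if the policy were enforced for the node the pod is scheduled on. The telnet session above suggests it would currently return `True`.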

Anything else we need to know:

Type (plugin)      : Advanced (Azure CNI)
Service CIDR       : 172.21.0.0/16
DNS service IP     : 172.21.0.10
Docker bridge CIDR : 172.22.0.1/16
Network policy     : Azure
AKS subnet CIDR    : 172.20.0.0/18

jaer-tsun commented 4 years ago

Hi, would you be able to verify if this is still the case in the latest release? We've moved the ACCEPT entry for RELATED,ESTABLISHED traffic under all of the other NPM chains. My guess is that the docker container to host traffic is considered RELATED.
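
To illustrate why the position of that ACCEPT entry matters: iptables evaluates the rules in a chain in order, and a terminating target such as ACCEPT or DROP ends evaluation at the first match. The toy model below is only a sketch of that first-match behavior. The rule lists, predicates, and packet fields are hypothetical and do not reproduce the chains NPM actually programs, and whether pod-to-host traffic is really classified as RELATED is the guess stated above, not something this sketch verifies.

```python
def first_match(rules, packet, default="ACCEPT"):
    """Return the verdict of the first rule whose predicate matches the packet."""
    for predicate, verdict in rules:
        if predicate(packet):
            return verdict
    return default


# Hypothetical predicates, not real NPM rules.
def is_tracked(pkt):
    return pkt["ctstate"] in ("RELATED", "ESTABLISHED")


def dst_excluded(pkt):
    return pkt["dst_excluded"]


# Same two rules, in two different orders.
accept_first = [(is_tracked, "ACCEPT"), (dst_excluded, "DROP")]
accept_last = [(dst_excluded, "DROP"), (is_tracked, "ACCEPT")]

# A packet to an excluded destination that conntrack has marked RELATED.
packet = {"ctstate": "RELATED", "dst_excluded": True}

print(first_match(accept_first, packet))  # ACCEPT
print(first_match(accept_last, packet))   # DROP
```

With the state-based ACCEPT placed first, the packet never reaches the policy rule; with it placed last, the policy rule decides.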

dhananjaya94 commented 4 years ago

@jaer-tsun, this issue is in the NPM 1.1.2 release.

    image: mcr.microsoft.com/containernetworking/azure-npm:v1.1.2
    imageID: docker-pullable://mcr.microsoft.com/containernetworking/azure-npm@sha256:8487911471ab7abd9116bb407d78b05721791eb48169e75f99201badc430c0db

jaer-tsun commented 4 years ago

Yup, but we've recently released v1.1.4 with changes that may resolve this issue.

dhananjaya94 commented 4 years ago

I have raised a support ticket with Azure, asking them to upgrade the NPM version in AKS.

dhananjaya94 commented 4 years ago

@jaer-tsun, according to Azure Support, the global rollout of v1.1.4 to AKS will be done in the first week of July. I will try to reproduce this once the release is available.

dhananjaya94 commented 4 years ago

NPM 1.1.4 has finally been rolled out to one of our AKS clusters.

  containerStatuses:
  - containerID: docker://7ee80dc20afb229ef8110eb818eb19c43e31803d6c0c37797d91393591b012b8
    image: mcr.microsoft.com/containernetworking/azure-npm:v1.1.4
    imageID: docker-pullable://mcr.microsoft.com/containernetworking/azure-npm@sha256:d1ef2bebbb62bf9f97c7d51d6d799c673d638b78ac4eae51c71268b1bbab9209

But the issue still persists:

❯ kubectl get nodes -o wide
NAME                                STATUS   ROLES   AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-defaultnp-32130788-vmss00001c   Ready    agent   5d22h   v1.16.8   172.16.0.6     <none>        Ubuntu 16.04.6 LTS   4.15.0-1089-azure   docker://3.0.10+azure
aks-defaultnp-32130788-vmss00001d   Ready    agent   5d22h   v1.16.8   172.16.0.128   <none>        Ubuntu 16.04.6 LTS   4.15.0-1083-azure   docker://3.0.10+azure
aks-defaultnp-32130788-vmss00001e   Ready    agent   3d18h   v1.16.8   172.16.0.65    <none>        Ubuntu 16.04.6 LTS   4.15.0-1083-azure   docker://3.0.10+azure
❯ kubectl run debug-tools --image=mirage20/k8s-debug-tools --restart=Never
pod/debug-tools created
❯ kubectl get po -o wide
NAME          READY   STATUS    RESTARTS   AGE    IP             NODE                                NOMINATED NODE   READINESS GATES
debug-tools   1/1     Running   0          3m4s   172.16.0.143   aks-defaultnp-32130788-vmss00001d   <none>           <none>
❯ kubectl exec -it debug-tools -- bash
root@debug-tools:/#
root@debug-tools:/# telnet 172.16.0.128 22 # node on which the pod is scheduled
Trying 172.16.0.128...
Connected to 172.16.0.128.
Escape character is '^]'.
SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.8
^C
Connection closed by foreign host.
root@debug-tools:/# telnet 172.16.0.65 22 # node on which the pod is not scheduled
Trying 172.16.0.65...

jaer-tsun commented 4 years ago

It doesn't look like the pod is in the same namespace as the network policy.

dhananjaya94 commented 4 years ago

@jaer-tsun, I can confirm that the pod is in the same namespace as the one where the netpol is applied.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] commented 2 years ago

Issue closed due to inactivity.