
azure-npm randomly corrupts ip-tables. Results in connection failures between pods #858

Closed ludydoo closed 3 years ago

ludydoo commented 3 years ago

What happened:

We have 3 AKS clusters, 3 nodes each. Mix of B16 and D16 nodes.

We use NetworkPolicies to restrict traffic between pods. We define pretty strict rules (per-pod & namespace).

azure-npm randomly causes pod connections to fail; it seems the iptables rules are getting corrupted.

Sometimes, when I inspect the clusters in the morning, multiple pods are failing because azure-npm has corrupted the iptables rules, even though nothing was changed or touched overnight.

For example, we use argocd to manage our deployments. argocd-server will randomly fail to contact the argocd-repo-server.

Killing the azure-npm pods solves the problem, but this is not a viable long-term solution.
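
As a temporary workaround we restart the NPM pods, roughly like this (a sketch; the k8s-app=azure-npm label selector is an assumption and may differ on your cluster):

# Restart all azure-npm pods; the DaemonSet recreates them and they rebuild their iptables/ipset state.
# NOTE: the label selector below is an assumption, not confirmed from the cluster.
kubectl -n kube-system delete pods -l k8s-app=azure-npm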

I sometimes see this error in the azure-npm logs:

2021/04/23 07:06:04 [1] Error: There was an error running command: [ipset -X -exist azure-npm-784554818] Stderr: [exit status 1, ipset v7.5: Set cannot be destroyed: it is in use by a kernel component[]
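
For reference, this is roughly how we inspect the NPM state on an affected node (a sketch; the k8s-app=azure-npm label selector and the AZURE-NPM chain naming are assumptions based on the error above and may differ by version):

# Pick one azure-npm pod (label selector assumed) and dump the rules/sets it manages.
NPM_POD=$(kubectl -n kube-system get pods -l k8s-app=azure-npm -o name | head -1)
# List the iptables rules; NPM chains are assumed to contain "AZURE-NPM" in their names.
kubectl -n kube-system exec "$NPM_POD" -- iptables-save | grep -i azure-npm
# List the ipsets NPM created (names like azure-npm-784554818 in the error above).
kubectl -n kube-system exec "$NPM_POD" -- ipset -n list | grep azure-npm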

What you expected to happen:

azure-npm to correctly define the iptables rules. We would expect AKS to have a bug-free CNI, as this is such a critical component of the infrastructure!

I tried upgrading azure-npm to 1.3.0, but it seems that AKS manages this component automatically and downgrades it back to 1.1.8.
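
For context, the upgrade attempt looked roughly like this (a sketch; the daemonset and container names are assumptions), before AKS reconciled the image back to v1.1.8:

# Bump the NPM image on the daemonset (names assumed); the AKS addon manager reverts this change.
kubectl -n kube-system set image daemonset/azure-npm azure-npm=mcr.microsoft.com/containernetworking/azure-npm:v1.3.0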

How to reproduce it:

Very hard to say. It sometimes happens when the labels on the pods/namespaces change, but it also happens randomly. Help in debugging this would be greatly appreciated.

Orchestrator and Version (e.g. Kubernetes, Docker):

AKS 1.20.2, azure-npm mcr.microsoft.com/containernetworking/azure-npm:v1.1.8

Operating System (Linux/Windows):

Linux

Kernel (e.g. uname -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):

5.4.0-1043-azure #45~18.04.1-Ubuntu SMP Sat Mar 20 16:16:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux


ludydoo commented 3 years ago

https://github.com/Azure/AKS/issues/1810

neaggarwMS commented 3 years ago

@ludydoo We are in the process of rolling out 1.3.0 (currently rolled out in EastUS2Euap and CentralUSEuap). It will reach all production regions by ~05/07. We've made significant improvements to reliability and to handling scale/perf with NPM (some of the details are below), and we have beefed up our investments in this space to make it first class; it has come a long way.

Improvements:

Bugs fixed.

Original behavior logic:

(Ingress rule 1 OR Ingress rule 2 OR Ingress rule 3 OR Egress rule 1 OR Egress rule 2)

Current behavior:

(Ingress rule 1 OR Ingress rule 2 OR Ingress rule 3) AND (Egress rule 1 OR Egress rule 2)

In other words, ingress and egress rules are now evaluated independently, in line with upstream NetworkPolicy semantics.

We've also added Prometheus and Azure Monitor support for monitoring latency, iptables rules, and ipset rules: https://docs.microsoft.com/en-us/azure/virtual-network/kubernetes-network-policies#monitor-and-visualize-network-configurations-with-azure-npm
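
As a quick sanity check, the node metrics can be pulled straight from an NPM pod, roughly like this (a sketch; the 10091 port, the /node-metrics path, the metric names, and the pod label selector are assumptions based on the linked docs and may differ by NPM version):

# Port-forward to one azure-npm pod and pull its Prometheus metrics (port/path assumed).
NPM_POD=$(kubectl -n kube-system get pods -l k8s-app=azure-npm -o name | head -1)
kubectl -n kube-system port-forward "$NPM_POD" 10091:10091 &
sleep 2
curl -s http://localhost:10091/node-metrics | grep -E 'npm_num_iptables_rules|npm_num_ipsets|npm_num_policies'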

Release Notes: https://github.com/Azure/azure-container-networking/releases

Can you share more details about your setup, like Region, Subscription, AKS cluster details, and Network Policies, so we can assist you better?

neaggarwMS commented 3 years ago

@ludydoo I would also like to add that we are compliant with the Ginkgo E2E conformance test suite maintained by sig-network. We are currently in the process of integrating with their Cyclonus test framework as well (https://kubernetes.io/blog/2021/04/20/defining-networkpolicy-conformance-cni-providers/#cyclonus).

ludydoo commented 3 years ago

Hi @neaggarwMS

Thanks for the answer.

Our setup is as follows:

3 x AKS 1.20.2 clusters, westeurope region

When do you estimate GA of 1.3.0 for the westeurope region? Also, is there a way we can fast-track this?

We have multiple NetworkPolicies.

I'll give a few examples:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argo-server
  namespace: argo
spec:
  podSelector:
    matchLabels:
      app: argo-server
  policyTypes:
  - Ingress
  - Egress
  ingress:
    # Allow ingress from VPN-IngressGateway
    - ports:
        - port: web
          protocol: TCP
      from:
        - namespaceSelector:
            matchLabels:
              name: istio-system
          podSelector:
            matchExpressions:
              - key: istio
                operator: In
                values:
                  - vpn-ingressgateway
    # Allow ingress from Prometheus (metrics)
    - ports:
        - port: http-envoy-prom
          protocol: TCP
        - port: 15020
          protocol: TCP
      from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
          podSelector:
            matchExpressions:
              - key: prometheus
                operator: In
                values:
                  - prometheus
  egress:
    - ports:
        - port: 15012
          protocol: TCP
      to:
        - namespaceSelector:
            matchLabels:
              name: istio-system
          podSelector:
            matchExpressions:
              - key: istio
                operator: In
                values: [ pilot ]
    - ports:
        - protocol: UDP
          port: 53
      to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: minio
  namespace: argo
spec:
  podSelector:
    matchLabels:
      app: minio
  ingress:
    - ports:
        - port: 9000
          protocol: TCP
      from:
        # Allow Argo Server Ingress
        - podSelector:
            matchExpressions:
              - key: app
                operator: In
                values: [ argo-server ]
          namespaceSelector:
            matchLabels:
              name: argo
        # Allow Ingress from Argo Workflows
        - podSelector:
            matchExpressions:
              - key: workflows.argoproj.io/workflow
                operator: Exists
          namespaceSelector:
            matchExpressions:
              - key: name
                operator: In
                values: [ argo ]
    - ports:
        - port: 15020
          protocol: TCP
      from:
        # Allow Prometheus Metrics Scraping
        - podSelector:
            matchExpressions:
              - key: prometheus
                operator: In
                values: [ "prometheus" ]
          namespaceSelector:
            matchLabels:
              name: monitoring
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: workflow-controller
spec:
  podSelector:
    matchLabels:
      app: workflow-controller
  policyTypes:
  - Egress
  - Ingress
  ingress:
    # Allow Prometheus Metrics Scraping
    - ports:
        - port: 15020
          protocol: TCP
        - port: 9090
          protocol: TCP
      from:
        - podSelector:
            matchExpressions:
              - key: prometheus
                operator: In
                values: [ "prometheus" ]
          namespaceSelector:
            matchLabels:
              name: monitoring
  egress:
    - ports:
        - port: 15012
          protocol: TCP
      to:
        - namespaceSelector:
            matchLabels:
              name: istio-system
          podSelector:
            matchExpressions:
              - key: istio
                operator: In
                values: [ pilot ]
    - ports:
        - protocol: UDP
          port: 53
      to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
    - ports:
        - protocol: TCP
          port: 8081
      to:
        - namespaceSelector:
            matchLabels:
              name: argo
          podSelector:
            matchExpressions:
              - key: app
                operator: In
                values: [ repo-server ]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argo-server-egress-to-kubernetes-api
  namespace: argo
spec:
  podSelector:
    matchLabels:
      app: argo-server
  policyTypes:
  - Egress
  egress:
    - to:
        - ipBlock:
            cidr: <redacted_master_api_ip>/32
      ports:
        - protocol: TCP
          port: 443
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argo-workflow-controller-egress-to-kubernetes-api
  namespace: argo
spec:
  podSelector:
    matchLabels:
      app: workflow-controller
  policyTypes:
  - Egress
  egress:
    - to:
        - ipBlock:
            cidr: <redacted_master_api_ip>/32
      ports:
        - protocol: TCP
          port: 443
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: minio-ingress-from-argo-workflow
  namespace: argo
spec:
  podSelector:
    matchLabels:
      app: minio
  policyTypes:
  - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 9000
      from:
        - podSelector:
            matchExpressions:
              - key: "workflows.argoproj.io/workflow"
                operator: Exists
          namespaceSelector:
            matchExpressions:
              - key: name
                operator: In
                values: [ cicd ]
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: external-dns
  namespace: external-dns
spec:
  podSelector:
    matchExpressions:
      - key: app
        operator: In
        values:
          - external-dns
          - external-dns-private
  policyTypes:
  - Ingress
  - Egress
  ingress:
    - ports:
        - port: 15020
          protocol: TCP
      from:
        - podSelector:
            matchExpressions:
              - key: prometheus
                operator: In
                values: [ prometheus ]
          namespaceSelector:
            matchLabels:
              name: monitoring
  egress:
    - ports:
        - port: 15012
          protocol: TCP
      to:
        - namespaceSelector:
            matchLabels:
              name: istio-system
          podSelector:
            matchExpressions:
              - key: istio
                operator: In
                values: [ pilot ]
    - ports:
        - protocol: UDP
          port: 53
      to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
    - ports:
        - port: 443
          protocol: TCP
      to:
        - namespaceSelector:
            matchLabels:
              name: istio-system
          podSelector:
            matchExpressions:
              - key: istio
                operator: In
                values: [ egressgateway-sni-proxy ]
vakalapa commented 3 years ago

@ludydoo Thank you for sharing the extensive list of NetworkPolicies. v1.3.1 is being rolled out as we speak. Through safe deployment processes, we release a version to only a subset of regions on a given day; WestEurope should get this release by EOD Monday, 3rd May. Once the cluster upgrades, please test whether these issues persist; if they do, you can either ping us here or create a support ticket with Azure Support. As Neha mentioned, with 1.3.1 we have added some reliability improvements which should remove the flakiness with iptables rules and ipset lists.

tiholm commented 3 years ago

@vakalapa I think we faced this same issue today: after 1 year in production with AKS & CNI & network policies, pods were not able to connect to each other. Kubernetes 1.17.9, North Europe. Removing the network policies and rebooting the azure-npm & CoreDNS pods resolved it, but is this random issue still ongoing in this version and region?

neaggarwMS commented 3 years ago

@tiholm, can you describe one of the Azure NPM pods and share its version? Also, can you share the cluster details (Region, Subscription, Resource Group Name and Cluster Name)?
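
For example, something like the following should show the running NPM image version (a sketch; the k8s-app=azure-npm label selector is an assumption and may differ on your cluster):

# Print each azure-npm pod name and its image, which includes the NPM version tag (label selector assumed).
kubectl -n kube-system get pods -l k8s-app=azure-npm -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'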

tiholm commented 3 years ago

@neaggarwMS I have opened a support issue with id 2105040050002654 and will discuss the private details there.

neaggarwMS commented 3 years ago

Sounds good, thanks @tiholm. We will look into that and get back on the case. We can close this GitHub issue.

BenjaminHerbert commented 3 years ago

I believe we were also experiencing this issue, in two clusters. One of the namespaces using egress network policies could not connect to services anymore. We could only "fix" it by removing all network policies.

There was a restart of the azure-npm pods around the time the errors first happened in both clusters.

Support request ID: 121050425001058

vakalapa commented 3 years ago

@BenjaminHerbert A new release, NPM v1.3.2, has been deployed, and the restart of NPM might be because of that. With this version, we are changing the behavior to be in line with upstream's rule evaluation. You can find the details here: https://github.com/Azure/azure-container-networking/wiki/TSG:-NPM--v1.3.0-breaking-changes

If the egress rules allow the traffic and the traffic still gets dropped, there is a high probability that the ingress rules of the destination are not allowing it. Can you please evaluate all ingress rules being applied to the destination pod (a quick way to list them is sketched below)?
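
For example, something along these lines lists the policies and labels involved (a sketch; namespace, pod, and policy names are placeholders):

# List all NetworkPolicies in the destination pod's namespace (placeholder values).
kubectl get networkpolicy -n <destination-namespace>
# Inspect one policy's ingress rules in detail.
kubectl describe networkpolicy <policy-name> -n <destination-namespace>
# Check the destination pod's labels against the podSelectors of those policies.
kubectl get pod <destination-pod> -n <destination-namespace> --show-labels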

BenjaminHerbert commented 3 years ago

Thanks for the information.

We have a NetworkPolicy that should allow all egress traffic, which used to work fine.

> kubectl get netpol egress-to-any -o yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    name: egress-to-any
    manager: kubectl
    operation: Update
    time: "2021-05-07T11:55:47Z"
  name: egress-to-any
  namespace: abnahme-lowcode
spec:
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    - ipBlock:
        cidr: ::/0
  podSelector: {}
  policyTypes:
  - Egress

For the past few days, we have been having problems opening connections.

vakalapa commented 3 years ago

NPM currently does not support IPv6 addressing, as it relies on IPv4 ipsets and iptables for its rules. That might have prevented this policy from being applied on the node, resulting in some traffic being blocked. Please try removing the IPv6 CIDR block and applying the policy again (see the sketch below). If that does not solve the issue, can you ask the support engineer working on your support ticket to escalate it to vakr@microsoft.com so we can debug more internally?
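
For reference, the IPv4-only version of the policy above would look roughly like this (a sketch based on the policy posted earlier, with the ::/0 block dropped):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-to-any
  namespace: abnahme-lowcode
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    # Only the IPv4 catch-all block; NPM cannot handle the IPv6 ::/0 entry.
    - ipBlock:
        cidr: 0.0.0.0/0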

BenjaminHerbert commented 3 years ago

Thanks for the input. I checked it without the IPv6 CIDR block and it still does not work. I asked to have our case assigned to you. Thanks for your help!

vakalapa commented 3 years ago

@BenjaminHerbert Thank you for the debugging session. As discussed, you are hitting known issue #870.

vakalapa commented 3 years ago

@ludydoo Are you still facing the issue? If so, can you raise a support case and request that it be escalated to vakr@microsoft.com? We can help resolve it. Until then, I will be closing this issue.

ludydoo commented 3 years ago

It seems that the issue is not present anymore.