containernetworking / plugins

Some reference and example networking plugins, maintained by the CNI team.
Apache License 2.0

Calico CNI failed with hcnCreateEndpoint failed in Win32: The provided policy configuration is invalid or missing parameters. (0x803b000d) #1004

Open · wizpresso-steve-cy-fan opened this issue 6 months ago

wizpresso-steve-cy-fan commented 6 months ago

Cross-posted from https://github.com/projectcalico/calico/issues/8465

Expected Behavior

Calico should work on a Windows 11 worker node running containerd.

Current Behavior

On Windows 11, Windows pods never start: the Calico CNI plugin fails to create the HNS endpoint, so pod sandbox creation fails.

Possible Solution

Steps to Reproduce (for bugs)

  1. Install a k0s master
  2. Install a k0s worker on Windows 10/11 (not Windows Server)
  3. Install Calico with VXLAN mode only
  4. Wait until both Linux and Windows sides are healthy
  5. Create a Windows Container pod

Context

Here is the error from the kubelet log on the Windows node:

time="2024-01-31 14:52:57" level=info msg="I0131 14:52:57.786876    8516 kuberuntime_manager.go:436] \"Retrieved pods from runtime\" all=true" component=kubelet.exe stream=stderr
time="2024-01-31 14:52:57" level=info msg="E0131 14:52:57.936439    8516 remote_runtime.go:193] \"RunPodSandbox from runtime service failed\" err=\"rpc error: code = Unknown desc = failed to setup network for sandbox \\\"9f207b881af6caa158dd2b7251fc1f47b165de0d7395cfda370d66981189a279\\\": plugin type=\\\"calico\\\" name=\\\"Calico\\\" failed (add): failed to create the new HostComputeEndpoint: hcnCreateEndpoint failed in Win32: The provided policy configuration is invalid or missing parameters. (0x803b000d) {\\\"Success\\\":false,\\\"Error\\\":\\\"所提供的原則設定無效或缺少參數。 \\\",\\\"ErrorCode\\\":2151350285}\"" component=kubelet.exe stream=stderr
time="2024-01-31 14:52:57" level=info msg="E0131 14:52:57.936439    8516 kuberuntime_sandbox.go:72] \"Failed to create sandbox for pod\" err=\"rpc error: code = Unknown desc = failed to setup network for sandbox \\\"9f207b881af6caa158dd2b7251fc1f47b165de0d7395cfda370d66981189a279\\\": plugin type=\\\"calico\\\" name=\\\"Calico\\\" failed (add): failed to create the new HostComputeEndpoint: hcnCreateEndpoint failed in Win32: The provided policy configuration is invalid or missing parameters. (0x803b000d) {\\\"Success\\\":false,\\\"Error\\\":\\\"所提供的原則設定無效或缺少參數。 \\\",\\\"ErrorCode\\\":2151350285}\" pod=\"default/win-webserver-5cf6f5dd6f-g6xdp\"" component=kubelet.exe stream=stderr
time="2024-01-31 14:52:57" level=info msg="E0131 14:52:57.936962    8516 kuberuntime_manager.go:1171] \"CreatePodSandbox for pod failed\" err=\"rpc error: code = Unknown desc = failed to setup network for sandbox \\\"9f207b881af6caa158dd2b7251fc1f47b165de0d7395cfda370d66981189a279\\\": plugin type=\\\"calico\\\" name=\\\"Calico\\\" failed (add): failed to create the new HostComputeEndpoint: hcnCreateEndpoint failed in Win32: The provided policy configuration is invalid or missing parameters. (0x803b000d) {\\\"Success\\\":false,\\\"Error\\\":\\\" 所提供的原則設定無效或缺少參數。 \\\",\\\"ErrorCode\\\":2151350285}\" pod=\"default/win-webserver-5cf6f5dd6f-g6xdp\"" component=kubelet.exe stream=stderr
time="2024-01-31 14:52:57" level=info msg="I0131 14:52:57.936962    8516 kubelet.go:1697] \"SyncPod exit\" pod=\"default/win-webserver-5cf6f5dd6f-g6xdp\" podUID=\"2edbcee8-c2e1-48f6-a8f4-2102fa956c03\" isTerminal=false" component=kubelet.exe stream=stderr
time="2024-01-31 14:52:57" level=info msg="E0131 14:52:57.936962    8516 pod_workers.go:1300] \"Error syncing pod, skipping\" err=\"failed to \\\"CreatePodSandbox\\\" for \\\"win-webserver-5cf6f5dd6f-g6xdp_default(2edbcee8-c2e1-48f6-a8f4-2102fa956c03)\\\" with CreatePodSandboxError: \\\"Failed to create sandbox for pod \\\\\\\"win-webserver-5cf6f5dd6f-g6xdp_default(2edbcee8-c2e1-48f6-a8f4-2102fa956c03)\\\\\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\\\\\"9f207b881af6caa158dd2b7251fc1f47b165de0d7395cfda370d66981189a279\\\\\\\": plugin type=\\\\\\\"calico\\\\\\\" name=\\\\\\\"Calico\\\\\\\" failed (add): failed to create the new HostComputeEndpoint: hcnCreateEndpoint failed in Win32: The provided policy configuration is invalid or missing parameters. (0x803b000d) {\\\\\\\"Success\\\\\\\":false,\\\\\\\"Error\\\\\\\":\\\\\\\"所提供的原則設定無效或缺少參數。 \\\\\\\",\\\\\\\"ErrorCode\\\\\\\":2151350285}\\\"\" pod=\"default/win-webserver-5cf6f5dd6f-g6xdp\" podUID=\"2edbcee8-c2e1-48f6-a8f4-2102fa956c03\"" component=kubelet.exe stream=stderr
time="2024-01-31 14:52:57" level=info msg="I0131 14:52:57.936962    8516 event.go:307] \"Event occurred\" object=\"default/win-webserver-5cf6f5dd6f-g6xdp\" fieldPath=\"\" kind=\"Pod\" apiVersion=\"v1\" type=\"Warning\" reason=\"FailedCreatePodSandBox\" message=\"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \\\"9f207b881af6caa158dd2b7251fc1f47b165de0d7395cfda370d66981189a279\\\": plugin type=\\\"calico\\\" name=\\\"Calico\\\" failed (add): failed to create the new HostComputeEndpoint: hcnCreateEndpoint failed in Win32: The provided policy configuration is invalid or missing parameters. (0x803b000d) {\\\"Success\\\":false,\\\"Error\\\":\\\"所提供的原則設定無效或缺少參數。 \\\",\\\"ErrorCode\\\":2151350285}\"" component=kubelet.exe stream=stderr

The key error here is:

rpc error: code = Unknown desc = failed to setup network for sandbox "9f207b881af6caa158dd2b7251fc1f47b165de0d7395cfda370d66981189a279": plugin type="calico" name="Calico" failed (add): failed to create the new HostComputeEndpoint: hcnCreateEndpoint failed in Win32: The provided policy configuration is invalid or missing parameters. (0x803b000d)

The localized Error field in the HCN response, 所提供的原則設定無效或缺少參數。, is Traditional Chinese for the same message: "The provided policy configuration is invalid or missing parameters."
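The HCN response also carries "ErrorCode":2151350285. As a quick sanity check, the sketch below (plain Go, no Windows APIs) splits that value using the standard HRESULT bit layout: severity in bit 31, facility in bits 16-26, code in bits 0-15. The helper name hresultParts is hypothetical. The program prints `0x803B000D failure=true facility=0x3B code=0xD`, which shows the decimal ErrorCode is the same 0x803b000d reported by hcnCreateEndpoint, not a second, different failure.

```go
package main

import "fmt"

// hresultParts splits a Win32 HRESULT into its severity bit, facility and code.
// Layout: bit 31 = severity (1 = failure), bits 16-26 = facility, bits 0-15 = code.
func hresultParts(h uint32) (failure bool, facility, code uint32) {
	failure = h&0x80000000 != 0
	facility = (h >> 16) & 0x7FF
	code = h & 0xFFFF
	return
}

func main() {
	// "ErrorCode" exactly as it appears (in decimal) in the HCN JSON response.
	const errorCode uint32 = 2151350285
	failure, facility, code := hresultParts(errorCode)
	fmt.Printf("0x%08X failure=%v facility=0x%X code=0x%X\n", errorCode, failure, facility, code)
}
```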

Calico CNI config:

{
  "name": "Calico",
  "windows_use_single_network": true,

  "cniVersion": "0.3.1",
  "type": "calico",
  "mode": "vxlan",

  "vxlan_mac_prefix":  "0E-2A",
  "vxlan_vni": 4096,

  "policy": {
    "type": "k8s"
  },

  "log_level": "info",

  "windows_loopback_DSR": true,

  "capabilities": {"dns": true},

  "DNS":  {
    "Nameservers":  ["10.96.0.10"],
    "Search":  [
      "svc.cluster.local"
    ]
  },

  "nodename_file": "C:\\CalicoWindows\\libs\\calico\\..\\..\\nodename",

  "datastore_type": "kubernetes",

  "etcd_endpoints": "",
  "etcd_key_file": "",
  "etcd_cert_file": "",
  "etcd_ca_cert_file": "",

  "kubernetes": {
    "kubeconfig": "C:\\CalicoWindows\\calico-kube-config"
  },

  "ipam": {
    "type": "calico-ipam",
    "subnet": "usePodCidr"
  },

  "policies":  [
    {
      "Name":  "EndpointPolicy",
      "Value":  {
        "Type":  "OutBoundNAT",
        "ExceptionList":  [
          "10.96.0.0/12"
        ]
      }
    },
    {
      "Name":  "EndpointPolicy",
      "Value":  {
        "Type":  "SDNROUTE",
        "DestinationPrefix":  "10.96.0.0/12",
        "NeedEncap":  true
      }
    }
  ]
}
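To rule out a malformed entry on the config side before digging into the plugin, one can decode the "policies" array and check that every entry has a Type and that every CIDR parses. This is a minimal sketch: the structs below are hypothetical and only mirror the JSON above (they are not the plugin's own types), and the check does not reproduce HNS's validation, so a clean result does not prove HNS will accept these policies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// Hypothetical structs mirroring only the "policies" part of the Calico CNI config.
type endpointPolicyValue struct {
	Type              string   `json:"Type"`
	ExceptionList     []string `json:"ExceptionList"`
	DestinationPrefix string   `json:"DestinationPrefix"`
	NeedEncap         bool     `json:"NeedEncap"`
}

type policy struct {
	Name  string              `json:"Name"`
	Value endpointPolicyValue `json:"Value"`
}

type netConf struct {
	Policies []policy `json:"policies"`
}

// checkPolicies returns one message per malformed entry; it only checks the shape
// of the config, not whether HNS will accept the resulting endpoint policies.
func checkPolicies(raw []byte) ([]string, error) {
	var conf netConf
	if err := json.Unmarshal(raw, &conf); err != nil {
		return nil, err
	}
	var problems []string
	for i, p := range conf.Policies {
		if p.Name != "EndpointPolicy" {
			problems = append(problems, fmt.Sprintf("policy %d: unexpected Name %q", i, p.Name))
		}
		if p.Value.Type == "" {
			problems = append(problems, fmt.Sprintf("policy %d: missing Value.Type", i))
		}
		// Collect every CIDR-valued field and make sure each one parses.
		cidrs := append([]string{}, p.Value.ExceptionList...)
		if p.Value.DestinationPrefix != "" {
			cidrs = append(cidrs, p.Value.DestinationPrefix)
		}
		for _, c := range cidrs {
			if _, _, err := net.ParseCIDR(c); err != nil {
				problems = append(problems, fmt.Sprintf("policy %d (%s): bad CIDR %q", i, p.Value.Type, c))
			}
		}
	}
	return problems, nil
}

func main() {
	// The "policies" array copied from the config above.
	raw := []byte(`{"policies": [
	  {"Name": "EndpointPolicy", "Value": {"Type": "OutBoundNAT", "ExceptionList": ["10.96.0.0/12"]}},
	  {"Name": "EndpointPolicy", "Value": {"Type": "SDNROUTE", "DestinationPrefix": "10.96.0.0/12", "NeedEncap": true}}
	]}`)
	problems, err := checkPolicies(raw)
	if err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}
	fmt.Printf("%d problem(s) found\n", len(problems))
	for _, p := range problems {
		fmt.Println(p)
	}
}
```

For the config as posted this reports no problems, which suggests the JSON itself is well-formed and points the investigation toward how the plugin turns these entries into HCN endpoint policies.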

There are also repeated warnings from Felix on the Windows node:

2024-01-31 16:51:17.728 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.63
2024-01-31 16:51:21.240 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.0
2024-01-31 16:51:21.240 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.1
2024-01-31 16:51:21.240 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.2
2024-01-31 16:51:21.240 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.63
2024-01-31 16:51:32.720 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.0
2024-01-31 16:51:32.720 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.1
2024-01-31 16:51:32.720 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.2
2024-01-31 16:51:32.720 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.63
2024-01-31 16:51:36.207 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.0
2024-01-31 16:51:36.207 [WARNING][30888] felix/l3_route_resolver.go 662: Unable to create route for IP; the node it belongs to was not recorded in IPAM IP=10.244.178.1

Pinging by pod IP also fails in both directions, from Windows to Linux and from Linux to Windows.

Your Environment

wizpresso-steve-cy-fan commented 6 months ago

This is likely where the policies come from: https://github.com/containernetworking/plugins/blob/b6a0e0bc96906f0d3bd6bfcaab0b5ae72292f46c/plugins/main/windows/win-overlay/win-overlay_windows.go#L121-L139

wizpresso-steve-cy-fan commented 6 months ago

According to https://github.com/projectcalico/calico/issues/8465#issuecomment-1918686973, this is likely where the bug happens: https://github.com/containernetworking/plugins/blob/b6a0e0bc96906f0d3bd6bfcaab0b5ae72292f46c/plugins/main/windows/win-overlay/win-overlay_windows.go#L130-L132

wizpresso-steve-cy-fan commented 6 months ago

@thxCode, would you like to take a look at this?