gardener / gardener-extension-provider-vsphere

Gardener extension controller for the vSphere cloud provider (https://www.vmware.com).
https://gardener.cloud

Cilium as CNI not working in vSphere environments #293

Open christianhuening opened 1 year ago

christianhuening commented 1 year ago

How to categorize this issue?

/area networking /kind bug /platform vsphere

What happened: We installed a shooted seed into vSphere using Calico as the CNI. Then we created a shoot on that seed using Cilium as the CNI. The cluster was created and came up with one node, and everything worked. When we added more nodes, the vsphere-csi-driver pods in particular stopped working because they could not reach the API server via its internal domain name (the requests timed out).

An interesting observation was that in vSphere the Cilium nodes apparently got two IPs, one from the node CIDR and one from the pod CIDR, which is obviously wrong.

With Calico the above works just fine.

Seed: Calico (node CIDR: 10.0.0.0/16)

Shoot: Cilium (node CIDR: 10.10.0.0/16, pod CIDR: 10.80.0.0/12)
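
To rule out overlapping ranges as a cause, the CIDRs listed above can be checked with a short script. This is only a sketch using Python's standard `ipaddress` module: the three ranges are the ones from this report, while the sample addresses and the helper names `classify` and `ranges_overlap` are hypothetical and used for illustration only.

```python
import ipaddress

# CIDRs as stated in this report.
SEED_NODE_CIDR = ipaddress.ip_network("10.0.0.0/16")
SHOOT_NODE_CIDR = ipaddress.ip_network("10.10.0.0/16")
SHOOT_POD_CIDR = ipaddress.ip_network("10.80.0.0/12")


def classify(address: str) -> str:
    """Return which shoot range an address falls into: 'node', 'pod' or 'other'."""
    ip = ipaddress.ip_address(address)
    if ip in SHOOT_NODE_CIDR:
        return "node"
    if ip in SHOOT_POD_CIDR:
        return "pod"
    return "other"


def ranges_overlap() -> bool:
    """Return True if any two of the three ranges overlap."""
    nets = [SEED_NODE_CIDR, SHOOT_NODE_CIDR, SHOOT_POD_CIDR]
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])


if __name__ == "__main__":
    print("ranges overlap:", ranges_overlap())
    # Hypothetical addresses, e.g. copied from the address list of a node.
    for addr in ("10.10.0.5", "10.80.1.7"):
        print(addr, "->", classify(addr))
```

Running `classify` over the addresses reported for a node shows which of them come from the pod range.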

What you expected to happen: Pods can communicate with the API server properly and nodes don't get pod CIDR IPs ;-)

How to reproduce it (as minimally and precisely as possible):

  1. Deploy a shooted seed into vSphere with Calico as CNI.
  2. Deploy a shoot onto that seed in the same vSphere with Cilium as CNI.

Environment:

marwinski commented 1 year ago

/assign ScheererJ

Hi Johannes, can you please check? @christianhuening, can you please provide the cluster details to Johannes?

christianhuening commented 1 year ago

@marwinski what details would you need beyond what's already stated above?

marwinski commented 1 year ago

the cluster

ScheererJ commented 1 year ago

@christianhuening Can you please try switching to geneve as the tunnel protocol and see if that works for you? vxlan as overlay seems to be broken in some configurations. At least, it was in the vSphere cluster I created.

spec:
...
  networking:
    type: cilium
    providerConfig:
      apiVersion: cilium.networking.extensions.gardener.cloud/v1alpha1
      kind: NetworkConfig
      tunnel: geneve
...
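
For anyone who edits shoot manifests from a script, the same change could be sketched roughly as follows. This assumes the manifest has already been loaded into a plain Python dict; the helper name `with_tunnel` is hypothetical, and only the field names shown in the snippet above are used.

```python
import copy


def with_tunnel(shoot: dict, tunnel: str = "geneve") -> dict:
    """Return a copy of a shoot manifest with the Cilium tunnel protocol set.

    Hypothetical helper: the input dict is left untouched, and existing
    networking fields on the copy are preserved.
    """
    patched = copy.deepcopy(shoot)
    networking = patched.setdefault("spec", {}).setdefault("networking", {})
    networking["type"] = "cilium"
    provider = networking.setdefault("providerConfig", {})
    # Field names taken from the snippet in this comment.
    provider.setdefault(
        "apiVersion", "cilium.networking.extensions.gardener.cloud/v1alpha1"
    )
    provider.setdefault("kind", "NetworkConfig")
    provider["tunnel"] = tunnel
    return patched


if __name__ == "__main__":
    shoot = {"spec": {"networking": {"type": "cilium"}}}
    print(with_tunnel(shoot)["spec"]["networking"]["providerConfig"])
```
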

briantopping commented 1 year ago

@christianhuening Does @ScheererJ's comment resolve this issue?

christianhuening commented 1 year ago

@briantopping unfortunately no, since the customer decided to switch over to Tanzu and hence we stopped using gardener+vsphere here.