gardener / gardener-extension-provider-gcp

Gardener extension controller for the GCP cloud provider (https://cloud.google.com).
https://gardener.cloud
Apache License 2.0

Consider dropping `zone` field in `controlPlaneConfig` #541

Open · vlerenc opened this issue 1 year ago

vlerenc commented 1 year ago

What would you like to be added: While testing HA with the new HA clusters, I was wondering what purpose the zone field serves and whether we can drop it, because it gives the impression of a dominant zone, which "looks" like a contradiction to HA:

    provider:
      controlPlaneConfig:
        apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
        kind: ControlPlaneConfig
        zone: europe-north1-a # <-- ?

Why is this needed: To reach full HA.

vlerenc commented 1 year ago

According to @rfranzke: Back then, the CCM required this setting in its config: https://github.com/gardener/gardener-extension-provider-gcp/blob/05a7bed06ecf428ea[…]rnal/cloud-provider-config/templates/cloud-provider-config.yaml. The CCM has probably since evolved to remove such necessities completely, e.g. when it runs on a cloud provider different from GCP.

ialidzhikov commented 1 year ago

/assign @dimitar-kostadinov

dimitar-kostadinov commented 1 year ago

The zone field in controlPlaneConfig is used in the cloud-provider-config and csi-driver-controller-config configmaps.

The cloud-provider-config configmap is mounted into the cloud-controller-manager deployment (gcp-cloud-controller-manager container), and its location is passed via the --cloud-config param.

The csi-driver-controller-config configmap is mounted into the csi-driver-controller deployment (gcp-csi-driver container), and its location is also passed via the --cloud-config param.
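
For orientation, the two rendered config files look roughly like this (a sketch only; field names are assumed from the gcfg structs of the legacy GCE cloud provider and the PD CSI driver, and the actual chart templates may differ):

    # cloud-provider-config, read by gcp-cloud-controller-manager via --cloud-config
    [Global]
    project-id = my-gcp-project
    local-zone = europe-north1-a
    multizone  = true

    # csi-driver-controller-config, read by gcp-csi-driver via --cloud-config
    [Global]
    project-id = my-gcp-project
    zone       = europe-north1-a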

The initialization and usage of localZone in gcp-cloud-controller-manager can be traced through the following flow:

  1. github.com/gardener/cloud-provider-gcp/cmd/gcp-cloud-controller-manager/main.go#L81
  2. github.com/kubernetes/cloud-provider/plugins.go#L159
  3. github.com/kubernetes/cloud-provider/plugins.go#L85
  4. github.com/kubernetes/cloud-provider/cloud.go#L43
  5. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L98
  6. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L281
  7. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L205
  8. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L288
  9. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L335 #if not set uses metadata service to get the project and zone
  10. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L347
  11. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L355
  12. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L366 #multizone=true -> all zones in region can be used, i.e. cloudConfig.ManagedZones = nil
  13. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L222
  14. github.com/kubernetes/legacy-cloud-providers/gce/gce.go#L518-L520
  15. github.com/kubernetes/legacy-cloud-providers/gce/gce_zones.go#L42 #but GetZone(ctx context.Context) (Zone, error) is not used

In conclusion, the localZone is mainly used to get the region, and the region is widely used when dealing with resources. An issue for adding a region option has been opened.
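
To make the "zone is mainly used to get the region" point concrete: deriving the region boils down to stripping the trailing zone suffix from the zone name. A minimal Go sketch of roughly what the GetGCERegion helper in legacy-cloud-providers/gce does (not the exact upstream code):

    package main

    import (
        "fmt"
        "strings"
    )

    // regionFromZone derives the GCP region from a zone name by cutting off the
    // trailing zone letter, e.g. "europe-north1-a" -> "europe-north1".
    func regionFromZone(zone string) (string, error) {
        ix := strings.LastIndex(zone, "-")
        if ix == -1 {
            return "", fmt.Errorf("unexpected zone: %q", zone)
        }
        return zone[:ix], nil
    }

    func main() {
        region, err := regionFromZone("europe-north1-a")
        if err != nil {
            panic(err)
        }
        fmt.Println(region) // europe-north1
    }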

The initialization and usage of zone in gcp-csi-driver can be traced through the following flow:

  1. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/cmd/gce-pd-csi-driver/main.go#L38
  2. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/cmd/gce-pd-csi-driver/main.go#L125
  3. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-cloud-provider/compute/gce.go#L73
  4. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-cloud-provider/compute/gce.go#L152
  5. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-cloud-provider/compute/gce.go#L219 #if not set uses metadata service to get the project and zone
  6. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-cloud-provider/compute/gce.go#L102
  7. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-cloud-provider/compute/gce.go#L50
  8. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-cloud-provider/compute/gce-compute.go#L118 #gets the region
  9. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-cloud-provider/compute/gce-compute.go#L165 #gets the region
  10. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-cloud-provider/compute/gce-compute.go#L112 #GetDefaultZone()
  11. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-cloud-provider/compute/gce-compute.go#L75
  12. github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pkg/gce-pd-csi-driver/controller.go#L1441 #"no zones have been specified in either topology or params, picking default zone"; but the Topology feature gate of the external-provisioner is enabled and the driver supports the VOLUME_ACCESSIBILITY_CONSTRAINTS capability, so the request always specifies topology.gke.io/zone.

This note states when the projectId and the zone should be specified.

In conclusion, the zone is mainly used to get the region.
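
To illustrate why the default zone from the cloud config is rarely hit: the controller prefers zones coming from the CSI topology requirement and only falls back to the configured zone. A minimal sketch of that decision (illustrative names, not the driver's actual code):

    package sketch

    // pickZone roughly mirrors the controller's zone selection for CreateVolume:
    // zones from the topology requirement win; the default zone derived from the
    // cloud config is only a fallback.
    func pickZone(topologyZones []string, defaultZone string) string {
        if len(topologyZones) > 0 {
            return topologyZones[0]
        }
        // With the Topology feature gate of the external-provisioner enabled and
        // the driver advertising VOLUME_ACCESSIBILITY_CONSTRAINTS, topology zones
        // are always present in practice, so this fallback is practically never
        // reached.
        return defaultZone
    }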

  1. One option is to remove controlPlaneConfig.zone and to use the first zone from the first worker pool instead (see the sketch below).
  2. Another option is to do nothing and live with the confusing field.
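
A minimal sketch of how option 1 could look, assuming Gardener's core v1beta1 worker API (the helper name and its wiring into the extension are hypothetical, not existing code):

    package sketch

    import (
        "fmt"

        gardencorev1beta1 "github.com/gardener/gardener/pkg/apis/core/v1beta1"
    )

    // defaultControlPlaneZone is a hypothetical helper for option 1: if
    // controlPlaneConfig.zone is not set, fall back to the first zone of the
    // first worker pool that defines zones.
    func defaultControlPlaneZone(cpZone string, workers []gardencorev1beta1.Worker) (string, error) {
        if cpZone != "" {
            return cpZone, nil
        }
        for _, worker := range workers {
            if len(worker.Zones) > 0 {
                return worker.Zones[0], nil
            }
        }
        return "", fmt.Errorf("no zone set in controlPlaneConfig and no worker pool defines zones")
    }

Since all worker zones of a shoot belong to the shoot's region anyway, any worker zone would yield the same region, which is what the components above actually need.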

@vlerenc, @rfranzke WDYT?