gardener / machine-controller-manager-provider-gcp

Gardener machine controller manager provider for GCP
Apache License 2.0

Add `zone` network tag to GCP VMs #71

Closed himanshu-kun closed 1 year ago

himanshu-kun commented 1 year ago

What would you like to be added:

Add a network tag to the VMs created on GCP that identifies the zone each VM belongs to.

Why is this needed:

@vlerenc was trying to simulate a zone outage on a GCP cluster, and without these tags already present on the VMs, it was quite painful to achieve.

I was trying to simulate zone outages on GCP, but that turns out to be somewhat “complicated” by the overly smart and stateful firewalls in GCP. Basically, we do not set any labels, tags, and especially no network tags that designate the zone a node is in (and there is only one cross-zonal subnet). This forces me to set new network tags if I want to define zone-blocking firewall rules. That’s already kind of bad/non-atomic (with nodes newly coming up in the meantime), but what is worse is that existing network connections are not affected (https://cloud.google.com/vpc/docs/firewalls#effects_on_existing_traffic), which I could see: blocked nodes that I couldn’t kubectl exec into anymore still reported their status to the API servers over their pre-existing connections, and only after I restarted the API servers were the node/pod statuses no longer updated. That makes the process of blocking zones in GCP pretty shitty:

  • High complexity on even achieving some basic form of blocking
  • Not atomic, as I must set the network tags after the fact unless we set the zone automatically (the new tagging feature in MCM won’t help here either, because the worker pools are usually defined cross-zonal, so only Gardener/MCM can set different tags for the nodes in different zones that are part of the same worker pool)
  • Requires restart of the machine to cut off all existing connections
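
To make the request concrete, here is a minimal Go sketch of how a zone-derived network tag could be merged into a VM's existing tags before the instance is created. This is not the provider's actual code: the function names `zoneTag` and `withZoneTag`, the `zone-` prefix, and the example cluster tag are all hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// zoneTag builds a network tag from a GCP zone name.
// The "zone-" prefix is an assumption made for this sketch.
func zoneTag(zone string) string {
	return "zone-" + zone
}

// withZoneTag returns a copy of tags that contains the zone tag exactly once.
// The input slice is never modified.
func withZoneTag(tags []string, zone string) []string {
	tag := zoneTag(zone)
	out := make([]string, 0, len(tags)+1)
	found := false
	for _, t := range tags {
		if t == tag {
			found = true
		}
		out = append(out, t)
	}
	if !found {
		out = append(out, tag)
	}
	return out
}

func main() {
	// Hypothetical existing tag on a worker node.
	tags := withZoneTag([]string{"example-cluster"}, "europe-west1-b")
	fmt.Println(tags)
}
```

With tags like these set at creation time, a firewall rule could target a single zone's nodes through its target tags, so new nodes would be covered without tagging them after the fact. Existing connections would still be unaffected, as described above.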

vlerenc commented 1 year ago

Unless you want to add the zone to the machines for other reasons as well, there is no need to in my context. https://github.com/gardener/chaos-engineering now supports all major cloud providers, so I no longer need this.

Also, this wouldn't have been a GCP-only matter; it applies to all cloud providers. I later had similar issues with the others, so I built a filter feature.

You can close this ticket if it's about my use case. If you don't, then I suggest expanding it to all machine-controller-manager-providers, not just GCP.