gardener / machine-controller-manager

Declarative way of managing machines for Kubernetes cluster
Apache License 2.0
246 stars 113 forks source link

Support for IaaS machine tags for all machines in a worker pool #750

Open vlerenc opened 1 year ago

vlerenc commented 1 year ago

What would you like to be added: Gardener adopters like to provide IaaS machine tags for all machines in a worker pool, much like labels, annotations, and taints are provided in the shoot spec per worker pool. All machines in that pool shall be created with those tags (ideally atomically, i.e. machine+tags together, so that there is no intermediate state with machines with no/missing tags), e.g.:

    workers:
    - name: pool-1
      ...
      vmtags:
        owner: john.doe@foo.com
        team: bar
        criticality: production
        personal-data: false
      labels:
        key: value
      annotations:
        key: value

Why is this needed: For asset management various Gardener adopters rely on machine tags to automatically classify machines. Since Gardener/MCM creates those, adopters have no way today to provide those with Gardener so that Gardner/MCM creates them for all machines in a worker pool uniformly.

himanshu-kun commented 1 year ago

IIUC, they would also require that they can change the tags after the machines are created? Also what's the priority level for this? any deadline?

@unmarshall @rishabh-11 this is exactly the use-case which we were discussing yesterday while discussing project-wide ssh keys

vlerenc commented 1 year ago

Good question, but I guess on create is sufficient and simpler to build (with an admission plugin to prevent changing the tags in the shoot spec)? It would be a lot more effort to "revisit" machines again and change them, right (we do not do that today/yet)?

himanshu-kun commented 1 year ago

currently we have tags support available in machineclass aws example(its an old AWSMachineClass, but providerSpec looks like this for AWS)

We use these tags currently to set info like workerPoolName,machineObjName etc.

The way MCM is integrated with Gardener, if we allow editing these tags by customers, then machineClass name would change , hence reference in machineDeployment and that would trigger a rolling update.

While out of Gardener context , a change in these tags in machineClass, doesn't trigger a reconcile of the machine.

So yes it'll be a lot more effort to revisit the machines on tags update, and its better if we don't allow updating in the first place Kindly ignore and refer to the latest comment

vlerenc commented 1 year ago

OK, then let's focus on immutable tags. That we then initiate a rolling update is the right thing. 👍

himanshu-kun commented 1 year ago

We discussed the design of how we transport tags to VMs. We think its a sub-optimal design and ideally there shouldn't be a rolling update on VM tags update for the machine. It would be ideal to keep only those configurations in machineClass whose change really require a rolling update.

Proposed solution:

Then we can have a mechanism of syncingVMTags just like we have it for syncing nodeMachineAnnotation/lables/taints.

cc @elankath

himanshu-kun commented 1 year ago

I went through the code again, and this is what we follow currently : We pass the labels which user provides in shootYaml as tags to the VM. For doing that we update the machineClass. This update is done by gardener-extension-provider-XYZ. Currently aws provider directly passes the labels to the tag section of machineClass, while azure and GCP sanitize them, and then pass. There is no update in machineClass name on addition/removal of tags via labels But there is one general problem: we don't update the tags once machine is created. This is because we refer to the tags in machineClass only during machineCreation.

unmarshall commented 1 year ago

Yes, i tested this as well. I added labels to the shoot yaml and it was correctly reflected in the machine class. Name of the machine class contains a name which comprises of the deployment name and a worker pool Hash. This does not include labels so any change made to labels will not change the hash and will thus not cause a rolling update.

However the machine objects did not get the updated tags. If we take AWS then they provide API to update tags (Golang API). But i did not find any reference/usage of this API today in our code.

kon-angelo commented 1 year ago

We don't really want to cause a rolling update of all VMs for a label change so we don't trigger a hash change when labels are added. It still has the effect that the node objects are updated with the new labels AFAIK but not the machines.

Ideally we could have a hook in the MCM lifecycle that updates the VMs with as many attributes as it can update (ignoring those it can't update)

himanshu-kun commented 1 year ago

We had a meeting , and we decided the following approach:

vlerenc commented 1 year ago

Uh, that sounds a bit "ugly", doesn't it?

I mean: Dear user, you can/have to place tags under labels (not tags), but with a "magic" prefix, that is then cut off later before it materialises as tag and btw, the label you just defined, won't materialise as label at the node, despite being added to the label section.

We won't win a beauty contest this way. What was the motivation for this proposal?

himanshu-kun commented 1 year ago

Yes it sounds ugly but we considered the following benefits:

If we use the prefix method proposed above we could act accordingly in mcm-providers and decide which tag to send where(node , vmtag, vmNetworkTag etc) , plus addition of any new kind of tag would involve changes just on MCM level.

unmarshall commented 1 year ago

Only learnt recently that the the labels in the worker pool are meant to be applied on Node resources. (https://gardener.cloud/docs/gardener/api-reference/core/#core.gardener.cloud/v1beta1.Worker)

I agree that labels is now no longer clear if we include tags that should only be applied to VMs and should not be added to Node as labels. I really wish the labels field was called nodeLabels to be make crystal clear that it is meant for only nodes. vmTags or networkTags are meant to be applied to resources provisioned by a provider and therefore metadata associated to these resources should not be mixed with resources that are then created and managed via K8s controllers. IMO, these are provider specific metadata which should be interpreted by provider specific extensions. Is it not possible to include this in providerConfig? I would also recommend to arrange the tags in a way which allows for future extensions.

tags:
  - vm:
    key1: value1
    key2: value2
  - network:
     - key3
     - key4
unmarshall commented 1 year ago

If we go with the approach to keep them as labels, then i would recommend that we properly choose a prefix which can then be extended to support other kinds of tags in the future.

Proposal: provider.gardener.cloud/tag/vm/<key-name>: <value> At some later point in time if we have the need to also support network tags then we can then just have them as: provider.gardener.cloud/tag/network/<key-name>: <value> or provider.gardener.cloud/tag/nw/<key-name>: <value>

Also to keep things consistent then all labels should also be visible on the Node/Machine objects as well and only labels which are namespaced/prefixed with provider.gardener.cloud/tag/vm should be sanitized as per the provider requirements and then applied to the respective virtual machines using provider specific APIs.

In addition, somewhere these prefixes should be clearly documented so that consumers are aware of the correct way to set these tags.

Drawbacks of using the label approach for VM tags:

  1. Using labels the usability is not so great as compared to having a dedicated section for providing different tags as the consumers need to know of all allowed prefixes and then not make a mistake in providing this prefix.
  2. If the user makes a mistake in setting these tags (prefix is mis-spelled) then MCM + extensions will never be able to validate this tag and report an error. User will think that the tags have been accepted and it will also be visible on the Node/Machine object as well but never on the VM.
himanshu-kun commented 9 months ago

needs UpdateMachine() driver call for its implementation https://github.com/gardener/machine-controller-manager/issues/767