aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Dynamic instance tagging (from node labels) #4228

Open cebernardi opened 1 year ago

cebernardi commented 1 year ago

Description

What problem are you trying to solve?

We have a Provisioner that is configured to launch both arm64 and amd64 instances. We would like to dynamically tag the EC2 instances with the architecture.

Karpenter tags the instances with a default set of "dynamic" tags, described at the beginning of this documentation section: https://karpenter.sh/docs/concepts/node-templates/#spectags

Name: karpenter.sh/provisioner-name/<provisioner-name>
karpenter.sh/provisioner-name: <provisioner-name>
kubernetes.io/cluster/<cluster-name>: owned

All the other tagging options are static (here and here).

We would like to have a way to specify dynamic tags, for example:

spec:
  tags:
    InternalAccountingTag: 1234
    dev.corp.net/app: Calculator
  tagsFromLabels:
  - kubernetes.io/arch
  - node.kubernetes.io/instance-type

It could also be something like:

spec:
  tags:
    InternalAccountingTag: 1234
    dev.corp.net/app: Calculator
    arch: "{kubernetes.io/arch}"
    instance-type: "{node.kubernetes.io/instance-type}"

While InternalAccountingTag and dev.corp.net/app are "static" tags, kubernetes.io/arch and node.kubernetes.io/instance-type are well-known labels that could be used as tags as well.

How important is this feature to you? It would enable closer monitoring of costs and migrations, which are currently either blocked or only possible with workarounds.

cebernardi commented 1 year ago

I would be interested in contributing to this feature, should it be accepted.

njtran commented 1 year ago

@cebernardi if you're open to contributing, please reach out to us in the #karpenter-dev channel!

In your head, this would land in the AWSNodeTemplate, right?

@jonathan-innis is actively working on graduating our APIs to v1beta1. This seems reasonable to implement, but it would probably be good to talk through some of the cases and how this fits together.

cebernardi commented 1 year ago

Sure, will do! Thank you!

jonathan-innis commented 1 year ago

@cebernardi So, there are a few technical complications with tagging from labels, particularly in the way that you are proposing above, that make me think we may need a separate controller that applies label-based tags after the fact.

Here are the issues at a high-level:

  1. We can't know some of the tagging details of the instance until after launch (e.g. the instance type), since we simply send a bunch of instance types, images, architectures, etc. over to CreateFleet and have the Fleet service pick the best instance type at a given time. So anything that can be dynamically picked as part of the launch request can't be sent in the tag specifications (see the sketch after this list).
  2. We currently batch our calls to CreateFleet to reduce the number of calls that we make to the API. This is a smaller issue, but if we introduce enough dynamism in instance tagging that every machine creation results in its own CreateFleet call, users may hit more throttling on the Fleet API than they'd like. This is why, at this point, we have in general only tagged instances with values that are consistent across a provisioner.
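To make the first point concrete, here is a simplified sketch (not Karpenter's actual launch code; the launch template name is made up) of a CreateFleet request built with aws-sdk-go-v2. One request offers both amd64 and arm64 candidates, but carries a single static TagSpecifications list, so there is no way to vary the tags based on which instance type the Fleet service ends up picking:

// Simplified sketch, not Karpenter's actual launch path: one CreateFleet
// request offers many candidate instance types but exactly one static tag
// set, applied to whichever instance the Fleet service picks.
package launch

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

func createFleet(ctx context.Context, client *ec2.Client) error {
	_, err := client.CreateFleet(ctx, &ec2.CreateFleetInput{
		Type: types.FleetTypeInstant,
		LaunchTemplateConfigs: []types.FleetLaunchTemplateConfigRequest{{
			LaunchTemplateSpecification: &types.FleetLaunchTemplateSpecificationRequest{
				LaunchTemplateName: aws.String("example-lt"), // hypothetical name
				Version:            aws.String("$Latest"),
			},
			// Both amd64 and arm64 candidates in one request; the Fleet
			// service decides which one actually launches.
			Overrides: []types.FleetLaunchTemplateOverridesRequest{
				{InstanceType: types.InstanceTypeM5Large},  // amd64
				{InstanceType: types.InstanceTypeM6gLarge}, // arm64
			},
		}},
		TargetCapacitySpecification: &types.TargetCapacitySpecificationRequest{
			TotalTargetCapacity:       aws.Int32(1),
			DefaultTargetCapacityType: types.DefaultTargetCapacityTypeOnDemand,
		},
		// A single static tag set for the whole request: there is no way to
		// express "tag arch=arm64 only if the arm64 override wins" here.
		TagSpecifications: []types.TagSpecification{{
			ResourceType: types.ResourceTypeInstance,
			Tags: []types.Tag{{
				Key:   aws.String("InternalAccountingTag"),
				Value: aws.String("1234"),
			}},
		}},
	})
	return err
}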

A natural solution to this might be to look up the instances that are owned by Karpenter after launch and tag them post-launch, so that you get what you want. This would obviously mean more API calls, but I think it could get you there. The challenge is whether that functionality is in scope for the Karpenter project. I could potentially see a world where we create a tagging controller where you define the labels on the node that you care about, and this component is responsible for tagging/untagging the instance when labels change on the node. Thoughts?
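As a rough illustration of that idea, here is a minimal sketch of such a controller, assuming controller-runtime and the EC2 CreateTags API; the nodeTagger type and the hard-coded label list are hypothetical, not an agreed design:

// Minimal sketch of the post-launch tagging-controller idea: watch Nodes and
// mirror a configured set of labels onto the backing EC2 instance.
package tagging

import (
	"context"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// In a real controller this list would come from configuration.
var labelsToTag = []string{"kubernetes.io/arch", "node.kubernetes.io/instance-type"}

type nodeTagger struct {
	kube client.Client
	ec2  *ec2.Client
}

func (r *nodeTagger) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	node := &corev1.Node{}
	if err := r.kube.Get(ctx, req.NamespacedName, node); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// AWS provider IDs look like aws:///us-west-2a/i-0123456789abcdef0.
	parts := strings.Split(node.Spec.ProviderID, "/")
	instanceID := parts[len(parts)-1]
	if !strings.HasPrefix(instanceID, "i-") {
		return ctrl.Result{}, nil // not an EC2-backed node
	}
	var tags []ec2types.Tag
	for _, key := range labelsToTag {
		if value, ok := node.Labels[key]; ok {
			tags = append(tags, ec2types.Tag{Key: aws.String(key), Value: aws.String(value)})
		}
	}
	if len(tags) == 0 {
		return ctrl.Result{}, nil
	}
	// One extra EC2 call per node, made after launch once the labels exist.
	_, err := r.ec2.CreateTags(ctx, &ec2.CreateTagsInput{
		Resources: []string{instanceID},
		Tags:      tags,
	})
	return ctrl.Result{}, err
}

Since labels like architecture and instance type never change for a given node, a single post-launch pass per node would presumably cover most of this use case; re-reconciling on label updates would handle anything that changes later.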

cebernardi commented 1 year ago

Yep, I've seen from the code that some of the labels are unknown until this point, and that they're synced back to the Kubernetes node as labels shortly after.

So my initial thought was to add a step that tags the instances after they're launched, using a different API: not CreateFleet, but something like the AWS tagging APIs or the EC2 APIs.

I wasn't entirely convinced by the flow myself:

launch instance --> sync node labels from instance type and requirements --> tag the instance back from node labels

I could potentially see a world where we create a tagging controller where you can define the labels on the node that you care about, and this component is responsible for tagging/untagging the instance when labels change on the node

I have a few questions:

jonathan-innis commented 1 year ago

I'm wondering if there's room here to expand out the existing CCM capability: https://github.com/kubernetes/cloud-provider-aws/blob/master/docs/tagging_controller.md

jonathan-innis commented 1 year ago

you mean a totally separated project

Seems like this might fall under the bucket of a separate project. You could imagine a world where I generalize this to the extent of saying "here are the labels that I care about; tag my instance with these labels using this template". The controller looks up the labels and then periodically tags the instance.
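Purely as an illustration of that generalization, the configuration for such a standalone controller might look something like this (the group, kind, and fields are all hypothetical):

apiVersion: tagging.example.com/v1alpha1
kind: InstanceTagPolicy
metadata:
  name: default
spec:
  # node labels to mirror onto the backing instance
  labels:
  - kubernetes.io/arch
  - node.kubernetes.io/instance-type
  # how to derive the tag key from the label key,
  # e.g. corp.net/kubernetes.io/arch
  tagKeyTemplate: "corp.net/{label}"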

cebernardi commented 1 year ago

Seems like this might fall under the bucket of a separate project

Which I guess implies that the solution to this problem might not belong in Karpenter, and that we're therefore going to close this issue? :)

cebernardi commented 1 year ago

If you see https://github.com/kubernetes/cloud-provider-aws as a better fit, should I open an issue there?

github-actions[bot] commented 10 months ago

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.

montanaflynn commented 3 months ago

We're looking for this same functionality. Our use case is cost analysis: getting insight into the costs of different batch processing jobs. We have a NodePool that defines several instance types and both spot and on-demand capacity. Being able to apply a tag like eks-job-name would let us break down the actual costs of a job's instances after it has run, using AWS Cost Explorer.
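For reference, carrying the original proposal forward to the current API surface, the requested shape might look like this on an EC2NodeClass (spec.tags is a real field; tagsFromLabels is still the hypothetical field from this issue, and it assumes the batch jobs request nodes carrying an eks-job-name label):

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: batch
spec:
  tags:
    team: batch-processing
  tagsFromLabels:
  - eks-job-name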