aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Multi region support in same cluster #5024

Open dcarrion87 opened 1 year ago

dcarrion87 commented 1 year ago

Description

What problem are you trying to solve?

We are running a custom Karpenter implementation with k3s.

We would like to extend it so that a single Karpenter handles multiple regions in one Kubernetes cluster. As far as I can see, subnetSelector assumes the current region.

Would we need to deploy multiple Karpenters to handle this scenario?

Having two separate clusters is an option, but for this use case we're OK with a geo-extended cluster.

jonathan-innis commented 1 year ago

We don't currently have multi-region support. The biggest reason for this is the expected communication latency between the worker nodes and the control-plane, which I'm assuming has to be a single region, even in k3s due to etcd leader being tied to a single region.

Is there a way that you overcome that latency?

dcarrion87 commented 1 year ago

> We don't currently have multi-region support. The biggest reason for this is the expected communication latency between the worker nodes and the control-plane, which I'm assuming has to be a single region, even in k3s due to etcd leader being tied to a single region.
>
> Is there a way that you overcome that latency?

@jonathan-innis For our use case it's not an issue. 140ms is totally fine for what these worker nodes are going to be doing. I've run worker node tests without Karpenter and it's fine.

Is it possible to run multiple Karpenters in the same cluster with AWS_REGION set differently and monitoring different provisioners / node templates? Or will they trip over each other?

We're likely going to pursue another route anyway but I just want to make sure I've exhausted everything on this front.

jonathan-innis commented 1 year ago

> Or will they trip over each other

Yeah, they'd definitely trip over each other. It's possible that at some point far in the future we might support running two at once in the same cluster with some kind of global lock/lease hand-off mechanism, but right now each instance runs with the assumption that it is a singleton in the cluster.
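To make the "global lock/lease hand-off" idea concrete, here is a minimal in-memory sketch in Python. It is not Karpenter's code and not a Kubernetes API: the `LeaseLock` class, the controller names, and the 15-second TTL are all hypothetical, chosen only to show how a lease lets one instance act at a time and lets another take over once the lease expires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None  # name of the instance holding the lease
    expires_at: float = 0.0       # time after which the lease is up for grabs


class LeaseLock:
    """In-memory stand-in for a cluster-wide lease (hypothetical, for illustration)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.lease = Lease()

    def try_acquire(self, candidate: str, now: float) -> bool:
        # Grant the lease if it is free, expired, or already held by the candidate
        # (the last case acts as a renewal).
        free = self.lease.holder is None or now >= self.lease.expires_at
        if free or self.lease.holder == candidate:
            self.lease = Lease(holder=candidate, expires_at=now + self.ttl)
            return True
        return False


# Two hypothetical Karpenter instances competing for the same lease.
lock = LeaseLock(ttl_seconds=15.0)
first = lock.try_acquire("karpenter-us-east-1", now=0.0)      # lease is free
blocked = lock.try_acquire("karpenter-eu-west-1", now=5.0)    # lease still held
handoff = lock.try_acquire("karpenter-eu-west-1", now=20.0)   # lease has expired
print(first, blocked, handoff)
```

Without a shared lock of this kind, both instances would act on the same pending pods and nodes at once, which is the "tripping over each other" described above.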

dcarrion87 commented 1 year ago

Thank you, @jonathan-innis. Appreciate the discussion and engagement.

jonathan-innis commented 1 year ago

> Appreciate the discussion and engagement

No problem 👍 I definitely think this could be a neat feature down the line, whether we support multi-region Karpenter natively or allow multiple Karpenters to run in the same cluster.

montanaflynn commented 5 months ago

Jumping in here with the same request. Our use case involves running large batch inference jobs where AWS can run out of GPUs in a single region. We want to have our single cluster be able to provision nodes from multiple regions.
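As a rough illustration of the behaviour being requested (not something Karpenter supports, per the discussion above), a multi-region provisioner would need some form of region fallback. The Python sketch below uses a hypothetical `pick_region` helper and made-up capacity numbers; it only shows the selection logic, not any AWS call.

```python
from typing import Dict, List, Optional


def pick_region(free_gpus: Dict[str, int], preferred: List[str], needed: int) -> Optional[str]:
    """Return the first preferred region with enough free GPUs, or None if none has."""
    for region in preferred:
        if free_gpus.get(region, 0) >= needed:
            return region
    return None


# Hypothetical capacity snapshot: the home region has run out of GPUs.
capacity = {"us-east-1": 0, "us-west-2": 16, "eu-west-1": 64}
order = ["us-east-1", "us-west-2", "eu-west-1"]

small_job = pick_region(capacity, order, needed=8)    # falls back to the second region
large_job = pick_region(capacity, order, needed=32)   # falls back to the third region
too_big = pick_region(capacity, order, needed=128)    # no region can satisfy this
print(small_job, large_job, too_big)
```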

FRABUCHI commented 2 months ago

> > We don't currently have multi-region support. The biggest reason for this is the expected communication latency between the worker nodes and the control-plane, which I'm assuming has to be a single region, even in k3s due to etcd leader being tied to a single region. Is there a way that you overcome that latency?
>
> @jonathan-innis For our use case it's not an issue. 140ms is totally fine for what these worker nodes are going to be doing. I've done worker node tests without karpenter and it's fine.
>
> Is it possible to run multiple Karpenters in the same cluster with AWS_REGION set differently and monitoring different provisioners / node templates? Or will they trip over each other?
>
> We're likely going to pursue another route anyway but I just want to make sure I've exhausted everything on this front.

@dcarrion87 Could you please provide information on how to configure a multi-region node group in an EKS cluster?