Closed iankouls-aws closed 7 months ago
IIUC, this falls under the scope of custom resources, which should be handled with https://github.com/aws/karpenter/pull/2390. @jonathan-innis can you confirm this?
We could also look to add early support for Dynamic Resource Allocation. See https://kubernetes.io/blog/2022/12/15/dynamic-resource-allocation/
EFAs are an interesting bit of (virtual) hardware because the OS-bypass networking can only happen within the same subnet. That could mean the scheduler needs to be aware of this limitation in order to place Pods appropriately.
There are some other considerations, such as which security group to use for the EFAs. I could imagine a large cluster having more than one kind of EFA, with each kind perhaps associated with a different security group.
As a first step, it might be okay to leave the single-subnet setup to the user. They could configure a single provisioner with EFA support. Placement groups would also need to be set up, so it may be convenient to set both up at the Provisioner level and then target that provisioner. Security groups would also be set up at the provisioner level as normal.
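A rough sketch of what that user-managed setup could look like, assuming the v1alpha5 Provisioner and v1alpha1 AWSNodeTemplate APIs. The resource names, the label, the tag values, and the instance type are all placeholders chosen for illustration, and placement group configuration is not shown. This only pins the subnet and security groups; it does not by itself attach EFA interfaces to the instances.

```yaml
# Illustrative only: names, label, and tag values are hypothetical.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: efa
spec:
  providerRef:
    name: efa
  # Label the nodes so that pods can target this provisioner with a nodeSelector.
  labels:
    example.com/efa: "true"
  requirements:
    # Restrict to an EFA-capable instance type (placeholder choice).
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["p4d.24xlarge"]
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: efa
spec:
  # Tag selectors that should match exactly one subnet and the EFA security group.
  subnetSelector:
    Name: efa-subnet
  securityGroupSelector:
    Name: efa-sg
```

Pods would then opt in with a nodeSelector on the example.com/efa label.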
Until this feature is implemented, a temporary workaround is documented here:
https://github.com/aws-samples/aws-do-eks/tree/main/Container-Root/eks/deployment/karpenter#how-to-use-kaprenter-with-efa
This is by no means a design for EFA support in Karpenter. It uses a custom launch template and mounts the EFA device in the pod, which requires privileged mode.
The goal of this feature should be for pods to be able to specify vpc.amazonaws.com/efa: 1 resources, and for Karpenter to understand the request and add EC2 instances that are appropriate for the pods AND have EFA enabled.
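For illustration, a pod making such a request might look like the sketch below. The pod name and image are placeholders, not taken from this issue; only the vpc.amazonaws.com/efa resource name comes from the discussion above.

```yaml
# Illustrative only: pod name and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: efa-example
spec:
  containers:
    - name: app
      image: example.com/efa-app:latest
      resources:
        limits:
          # Extended resource: request one EFA device for this container.
          vpc.amazonaws.com/efa: 1
```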
This feature is really important for making distributed training on EKS viable and is currently a bottleneck for us when training large models. It's especially hard to get the launch template working well in combination with mounting NVMe disks.
Problem Statement:
Some workloads (distributed training, simulations, HPC applications) require the high-performance networking that AWS provides on instances enabled with Elastic Fabric Adapter (EFA). The specific instance types are documented here. Currently, Karpenter does not recognize EFA resource requests or limits specified in Kubernetes manifests as described below.
When such a manifest is applied to a Karpenter-enabled cluster, the Karpenter controller produces an error like the following:
Feature Request:
Add the capability for Karpenter to recognize the resource vpc.amazonaws.com/efa, and to identify and provision a suitable EC2 instance type with EFA enabled.