aws-samples / aws-efa-eks

Deploying EFA in EKS utilizing GPUDirectRDMA where supported
MIT No Attribution

Add remaining EFA instance types #18

Closed: jmdeal closed this 9 months ago

jmdeal commented 9 months ago

Issue #, if available:

Description of changes: This PR updates the device plugin daemonset to list every EC2 instance type that supports EFA. The instance types were determined using the following query:

aws ec2 describe-instance-types --region "$region" --filters Name=network-info.efa-supported,Values=true --query "InstanceTypes[*].[InstanceType]" --output text
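The output of that query is one instance type per line, in no particular order. As a rough sketch of how it could be turned into list entries for the daemonset manifest, the helper below sorts and de-duplicates the names and prefixes each with `- `. The function name `format_instance_types` is hypothetical and is not part of this repository; the indentation needed in the actual manifest is not handled here.

```shell
# Hypothetical helper (not part of the repo): read instance type names from
# stdin, one per line or tab-separated, and print a sorted, de-duplicated
# list with each entry prefixed by "- ".
format_instance_types() {
  tr '\t' '\n' | sed '/^$/d' | LC_ALL=C sort -u | sed 's/^/- /'
}

# Usage (requires the AWS CLI and credentials):
#   aws ec2 describe-instance-types --region "$region" \
#     --filters Name=network-info.efa-supported,Values=true \
#     --query "InstanceTypes[*].[InstanceType]" --output text \
#     | format_instance_types
```

Sorting in the C locale keeps the generated list stable between runs, which should make later diffs of the manifest easier to review.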

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jmdeal commented 9 months ago

Closing because I realized this could result in a poor user experience if a cluster contains instances that support EFA but have no EFAs attached. In that case the device plugin crashes, leaving its pod in a crash backoff loop. It's probably good to keep the default list small and limited to instance types that are mostly used with EFAs, though a different mechanism (e.g. label-based node selection) might be a better default.
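For illustration only, label-based node selection could look roughly like the sketch below: nodes that actually have an EFA attached get a label, and the daemonset is patched with a matching `nodeSelector`. The label key `example.com/efa-attached`, the function name, and the `$node_name` and `$daemonset_name` variables are all assumptions made for this example, not names taken from the repository. A `nodeSelector` is applied in addition to any existing node affinity, so the instance type list in the manifest would still need to be relaxed or removed separately.

```shell
# Hypothetical label; any key/value chosen by the cluster operator would work.
EFA_LABEL_KEY="example.com/efa-attached"
EFA_LABEL_VALUE="true"

# Build a JSON merge patch that adds a nodeSelector to a DaemonSet's
# pod template. $1 is the label key, $2 is the label value.
efa_selector_patch() {
  printf '{"spec":{"template":{"spec":{"nodeSelector":{"%s":"%s"}}}}}' "$1" "$2"
}

# Usage (requires kubectl and cluster access; names are assumptions):
#   kubectl label node "$node_name" "$EFA_LABEL_KEY=$EFA_LABEL_VALUE"
#   kubectl -n kube-system patch daemonset "$daemonset_name" --type merge \
#     -p "$(efa_selector_patch "$EFA_LABEL_KEY" "$EFA_LABEL_VALUE")"
```

With this approach the device plugin pod would only be scheduled onto nodes that were explicitly labeled, so an EFA-capable instance without an EFA attached would not receive it.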