aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Enhanced Support for Network Interface Configuration in Karpenter AWS Node Templates #4267

Closed vara-bonthu closed 1 year ago

vara-bonthu commented 1 year ago

Description

What problem are you trying to solve?

I am unable to configure network interfaces using Karpenter AWS Node Templates, specifically when provisioning instances with more than 2 network cards. This limitation poses a significant challenge for running ML workloads that require advanced network configurations. Instances like p4d.24xlarge, p4de.24xlarge, trn1.32xlarge, and trn1n.32xlarge, which are commonly used for ML workloads, have more than 2 network cards available.

For ML workloads, it is crucial to have fine-grained control over network interfaces to optimize performance and achieve desired networking setups. Instances with multiple network cards provide opportunities for improved throughput, reduced latency, and advanced networking scenarios.

Currently, Karpenter AWS Node Templates do not natively support network interface configuration beyond the default two interfaces. This limitation prevents users from fully leveraging the capabilities of instances with more than 2 network cards and restricts the ability to achieve optimal networking configurations for ML workloads.

While it is possible to configure network interfaces using launch templates or userdata, these approaches introduce additional complexity and manual steps. Having native support for network interface configuration within Karpenter AWS Node Templates would greatly simplify the deployment process and ensure a streamlined experience for users.

| Instance type | Number of network cards |
| --- | --- |
| p4d.24xlarge | 4 |
| p4de.24xlarge | 4 |
| r6idn.32xlarge | 2 |
| r6idn.metal | 2 |
| r6in.32xlarge | 2 |
| r6in.metal | 2 |
| trn1.32xlarge | 8 |
| trn1n.32xlarge | 16 |
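
For scripting against these numbers (for example, to size an attachment loop in user data), the table above can be expressed as a small lookup. This is only a sketch: `network_card_count` is a hypothetical helper name, and its values are copied from the table rather than queried from AWS. In a live account the same figure should be available from `aws ec2 describe-instance-types` under `NetworkInfo.MaximumNetworkCards`.

```shell
# Hypothetical helper: network card count per instance type.
# Values are copied from the table above, not fetched from the EC2 API.
network_card_count() {
  case "$1" in
    p4d.24xlarge|p4de.24xlarge) echo 4 ;;
    r6idn.32xlarge|r6idn.metal|r6in.32xlarge|r6in.metal) echo 2 ;;
    trn1.32xlarge) echo 8 ;;
    trn1n.32xlarge) echo 16 ;;
    *) echo "unknown instance type: $1" >&2; return 1 ;;
  esac
}
```

An unknown instance type returns a non-zero status, so a caller can fall back to the default interface layout instead of guessing.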

How important is this feature to you?

This feature is critically important to me as it directly impacts the functionality and performance of my ML workloads. Being able to configure network interfaces seamlessly using Karpenter AWS Node Templates would enhance the flexibility and efficiency of my infrastructure setup. It would allow me to fully leverage the capabilities of instances with multiple network cards, optimize network throughput, reduce latency, and achieve the desired networking configurations for my ML workloads.

Enabling network interface configuration in Karpenter AWS Node Templates would significantly enhance the experience of running ML workloads and provide a more integrated and efficient solution for managing the network interfaces required by instances with more than 2 network cards.

Current workaround

The following snippet is a workaround used in the Data on EKS blueprints for building trn1.32xlarge instances:

---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: trainium
  namespace: karpenter
spec:
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  subnetSelector:
    Name: "${eks_cluster_id}-private*"        # Name of the Subnets to spin up the nodes
  securityGroupSelector:                      # required, when not using launchTemplate
    Name: "${eks_cluster_id}-node*"           # name of the SecurityGroup to be used with Nodes
  #  instanceProfile: ""      # optional, if already set in controller args
  #RAID0 config example
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    echo "Running a custom user data script"
    # Configure NVMe volumes in RAID0 configuration
    # https://github.com/awslabs/amazon-eks-ami/blob/056e31f8c7477e893424abce468cb32bbcd1f079/files/bootstrap.sh#L35C121-L35C126
    # Mount will be: /mnt/k8s-disks
    export LOCAL_DISKS='raid0'

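    # Create 8 network interfaces similar to Terraform configuration
    # httpTokens is "required" above, so metadata requests need an IMDSv2 session token
    IMDS_TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
      -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
    IMDS_URL="http://169.254.169.254/latest/meta-data"
    INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $IMDS_TOKEN" "$IMDS_URL/instance-id")
    # create-network-interface requires --subnet-id; reuse the primary interface's subnet
    MAC=$(curl -s -H "X-aws-ec2-metadata-token: $IMDS_TOKEN" "$IMDS_URL/mac")
    SUBNET_ID=$(curl -s -H "X-aws-ec2-metadata-token: $IMDS_TOKEN" \
      "$IMDS_URL/network/interfaces/macs/$MAC/subnet-id")

    # Create network interface with device_index = 0
    # Note: create-network-interface has no --delete-on-termination flag; that setting
    # belongs to the attachment (see modify-network-interface-attribute).
    # Note: the primary interface normally already occupies device index 0 on
    # network card 0, so this attachment may be rejected.
    INTERFACE_ID=$(aws ec2 create-network-interface \
      --subnet-id "$SUBNET_ID" \
      --description "NetworkInterfaces Configuration For EFA and EKS" \
      --query 'NetworkInterface.NetworkInterfaceId' \
      --output text
    )
    aws ec2 attach-network-interface \
      --network-interface-id "$INTERFACE_ID" \
      --instance-id "$INSTANCE_ID" \
      --device-index 0 \
      --network-card-index 0

    # Create network interfaces with device_index = 1 and network_card_index from 1 to 7
    for ((network_card_index=1; network_card_index<8; network_card_index++))
    do
      INTERFACE_ID=$(aws ec2 create-network-interface \
        --subnet-id "$SUBNET_ID" \
        --description "NetworkInterfaces Configuration For EFA and EKS" \
        --query 'NetworkInterface.NetworkInterfaceId' \
        --output text
      )
      aws ec2 attach-network-interface \
        --network-interface-id "$INTERFACE_ID" \
        --instance-id "$INSTANCE_ID" \
        --device-index 1 \
        --network-card-index "$network_card_index"
    done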

    # EFA Setup for Trainium and Inferentia
    export FI_EFA_USE_DEVICE_RDMA=1
    export FI_PROVIDER=efa
    export FI_EFA_FORK_SAFE=1

    curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz
    tar -xf aws-efa-installer-latest.tar.gz && cd aws-efa-installer
    ./efa_installer.sh -y -g
    /opt/amazon/efa/bin/fi_info -p efa

    --BOUNDARY--

  tags:
    InstanceType: "trainium" 
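
The index layout used by the user data above (network card 0 at device index 0, every further card at device index 1) can be previewed without touching AWS. The sketch below only prints the pairs that the script passes to `--network-card-index` and `--device-index`; `eni_attachment_plan` is a hypothetical helper name, and the card count is supplied by the caller.

```shell
# Hypothetical helper: print one "<network_card_index> <device_index>" pair per line,
# following the layout used in the workaround above.
eni_attachment_plan() {
  cards="$1"
  echo "0 0"                       # network card 0 uses device index 0
  card=1
  while [ "$card" -lt "$cards" ]; do
    echo "$card 1"                 # every other card uses device index 1
    card=$((card + 1))
  done
}
```

Calling `eni_attachment_plan 8` prints eight lines, from `0 0` through `7 1`, matching the eight interfaces the trn1.32xlarge workaround creates.
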

jonathan-innis commented 1 year ago

This looks similar to #3819. Have you had a chance to look over that design doc?

jonathan-innis commented 1 year ago

Closing as a duplicate of #2026