aws-samples / aws-efa-eks

Deploying EFA in EKS utilizing GPUDirectRDMA where supported
MIT No Attribution

Issue with container running as root #8

Closed: vsoch closed this issue 1 year ago

vsoch commented 1 year ago

Hiya! I found this repository because I'm creating an EKS cluster with eksctl, specifically like this:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: flux-cluster
  region: us-east-2
  version: "1.23"

availabilityZones: ["us-east-2b", "us-east-2c"]
managedNodeGroups:
  - name: workers
    instanceType: hpc6a.48xlarge
    minSize: 64
    maxSize: 64
    labels: { "fluxoperator": "true" }
    availabilityZones: ["us-east-2b"]
    efaEnabled: true
    placement:
      groupName: eks-efa-testing
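
For context: with efaEnabled: true, eksctl also installs the EFA Kubernetes device plugin as a DaemonSet in kube-system. A quick way to confirm it is present (the DaemonSet name below is the one eksctl creates, as seen further down in this issue):

kubectl get daemonset -n kube-system aws-efa-k8s-device-plugin-daemonset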

And when I request a job asking for EFA for my pods, e.g. (this is our operator CRD, which has worked before):

# Resource limits to enable efa
resources:
    limits:
        vpc.amazonaws.com/efa: 1
        memory: "340G"
        cpu: 94

the pods are stuck in pending. Further inspection reveals:

Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  27s (x11 over 13m)  default-scheduler  0/64 nodes are available: 64 Insufficient vpc.amazonaws.com/efa.
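
One way to narrow this down is to check whether the nodes advertise the EFA resource at all; they only do so once the device plugin pod on each node is healthy. A minimal check (generic kubectl, with a placeholder node name):

# vpc.amazonaws.com/efa should appear under Capacity and Allocatable once the plugin is running
kubectl describe node <node-name> | grep -i efa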

And then I realized I could inspect the pod that is supposed to provide the EFA resource (which is where I found the container name/config provided in the manifest folder of this repo), and I saw:

$ kubectl describe pods -n kube-system aws-efa-k8s-device-plugin-daemonset-zpg2s
Name:                 aws-efa-k8s-device-plugin-daemonset-zpg2s
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      default
Node:                 ip-192-168-31-140.us-east-2.compute.internal/192.168.31.140
Start Time:           Mon, 30 Jan 2023 17:29:25 -0700
Labels:               controller-revision-hash=5cd48c4575
                      name=aws-efa-k8s-device-plugin
                      pod-template-generation=1
Annotations:          kubernetes.io/psp: eks.privileged
                      scheduler.alpha.kubernetes.io/critical-pod: 
Status:               Pending
IP:                   192.168.31.140
IPs:
  IP:           192.168.31.140
Controlled By:  DaemonSet/aws-efa-k8s-device-plugin-daemonset
Containers:
  aws-efa-k8s-device-plugin:
    Container ID:   
    Image:          602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/kubelet/device-plugins from device-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m82qz (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  device-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:  
  kube-api-access-m82qz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             aws.amazon.com/efa:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  66m                   default-scheduler  Successfully assigned kube-system/aws-efa-k8s-device-plugin-daemonset-zpg2s to ip-192-168-31-140.us-east-2.compute.internal
  Normal   Pulling    66m                   kubelet            Pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3"
  Normal   Pulled     66m                   kubelet            Successfully pulled image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3" in 4.231578378s
  Warning  Failed     64m (x12 over 66m)    kubelet            Error: container has runAsNonRoot and image will run as root (pod: "aws-efa-k8s-device-plugin-daemonset-zpg2s_kube-system(1b46d2ac-c922-449b-b630-bab344976d9f)", container: aws-efa-k8s-device-plugin)
  Normal   Pulled     115s (x303 over 66m)  kubelet            Container image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3" already present on machine

Specifically, notice the second-to-last event: there is an error about "runAsNonRoot":

  Warning  Failed     64m (x12 over 66m)    kubelet            Error: container has runAsNonRoot and image will run as root (pod: "aws-efa-k8s-device-plugin-daemonset-zpg2s_kube-system(1b46d2ac-c922-449b-b630-bab344976d9f)", container: aws-efa-k8s-device-plugin)

I am thinking this might be related to eksctl, if it is the one creating/submitting this YAML, but since I found what appears to be the same EFA container here, I thought I would ask! Is there perhaps a spot fix I could do, re-applying this YAML to allow it to run as root? :thinking:
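
For anyone hitting the same error: one possible spot fix (a sketch only, using standard Kubernetes securityContext fields) is to override the security context on the plugin container so it is permitted to run as root, since the kubelet error above says the image itself runs as root. This assumes the runAsNonRoot restriction was injected into the DaemonSet spec, as the follow-up below suggests eksctl started doing:

kubectl patch daemonset aws-efa-k8s-device-plugin-daemonset -n kube-system \
  --type strategic \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"aws-efa-k8s-device-plugin","securityContext":{"runAsNonRoot":false,"runAsUser":0}}]}}}}'

The strategic merge patch matches the container by name, and a container-level runAsNonRoot: false takes precedence over a pod-level setting.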

vsoch commented 1 year ago

Okay, I think I found the issue: they made a change at the end of November last year that added a boolean (which isn't present here); see the linked issue above.

DanielJuravski commented 1 year ago

This Helm chart resolved the issue: https://github.com/aws-samples/efa-device-plugin-helm
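
For reference, a hedged install sketch based on the linked repository; the chart path inside the clone and the release name are assumptions, so check that repo's README for the exact commands:

git clone https://github.com/aws-samples/efa-device-plugin-helm
# The chart location within the clone is an assumption; adjust the path if needed.
helm install efa-device-plugin ./efa-device-plugin-helm --namespace kube-system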