NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes
Apache License 2.0

0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.. #430

Open Todoroki02 opened 1 year ago

Todoroki02 commented 1 year ago
root@ttogpu:~# kubectl describe pod triton-inference-server-5b6c7f889c-f54c6 
Name:             triton-inference-server-5b6c7f889c-f54c6
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=triton-inference-server
                  pod-template-hash=5b6c7f889c
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/triton-inference-server-5b6c7f889c
Containers:
  triton-server:
    Image:       triton_server:latest
    Ports:       8000/TCP, 8001/TCP, 8002/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Limits:
      nvidia.com/gpu:  1
    Requests:
      nvidia.com/gpu:  1
    Environment:
      DP_DISABLE_HEALTHCHECKS:  xids
    Mounts:
      /models from model-repository (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sczwq (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  model-repository:
    Type:          HostPath (bare host directory volume)
    Path:          /path/to/host/model/directory
    HostPathType:
  kube-api-access-sczwq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                             nvidia.com/gpu:NoSchedule op=Exists
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m58s  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..

The pod stays in Pending with the FailedScheduling event above. I need help resolving this error.
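The `Insufficient nvidia.com/gpu` event generally means that no node has enough unallocated `nvidia.com/gpu` to satisfy the pod's request. A common first check is whether the node advertises the resource at all, which depends on the device plugin having registered with the kubelet. The following is a minimal sketch of that check, assuming the output of `kubectl get nodes -o json` is available as text. The function name `gpu_allocatable` is hypothetical and is not part of any NVIDIA or Kubernetes tooling.

```python
import json

GPU_RESOURCE = "nvidia.com/gpu"


def gpu_allocatable(nodes_json: str) -> dict:
    """Map each node name to its allocatable nvidia.com/gpu count.

    `nodes_json` is assumed to be the text printed by
    `kubectl get nodes -o json`. A node that does not advertise the
    resource is reported as 0.
    """
    node_list = json.loads(nodes_json)
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended resources such as nvidia.com/gpu are integer strings.
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts
```

If a node reports 0, the device plugin pods and the node's container runtime configuration are the usual next places to look. If it reports 1 or more, another pod may already hold the GPU.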

github-actions[bot] commented 6 months ago

This issue has become stale and will be closed automatically within 30 days if no activity is recorded.