Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware
http://project-hami.io/
Apache License 2.0

UnexpectedAdmissionError when creating a pod #596

Open jfboy233 opened 2 weeks ago

jfboy233 commented 2 weeks ago

Please provide an in-depth description of the question you have: Creating a pod fails with UnexpectedAdmissionError. The full message is "Allocate failed due to requested number of devices unavailable for nvidia.com/gpu. Requested: 1, Available: 0, which is unexpected". If I delete the nvidia.com/gpu: 1 line from the YAML, the pod is scheduled successfully, but it does not show up on the 31993/metrics endpoint.

What do you think about this question?: Where should I start troubleshooting?

Environment:

image

Nimbus318 commented 2 weeks ago
  1. Please share the annotations on the GPU node.
  2. Are there any other Running GPU pods on this node?
  3. Your HAMi version may be somewhat old. Try upgrading to 2.4.0, which fixed a number of issues.
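
For readers following along, the first two checks above can be scripted. This is a minimal sketch, not part of HAMi: it assumes `kubectl` is on the PATH and configured for the cluster, and the helper names (`kubectl_json`, `node_gpu_summary`, `running_gpu_pods`, `report`) are made up for illustration. It prints the node's annotations, the allocatable `nvidia.com/gpu` count, and any Running pods on that node whose containers set a `nvidia.com/gpu` limit.

```python
import json
import subprocess

GPU_RESOURCE = "nvidia.com/gpu"


def kubectl_json(*args):
    """Run kubectl with '-o json' appended and return the parsed output."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def node_gpu_summary(node):
    """Extract annotations and the allocatable GPU count from a Node object."""
    annotations = node.get("metadata", {}).get("annotations", {})
    allocatable = node.get("status", {}).get("allocatable", {})
    return {
        "annotations": annotations,
        "allocatable_gpus": int(allocatable.get(GPU_RESOURCE, "0")),
    }


def running_gpu_pods(pod_list, node_name):
    """Return (namespace, name, gpu_limit) for Running GPU pods on node_name."""
    found = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        if spec.get("nodeName") != node_name:
            continue
        if pod.get("status", {}).get("phase") != "Running":
            continue
        # Sum the GPU limits over all containers in the pod.
        gpus = 0
        for container in spec.get("containers", []):
            limits = container.get("resources", {}).get("limits", {})
            gpus += int(limits.get(GPU_RESOURCE, "0"))
        if gpus > 0:
            meta = pod["metadata"]
            found.append((meta["namespace"], meta["name"], gpus))
    return found


def report(node_name):
    """Print the information requested in checks 1 and 2 for one node."""
    summary = node_gpu_summary(kubectl_json("get", "node", node_name))
    print("Allocatable", GPU_RESOURCE, "=", summary["allocatable_gpus"])
    for key, value in sorted(summary["annotations"].items()):
        print("annotation", key, "=", value)
    pods = kubectl_json("get", "pods", "--all-namespaces")
    for namespace, name, gpus in running_gpu_pods(pods, node_name):
        print("running GPU pod", namespace + "/" + name, "limit", gpus)
```

Calling `report("<your-gpu-node>")` with the real node name produces output that can be pasted into the issue. Pods that request GPUs through other resource names are not counted by this sketch.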