Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware
http://project-hami.io/
Apache License 2.0

Question about CUDA_DEVICE_SM_LIMIT = 0 #636

Open for800000 opened 5 days ago

for800000 commented 5 days ago

Please provide an in-depth description of the question you have: When CUDA_DEVICE_SM_LIMIT = 0, is libvgpu still intercepting calls or not? And is there any overhead compared with nvidia-device-plugin? What do you think about this question?:

Environment:

lixd commented 5 days ago

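Setting CUDA_DEVICE_SM_LIMIT to 0 is treated as 100. Calls still go through libvgpu, but no compute (SM) limit is enforced, so in theory there is still some overhead. You can set the environment variable CUDA_DISABLE_CONTROL=true to disable the container-level resource isolation mechanism.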
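To make the behavior described above concrete, here is a minimal Python sketch of the decision logic as stated in this comment. It is a toy model, not HAMi's or libvgpu's actual code: the function name `effective_sm_limit` is hypothetical, and treating an unset `CUDA_DEVICE_SM_LIMIT` the same as 0, as well as matching `true` case-insensitively, are assumptions made only for illustration.

```python
from typing import Mapping, Optional


def effective_sm_limit(env: Mapping[str, str]) -> Optional[int]:
    """Model the SM limit (in percent) implied by the environment.

    Returns None when isolation is disabled altogether, otherwise the
    percentage that would be enforced. Illustrative only.
    """
    # Per the comment above: CUDA_DISABLE_CONTROL=true turns off the
    # container-level isolation. Case-insensitive match is an assumption.
    if env.get("CUDA_DISABLE_CONTROL", "").lower() == "true":
        return None

    # Assumption: an unset variable behaves like 0 in this sketch.
    limit = int(env.get("CUDA_DEVICE_SM_LIMIT", "0"))

    # Per the comment above: 0 is treated as 100, i.e. the library is
    # still in the call path but applies no compute cap.
    return 100 if limit == 0 else limit


if __name__ == "__main__":
    print(effective_sm_limit({"CUDA_DEVICE_SM_LIMIT": "0"}))
    print(effective_sm_limit({"CUDA_DEVICE_SM_LIMIT": "50"}))
    print(effective_sm_limit({"CUDA_DISABLE_CONTROL": "true"}))
```

In this model, a result of 100 and a result of None are different cases: with 100 the interception layer is still loaded and merely does not throttle, whereas None corresponds to bypassing the isolation mechanism.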

for800000 commented 4 days ago

Thanks. I have one more question.

(Two screenshots attached.)

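Two replicas of the same Deployment, with 1 GPU allocated. Once the tasks are running, the allocated memory and the memory actually in use do not match. Is this expected, or is it a bug? Version: v2.4.0.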