4paradigm / k8s-vgpu-scheduler

The OpenAIOS vGPU device plugin for Kubernetes originates from the OpenAIOS project. It virtualizes GPU device memory so that applications can access more memory than the GPU's physical capacity, and it is designed to make extended device memory easy to use for AI workloads.
Apache License 2.0

vGPU memory limit problem #28

Open: changge27 opened this issue 1 year ago

changge27 commented 1 year ago

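I'm using the libvgpu.so from the update six months ago. It works with PyTorch: memory limiting behaves correctly, and exceeding the allocated device memory raises an error as expected. With TensorFlow, however, the limit is not enforced: a job can use more than its allocated slice without any error.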

archlitchi commented 1 year ago

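For this I recommend using vgpu-scheduler. vgpu-device-plugin currently supports only up to CUDA 11.3, and jobs built on newer CUDA versions will hit exactly the problem you describe. If you run into any difficulties using it, feel free to add me on WeChat: xuanzong4493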
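A minimal sketch of the version check implied by the reply, assuming the only rule is "CUDA 11.3 or older works with vgpu-device-plugin; anything newer should go to vgpu-scheduler". The helper names (`parse_cuda_version`, `device_plugin_supports`) and the `MAX_SUPPORTED_CUDA` constant are hypothetical and not part of either project. The version string could come from, e.g., `torch.version.cuda` in PyTorch.

```python
# Hypothetical helper, not part of vgpu-device-plugin or vgpu-scheduler.
# Assumption from the maintainer's reply in this thread: memory limiting in
# vgpu-device-plugin is only expected to work for CUDA <= 11.3.
MAX_SUPPORTED_CUDA = (11, 3)


def parse_cuda_version(version: str) -> tuple[int, int]:
    """Parse a 'major.minor[.patch]' string into a (major, minor) tuple."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected CUDA version string: {version!r}")
    return int(parts[0]), int(parts[1])


def device_plugin_supports(version: str) -> bool:
    """True if this CUDA version is within the range the reply says
    vgpu-device-plugin supports; newer versions should use vgpu-scheduler."""
    return parse_cuda_version(version) <= MAX_SUPPORTED_CUDA


if __name__ == "__main__":
    for v in ["10.2", "11.3", "11.4", "12.1"]:
        target = "vgpu-device-plugin" if device_plugin_supports(v) else "vgpu-scheduler"
        print(f"CUDA {v}: use {target}")
```

Comparing `(major, minor)` tuples rather than floats avoids treating a hypothetical "11.10" as older than "11.3".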