Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware
http://project-hami.io/
Apache License 2.0

Question: Is heterogeneous device sharing supported? #514

Open wJunjie-1995 opened 3 weeks ago

wJunjie-1995 commented 3 weeks ago

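Is heterogeneous device sharing supported? Specifically:

  1. Can a single task use GPUs from the same vendor but with different chips for training/inference (e.g., training on one V100 and one H100 at the same time)?
  2. Or can a single task use cards from different vendors for training/inference (e.g., training on one V100 plus one 910B)?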
Nimbus318 commented 3 weeks ago

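Can a single task use GPUs from the same vendor but with different chips for training/inference (e.g., training on one V100 and one H100 at the same time)?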

If both NVIDIA GPUs are on the same node, then it's supported.

Can a single task use cards from different vendors for training/inference (e.g., training on one V100 plus one 910B)?

Not supported.

Frameworks such as TensorFlow and PyTorch, along with their related Python libraries, are generally designed for homogeneous environments: at a minimum, all the hardware must come from the same vendor for them to work properly. I'm a bit confused about the use case for the second scenario.
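The two rules in the reply can be summarized as a small decision function. This is a hypothetical sketch, not part of HAMi's API or scheduler: the `Device` type, its fields, the node names, and `is_supported` are illustrative names assumed here. It encodes only the two constraints stated in the reply (all devices on the same node, all from the same vendor) and ignores everything else a real scheduler would check.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    """Hypothetical description of one accelerator assigned to a task."""
    node: str    # node the device is installed on (illustrative name)
    vendor: str  # e.g. "nvidia", "huawei"
    model: str   # e.g. "V100", "H100", "910B"


def is_supported(devices: List[Device]) -> bool:
    """Return True if one task could use all `devices` together, per the
    reply's rules (a simplification, not HAMi code): every device must be
    on one node and from one vendor; mixing models within a vendor is fine."""
    if not devices:
        return False  # no devices assigned: nothing to share
    same_node = len({d.node for d in devices}) == 1
    same_vendor = len({d.vendor for d in devices}) == 1
    return same_node and same_vendor


# Scenario 1: V100 + H100 on the same node -> supported
print(is_supported([Device("node-a", "nvidia", "V100"),
                    Device("node-a", "nvidia", "H100")]))
# Scenario 2: V100 + 910B (different vendors) -> not supported
print(is_supported([Device("node-a", "nvidia", "V100"),
                    Device("node-a", "huawei", "910B")]))
```

The model field is carried along but deliberately not compared, reflecting the reply's point that mixing chip generations from one vendor on one node is acceptable, whereas mixing vendors is not.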