kerthcet opened this issue 3 months ago
We use lws as the underlying workload to support multi-host inference; however, we only support one pod per model right now. The general idea is that once a model flavor requires something like `nvidia.com/gpu: 32`, we'll split it across 4 hosts, each requesting 8 GPUs. A rough sketch of the splitting logic is below.
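A minimal sketch of what that split could look like, assuming a fixed number of GPUs per host (e.g. 8) and that the flavor's total is evenly divisible. The `splitFlavor` helper below is hypothetical, not an existing llmaz or lws API:

```go
// Hypothetical sketch: split a model flavor's total GPU request into
// per-host requests for a multi-host (lws) deployment, assuming each
// host exposes a fixed number of GPUs.
package main

import "fmt"

// splitFlavor returns how many hosts are needed and the per-host GPU
// request. It assumes totalGPUs is an exact multiple of gpusPerHost;
// otherwise it returns an error rather than over- or under-allocating.
func splitFlavor(totalGPUs, gpusPerHost int64) (hosts, perHost int64, err error) {
	if gpusPerHost <= 0 {
		return 0, 0, fmt.Errorf("gpusPerHost must be positive, got %d", gpusPerHost)
	}
	if totalGPUs%gpusPerHost != 0 {
		return 0, 0, fmt.Errorf("total %d GPUs is not divisible by %d GPUs per host", totalGPUs, gpusPerHost)
	}
	return totalGPUs / gpusPerHost, gpusPerHost, nil
}

func main() {
	// e.g. nvidia.com/gpu: 32 with 8-GPU hosts -> 4 hosts, 8 GPUs each.
	hosts, perHost, err := splitFlavor(32, 8)
	if err != nil {
		panic(err)
	}
	fmt.Printf("hosts=%d, nvidia.com/gpu per host=%d\n", hosts, perHost)
}
```

In practice the per-host GPU count would presumably come from the cluster's node shape or the flavor definition rather than being hard-coded.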
/kind feature
/milestone v0.2.0