kerthcet opened this issue 3 months ago
We use lws as the underlying workload to support multi-host inference; however, we only support one pod per model right now. The general idea is that once a model flavor requires something like `nvidia.com/gpu: 32`, we'll split it across 4 hosts, each requesting 8 GPUs. A rough sketch of the splitting logic is below.
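A minimal sketch of what that split could look like, assuming a fixed number of GPUs per host (e.g. 8) and that the flavor's total is evenly divisible. The `splitFlavor` helper below is hypothetical, not an existing llmaz or lws API:

```go
// Hypothetical sketch: split a model flavor's total GPU request into
// per-host requests for a multi-host (lws) deployment, assuming each
// host exposes a fixed number of GPUs.
package main

import "fmt"

// splitFlavor returns how many hosts are needed and the per-host GPU
// request. It assumes totalGPUs is an exact multiple of gpusPerHost;
// otherwise it returns an error rather than over- or under-allocating.
func splitFlavor(totalGPUs, gpusPerHost int64) (hosts, perHost int64, err error) {
	if gpusPerHost <= 0 {
		return 0, 0, fmt.Errorf("gpusPerHost must be positive, got %d", gpusPerHost)
	}
	if totalGPUs%gpusPerHost != 0 {
		return 0, 0, fmt.Errorf("total %d GPUs is not divisible by %d GPUs per host", totalGPUs, gpusPerHost)
	}
	return totalGPUs / gpusPerHost, gpusPerHost, nil
}

func main() {
	// e.g. nvidia.com/gpu: 32 with 8-GPU hosts -> 4 hosts, 8 GPUs each.
	hosts, perHost, err := splitFlavor(32, 8)
	if err != nil {
		panic(err)
	}
	fmt.Printf("hosts=%d, nvidia.com/gpu per host=%d\n", hosts, perHost)
}
```

In practice the per-host GPU count would presumably come from the cluster's node shape or the flavor definition rather than being hard-coded.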
/kind feature
/milestone v0.2.0