InftyAI / llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes
Apache License 2.0

Support different GPU accelerators for fungibility #62

Open kerthcet opened 1 month ago

kerthcet commented 1 month ago

What would you like to be added:

Models can be loaded with different accelerator configurations; for example, llama2-70b can run on either 2x A100 80GB or 4x A100 40GB, and we should support this. High-end GPUs are frequently out of stock, so falling back to alternative accelerators helps improve the SLO of services.
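A minimal sketch of the idea, in Go: a model declares a preference-ordered list of accelerator "flavors", and the scheduler picks the first one that is currently obtainable. The `Flavor` type and `pickFlavor` function are illustrative assumptions, not the llmaz API.

```go
package main

import "fmt"

// Flavor is a hypothetical accelerator configuration for a model.
// Field names are illustrative, not part of the llmaz API.
type Flavor struct {
	GPUType  string // e.g. "nvidia-a100-80gb"
	Count    int    // number of GPUs required
	MemoryGB int    // per-GPU memory
}

// pickFlavor walks the preference-ordered flavor list and returns the
// first flavor whose GPU type is currently schedulable in the cluster.
func pickFlavor(flavors []Flavor, available map[string]bool) (Flavor, bool) {
	for _, f := range flavors {
		if available[f.GPUType] {
			return f, true
		}
	}
	return Flavor{}, false
}

func main() {
	// llama2-70b can run on either configuration (example from this issue).
	flavors := []Flavor{
		{GPUType: "nvidia-a100-80gb", Count: 2, MemoryGB: 80},
		{GPUType: "nvidia-a100-40gb", Count: 4, MemoryGB: 40},
	}
	// Suppose the 80GB cards are stocked out; fall back to 4x 40GB.
	available := map[string]bool{"nvidia-a100-40gb": true}
	if f, ok := pickFlavor(flavors, available); ok {
		fmt.Printf("schedule %d x %s\n", f.Count, f.GPUType)
	}
}
```

In practice the availability check would come from node/quota state rather than a static map, but the fallback ordering is the core of the fungibility request.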

Why is this needed:

Cost savings and SLO considerations.

Completion requirements:

This enhancement requires the following artifacts:

The artifacts should be linked in subsequent comments.

kerthcet commented 1 month ago

/kind feature
/milestone v0.1.0