InftyAI / llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes
Apache License 2.0

Download models in advance #99

Closed kerthcet closed 3 weeks ago

kerthcet commented 3 weeks ago

What would you like to be added:

Support downloading models ahead of the inference service; this is useful when new nodes are provisioned.
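A minimal sketch of what such a pre-download step might look like, e.g. run as a Kubernetes initContainer or DaemonSet on freshly provisioned nodes. This is an illustration only: the `needs_download`/`predownload` helpers are hypothetical, and `huggingface_hub` is just one possible fetcher; llmaz does not prescribe this mechanism.

```python
import os


def needs_download(local_dir: str) -> bool:
    """True if the model directory is absent or empty."""
    return not os.path.isdir(local_dir) or not os.listdir(local_dir)


def predownload(repo_id: str, local_dir: str) -> None:
    """Fetch model weights into local_dir unless they are already there.

    Intended to run before the inference service starts, so the service
    finds the weights on local disk instead of downloading at startup.
    """
    if not needs_download(local_dir):
        return
    # Deferred import so the cache check works without the dependency.
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Called as, say, `predownload("facebook/opt-125m", "/models/opt-125m")` from an init step, this makes the actual service startup a no-op on the download path once the weights are cached.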

Why is this needed:

It accelerates service startup. However, if the cluster already has a P2P acceleration system like Dragonfly, this is less necessary.

Completion requirements:

This enhancement requires the following artifacts:

The artifacts should be linked in subsequent comments.

kerthcet commented 3 weeks ago

/kind question

kerthcet commented 3 weeks ago

This may be useful for very large models, since some nodes are dedicated to them, e.g. nodes with high-end GPUs preset.

kerthcet commented 3 weeks ago

However, this can be solved with cache systems like Fluid; llmaz should not handle it itself, because caching to the local host makes little sense.

/close