What would you like to be added:
Right now, llmaz is designed mostly for large language models. However, some users may also need it to serve traditional models, so that a single solution covers both. Let's wait for some feedback.
The solution is quite similar: we either have to implement a server runtime, similar to vllm, for the different kinds of models, or reuse official ones such as torchserve.
Why is this needed:
Completion requirements:
This enhancement requires the following artifacts:
[x] Design doc
[x] API change
[x] Docs update
The artifacts should be linked in subsequent comments.