Open · orgh0 opened 5 months ago
Hi, thanks for the awesome work on this project!

Quick questions:

What might be the best way to get this project to work at scale? I have seen the Docker images that were released; is deploying them with Kubernetes a sustainable solution?
We only need the smallest model, but GPU inference is not an option for us.
Any support would be super helpful.
zoq replied:

Great to hear you find the project helpful. To serve multiple users, I would suggest looking into batching, which is on the roadmap but not currently supported. After that, you will probably want some kind of router or load balancer to forward users to the correct endpoint.
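To make the router/load-balancer idea a bit more concrete, here is a minimal sketch (not part of this project) of a round-robin proxy that spreads incoming requests across several inference replicas. The backend URLs, the listening port, and the JSON request shape are placeholders you would swap for whatever endpoints your Docker/Kubernetes deployment exposes; in practice an off-the-shelf ingress, nginx/HAProxy, or a Kubernetes Service would usually take this role rather than hand-rolled code.

```python
# Round-robin proxy sketch: forwards each POST to the next replica in turn.
# Backend URLs, port, and payload format below are illustrative placeholders.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical replica endpoints, e.g. one per pod/container in the cluster.
BACKENDS = [
    "http://inference-0:8000",
    "http://inference-1:8000",
]
_next_backend = itertools.cycle(BACKENDS)


class RoundRobinProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        backend = next(_next_backend)  # pick the next replica in rotation
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))

        # Forward the request body to the chosen replica and relay its reply.
        req = urllib.request.Request(
            backend + self.path,
            data=body,
            headers={"Content-Type": self.headers.get("Content-Type", "application/json")},
        )
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                payload = resp.read()
                self.send_response(resp.status)
        except Exception:
            payload = b'{"error": "backend unavailable"}'
            self.send_response(502)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    # Listen on 8080 and spread requests across the replicas listed above.
    ThreadingHTTPServer(("0.0.0.0", 8080), RoundRobinProxy).serve_forever()
```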
@zoq - thanks for the quick response, really appreciate it.