Any update on this? Because of the above concerns, we had to temporarily put the transition to the Inferentia inference server on hold.
Hello @dingusagar,
We are in the process of drafting better documentation to help in situations like this. The short answer to your questions is that the Neuron SDK is designed to sit at a lower abstraction layer, underneath multi-model inference frameworks.
Since you mention Docker containers: a common technique is to use nginx or HAProxy as a load balancer and routing layer in front of your model containers. This is usually done in conjunction with one of the Docker orchestration frameworks such as Compose, Swarm, or Kubernetes. For non-Docker or inter-container situations, there are tools such as https://github.com/awslabs/multi-model-server that can be used.
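For concreteness, here is a minimal sketch of that nginx pattern. Everything in it is illustrative rather than from this thread: the container names `model-a` and `model-b`, port 8080, and the assumption that nginx and the model containers share a Docker network (e.g. one created by Compose).

```nginx
# Minimal sketch, assuming two identical model-serving containers
# ("model-a", "model-b") reachable by name on a shared Docker network.
# These blocks belong inside the http context (e.g. a conf.d include).
upstream model_servers {
    # nginx round-robins requests across the listed servers by default
    server model-a:8080;
    server model-b:8080;
}

server {
    listen 80;

    location / {
        # forward every inference request to the upstream pool
        proxy_pass http://model_servers;
    }
}
```

With this in place, clients talk only to nginx on port 80, and you can scale the number of model containers up or down without changing the client-facing endpoint.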
Closing