NVIDIA / gpu-rest-engine

A REST API for Caffe using Docker and Go
BSD 3-Clause "New" or "Revised" License

Best practices to run multiple inference servers on same machine? #9

Closed kraigrs closed 7 years ago

kraigrs commented 7 years ago

What is the best way to run multiple inference servers (of different models) on the same machine? I tried using the -p syntax mentioned here but received an error.

$ nvidia-docker run --name=server --net=host --rm inference_server -p 8000:34448
container_linux.go:247: starting container process caused "exec: \"-p\": executable file not found in $PATH"
docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: \"-p\": executable file not found in $PATH".

Is it possible to do this? Any advice would be greatly appreciated.

flx42 commented 7 years ago

You need to put the -p before the image name (inference_server). But it won't work here anyway, because we are using --net=host (no network isolation). You must either remove --net=host, or modify the code to add the capability to listen on a different port (e.g. -l :7000).
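
For reference, a minimal sketch of what such a change might look like, assuming the server wires up its routes and listen address in main(); the flag name -l, the default port, and classifyHandler below are illustrative stand-ins, not the repository's actual code:

package main

import (
	"flag"
	"log"
	"net/http"
)

// classifyHandler is a stub; in the real server this would dispatch
// the request body to the Caffe inference backend.
func classifyHandler(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("ok\n"))
}

func main() {
	// -l lets each container choose its own port when --net=host is used,
	// e.g. ./inference_server -l :7000
	listenAddr := flag.String("l", ":8000", "address to listen on, e.g. :7000")
	flag.Parse()

	http.HandleFunc("/api/classify", classifyHandler)
	log.Printf("listening on %s", *listenAddr)
	log.Fatal(http.ListenAndServe(*listenAddr, nil))
}

With a flag like this, each server could keep --net=host and simply be started with a different -l value.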

kraigrs commented 7 years ago

Great, thank you! I ended up solving this in the following manner:

$ nvidia-docker run --name=server1 --net=host --rm inference_server1
$ nvidia-docker run --name=server2 -p 8888:8000 --rm inference_server2

You can then access those different inference servers via the following:

$ curl -XPOST --data-binary @images/1.jpg http://127.0.0.1:8000/api/classify
$ curl -XPOST --data-binary @images/1.jpg http://127.0.0.1:8888/api/classify

While it may not qualify as a "best practice", it gets the job done for now, since I was unsure where to modify the code to add the capability to listen on different ports.