Kuntal-G opened this issue 4 years ago
@Kuntal-G
How does MXNet Model Server handle multiple concurrent/parallel requests for a particular model?
MMS queues incoming requests for a particular model and serves them from that queue. MMS currently uses Netty for HTTP request/response handling. I am not sure what else you were looking for in this question.
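Conceptually, the per-model queuing described above looks something like this (a toy sketch, not MMS's actual code; the class and method names here are made up):

```python
import queue
import threading

class ModelEndpoint:
    """Hypothetical per-model endpoint: the HTTP frontend enqueues each
    incoming request, and a worker thread drains the queue and runs
    inference, so concurrent requests to one model are serialized."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler          # the model's inference function
        self.requests = queue.Queue()   # per-model request queue
        self.worker = threading.Thread(target=self._serve, daemon=True)
        self.worker.start()

    def submit(self, payload):
        """Called by the HTTP frontend for each incoming request."""
        result = queue.Queue(maxsize=1)
        self.requests.put((payload, result))
        return result.get()             # block until the worker responds

    def _serve(self):
        while True:
            payload, result = self.requests.get()
            result.put(self.handler(payload))

endpoint = ModelEndpoint("demo-model", lambda x: x * 2)
print(endpoint.submit(21))  # -> 42
```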
When we load a model inside multi-model-server, does it apply forking/multiprocessing to host multiple copies of the same model to improve the throughput/latency?
Yes. If you have preload_model set to true, it uses fork semantics to create new instances of model workers. This works on Unix-based systems.
What is the default value and is there any config to decide how many copies of the model will be spawned?
Currently, the source code itself is the documentation. If you have specific questions about the code, we can probably answer them :) .
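For what it's worth, I believe the worker count can be tuned through MMS's config.properties; the key name below is from memory, so please verify it against the source before relying on it:

```properties
# config.properties -- key names recalled from memory, verify against the MMS source
preload_model=true
default_workers_per_model=4
```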
As per the MXNet parallel-inference doc, the main dispatcher thread is single-threaded. https://cwiki.apache.org/confluence/display/MXNET/Parallel+Inference+in+MXNet
How does MXNet Model Server handle multiple concurrent/parallel requests for a particular model? When we load a model inside multi-model-server, does it apply forking/multiprocessing to host multiple copies of the same model to improve throughput/latency? If yes, what is the default value, and is there any config to decide how many copies of the model will be spawned?
Also, what does the worker thread actually do? https://github.com/awslabs/multi-model-server/blob/master/mms/model_service_worker.py#L166-L212
Any guides or pointers to the code would be highly appreciated.
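Not an authoritative answer, but from reading model_service_worker.py the worker appears to loop over a socket connection from the frontend: decode a request, run the model's handler, and send back an encoded response. A stripped-down sketch of that loop (the socket path and JSON wire format here are invented for illustration):

```python
import json
import socket

def run_worker(sock_path, handler):
    """Simplified model-worker loop: accept one connection from the
    frontend, then repeatedly decode a request, run inference via the
    handler, and send the encoded result back."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen(1)
    conn, _ = server.accept()                      # frontend connects at startup
    with conn:
        while True:
            raw = conn.recv(65536)                 # 1. receive an encoded request
            if not raw:
                break                              # frontend closed the connection
            request = json.loads(raw)              # 2. decode it
            response = handler(request)            # 3. run the model's handler
            conn.send(json.dumps(response).encode())  # 4. send the result back
    server.close()
```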