ooooona opened this issue 1 year ago
Hey @ooooona,
Depending on your benchmark settings, there may not be enough traffic to keep the other workers busy in parallel. Generally, MLServer will round-robin requests across workers. This is the case for MLServer > 1.2.0 (which version of MLServer are you using?).
However, if there aren't enough concurrent requests (e.g. when a single client sends requests sequentially, or requests are processed too quickly), each worker will finish processing its request before the next one comes in - which effectively looks like only one of them is working.
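One quick way to rule that out is to drive genuinely concurrent load, e.g. with `ab -c 50` or with a small script. Below is a rough sketch using Python threads; the model name `my-model` and port 8080 are placeholders rather than anything from this thread:

```python
import concurrent.futures

import requests

# Placeholder endpoint: MLServer serves HTTP on port 8080 by default, and
# "my-model" stands in for whatever name the model is deployed under.
URL = "http://localhost:8080/v2/models/my-model/infer"

PAYLOAD = {
    "inputs": [
        {
            "name": "args",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [10.1, 13.5, 1.4, 0.2],
        }
    ]
}


def send_one(_: int) -> int:
    # Blocking POST against the V2 inference endpoint.
    return requests.post(URL, json=PAYLOAD).status_code


# 50 threads keep many requests in flight at once; a single client sending
# requests one by one would only ever exercise one worker at a time.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(send_one, range(1000)))

print(statuses.count(200), "successful responses")
```

With enough in-flight requests, top should show several worker processes with non-zero CPU usage.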
hi @adriangonz,
mlserver, version 1.3.0.dev3
Actually, I built the image from git at commit 'eaa056371befccf74c66efc62192ffdd3c4a254e'.
top showed the same thing: only one process had ~100% CPU usage while the others stayed at ~0%, and the p99 latency increased to 390ms (12ms p99 at concurrency 1, 45ms p99 at concurrency 10). I also ran the same test against seldon-core-microservice, and its multiprocessing really worked - not only in CPU usage, but also in throughput (lower latency). So I think there might be something wrong with mlserver.
Hey @ooooona,
Thanks for providing those details.
Could you share more info on the type of requests you are sending? How large are these?
Deserialisation happens on the main process - so if these are large requests, that could be a potential bottleneck.
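If you want to test that hypothesis, one option is to grow the request size and check whether latency grows with it. A rough sketch (again, the endpoint and model name are placeholders, not from this thread):

```python
import time

import requests

# Placeholder endpoint, as in the earlier sketch.
URL = "http://localhost:8080/v2/models/my-model/infer"

for batch in (1, 100, 10_000):
    payload = {
        "inputs": [
            {
                "name": "args",
                "shape": [batch, 4],
                "datatype": "FP32",
                # Repeat the 4-feature row `batch` times to grow the payload.
                "data": [10.1, 13.5, 1.4, 0.2] * batch,
            }
        ]
    }
    start = time.perf_counter()
    requests.post(URL, json=payload)
    print(f"batch={batch}: {time.perf_counter() - start:.3f}s")
```

If latency scales with payload size, the single-process deserialisation step is the more likely bottleneck.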
hi @adriangonz, sorry for my late reply. My request is quite small:
```
$ cat sklearn-mlserver.json
{
    "inputs": [
        {
            "name": "args",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [10.1, 13.5, 1.4, 0.2]
        }
    ]
}
```
Hi, I'm trying to improve the throughput of my server, which runs on MLServer, and I learned that I can set 'parallel_workers > 1' to enable parallel inference. Hence I set it in settings.json as below:
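[The original snippet was not captured here; assuming MLServer's standard settings schema, a minimal settings.json consistent with the description would be:]

```json
{
    "parallel_workers": 10
}
```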
Then I used ab (Apache Benchmark) to test my server, and at the same time I used top to monitor CPU and memory usage. I can see that the server really forks 10 processes. However, only 1 process actually worked while the others did nothing???
The ab results showed that 'parallel_workers=1' has the same latency as 'parallel_workers=10':
[ab output screenshot: parallel_workers=10]
[ab output screenshot: parallel_workers=1]