SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0

Disabling adaptive batching leads to slower batched requests #1826

Open jegork opened 2 weeks ago

jegork commented 2 weeks ago

Hi!

I am sending 100 data samples in a single request (one request whose data contains 100 examples).

When I set adaptive batching to these values:

max_batch_time = 0.25
max_batch_size = 8

then the request is processed in 3.5 seconds, but if I do not set these two parameters (i.e. adaptive batching is disabled), the same request takes 8 seconds. This looks strange: in my opinion, adaptive batching should have no impact when making a single request.
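
A minimal sketch for reproducing this kind of measurement, assuming the model is served over MLServer's HTTP endpoint using the V2 inference protocol. The URL, the model name `my-model`, the input name `input-0`, the feature count, and the `FP32` datatype are all placeholders, not details from the report above. The script sends one request carrying 100 samples and prints the wall-clock time, so it can be run once with `max_batch_size` / `max_batch_time` set and once without.

```python
import json
import time
import urllib.request


def build_payload(n_samples: int, n_features: int, input_name: str = "input-0") -> dict:
    """Build one V2 inference request that carries all samples in a single tensor.

    The input name and FP32 datatype are assumptions; adjust them to the model.
    """
    data = [0.0] * (n_samples * n_features)  # dummy values, flattened row-major
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [n_samples, n_features],
                "datatype": "FP32",
                "data": data,
            }
        ]
    }


def time_request(url: str, payload: dict) -> float:
    """POST the payload once and return the elapsed wall-clock time in seconds."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # wait for the full response body
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical endpoint and model name; replace with the real ones.
    url = "http://localhost:8080/v2/models/my-model/infer"
    payload = build_payload(n_samples=100, n_features=4)
    print(f"single request with 100 samples: {time_request(url, payload):.2f} s")
```

Running the request several times and comparing the median would help separate a real difference from warm-up effects.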