SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0
726 stars 183 forks

Adaptive batching batch timeout fires sooner than configured #1936

Open hanlaur opened 1 month ago

hanlaur commented 1 month ago

Hello,

With adaptive batching, if I set MLSERVER_MODEL_MAX_BATCH_TIME=1, expecting a 1 second batch window, the timeout is reached much sooner than it should be. Here is an example (with a debug print added before each request is appended to the batch, showing the calculated remaining timeout). The debug prints show that the request timestamps and the remaining timeout do not match: the first request arrives at 07:27:26,260, but MLServer believes it has reached the 1 second timeout at 07:27:26,386, i.e. after approx. 130 ms.

remaining timeout 1.0 currently accumulated 0
remaining timeout 0.9999668598175049 currently accumulated 1
remaining timeout 0.9999086856842041 currently accumulated 2
2024-10-25 07:27:26,260 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,261 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout 0.9958555698394775 currently accumulated 3
remaining timeout 0.9917783737182617 currently accumulated 4
2024-10-25 07:27:26,263 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,264 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,266 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout 0.9834423065185547 currently accumulated 5
remaining timeout 0.9750831127166748 currently accumulated 6
remaining timeout 0.9666938781738281 currently accumulated 7
2024-10-25 07:27:26,267 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,269 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,271 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout 0.95306396484375 currently accumulated 8
remaining timeout 0.939399003982544 currently accumulated 9
remaining timeout 0.9257209300994873 currently accumulated 10
2024-10-25 07:27:26,272 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,274 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,276 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout 0.9061038494110107 currently accumulated 11
remaining timeout 0.8864269256591797 currently accumulated 12
remaining timeout 0.8667359352111816 currently accumulated 13
2024-10-25 07:27:26,278 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,282 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout 0.843052864074707 currently accumulated 14
remaining timeout 0.8193438053131104 currently accumulated 15
2024-10-25 07:27:26,282 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout 0.7896456718444824 currently accumulated 16
remaining timeout 0.7599077224731445 currently accumulated 17
2024-10-25 07:27:26,374 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,375 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,375 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout 0.6421425342559814 currently accumulated 18
remaining timeout 0.524355411529541 currently accumulated 19
2024-10-25 07:27:26,378 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,378 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout 0.40399718284606934 currently accumulated 20
remaining timeout 0.2836179733276367 currently accumulated 21
2024-10-25 07:27:26,380 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,381 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout 0.1606147289276123 currently accumulated 22
remaining timeout 0.03759360313415527 currently accumulated 23
2024-10-25 07:27:26,383 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
2024-10-25 07:27:26,383 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer
remaining timeout -0.08817529678344727 currently accumulated 24
remaining timeout -0.21396827697753906 currently accumulated 25
2024-10-25 07:27:26,386 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer

The issue seems to be the timeout calculation here: https://github.com/SeldonIO/MLServer/blob/1ce29e65d5cd14a24a573f2f44ae2eac2b51a0f4/mlserver/batching/adaptive.py#L139

I believe the formula should be timeout = self._max_batch_time - (current - start).
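To illustrate the intended behavior, here is a minimal, hypothetical sketch (not the actual MLServer implementation) of adaptive batching with a correct deadline: the remaining timeout is always measured against the moment the batch window opened, `start`, so the window only closes after the full max_batch_time has elapsed.

```python
import asyncio
import time


async def gather_batch(queue: asyncio.Queue, max_batch_size: int = 32,
                       max_batch_time: float = 1.0) -> list:
    """Collect items from `queue` into a batch, closing the batch when it
    is full or when `max_batch_time` seconds have passed since batching
    started. Hypothetical sketch, not MLServer's actual code."""
    batch: list = []
    start = time.monotonic()  # reference point for the whole batch window
    while len(batch) < max_batch_size:
        # Remaining budget shrinks with wall-clock time since `start`.
        # Subtracting any other reference point here (the bug described
        # above) makes the timeout fire early.
        timeout = max_batch_time - (time.monotonic() - start)
        if timeout <= 0:
            break
        try:
            item = await asyncio.wait_for(queue.get(), timeout=timeout)
        except asyncio.TimeoutError:
            break
        batch.append(item)
    return batch
```

With this formula, an idle queue keeps the batch open for the full configured window, matching what MLSERVER_MODEL_MAX_BATCH_TIME=1 is expected to do.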

sakoush commented 1 month ago

@hanlaur many thanks for raising the issue and identifying the underlying problem. This makes sense.

Feel free to provide a fix and we will be happy to review it soon.

hanlaur commented 1 month ago

> Feel free to provide a fix and we will be happy to review it soon.

Thanks @sakoush for the quick response! I was hoping to avoid going through the CLA steps this time, since this is just a one-variable change, so I created an issue instead of a PR. I am wondering if you would be able to make the PR for this?

sakoush commented 2 weeks ago

@hanlaur hopefully we will look into this in the next couple of weeks; we will also need to add testing.