When load testing an MLServer instance (deployed on AWS EKS with SC-V2) with this setup, I get the following error whenever the batch size in my load tests exceeds ~2-3:
```
mlserver 2023-07-24 10:28:25,275 [mlserver.parallel] ERROR - Response processing loop crashed. Restarting the loop...
mlserver Traceback (most recent call last):
mlserver   File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 55, in _process_responses_cb
mlserver     process_responses.result()
mlserver   File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 76, in _process_responses
mlserver     await self._process_response(response)
mlserver   File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 81, in _process_response
mlserver     async_response = self._async_responses[internal_id]
mlserver KeyError: '93821e47-8589-48d2-a1c1-79a145b5ccf2'
mlserver 2023-07-24 10:28:25,276 [mlserver.parallel] DEBUG - Starting response processing loop...
```
For context, I'm using a server configured as follows:
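The relevant part of the settings.json looks roughly like this (values are illustrative; parallel_workers is the field that the MLSERVER_PARALLEL_WORKERS environment variable overrides):

```json
{
    "debug": false,
    "parallel_workers": 4
}
```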
And I do have adaptive batching enabled:
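It's configured through the max_batch_size and max_batch_time fields in model-settings.json, roughly like this (the model name and exact values are placeholders; max_batch_time is in seconds):

```json
{
    "name": "my-model",
    "max_batch_size": 8,
    "max_batch_time": 0.5
}
```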
The payloads are not particularly large (float32, shape [1, 192, 256]) and I can see from monitoring the pods' memory consumption that they are well within the resource limits specified. I've also tried setting MLSERVER_PARALLEL_WORKERS to 0, which does solve the issue, but only by virtue of disabling parallel workers.
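As a quick sanity check on the payload size (assuming one float32 tensor of that shape per request):

```python
import numpy as np

# One request payload: a single float32 tensor of shape [1, 192, 256].
payload = np.zeros((1, 192, 256), dtype=np.float32)

print(payload.nbytes)  # 196608 bytes, i.e. ~192 KiB per request
```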