I am experiencing an error when batching is enabled. The server logs the request and then raises the exception below:

2022-11-08T13:07:08+0000 [INFO] [api_server:1] 127.0.0.1:33304 (scheme=http,method=POST,path=/v1/get_intents,type=application/json,length=91) (status=200,type=application/json,length=20) 1450.984ms (trace=f8472c38b374f57a7213989491a40acc,span=c32005903caa5b5f,sampled=0)
Traceback (most recent call last):
File "/workspace/personality_framework/personality_service/bento_service.py", line 238, in get_intent
result=await runner1.is_positive.async_run([{"sentence":query}])
File "/tmp/e2/lib/python3.8/site-packages/bentoml/_internal/runner/runner.py", line 53, in async_run
return await self.runner._runner_handle.async_run_method( # type: ignore
File "/tmp/e2/lib/python3.8/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 207, in async_run_method
raise ServiceUnavailable(body.decode()) from None
Without batching, the same code works well.
My batching configuration is:
enabled: true
max_batch_size: 100
max_latency_ms: 1000
Under load testing without batching, I get replies to all 100 simultaneous requests without error; with batching enabled, I get the error above.
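For context, the two settings above describe a queue-and-flush dispatcher: requests accumulate until the batch is full (`max_batch_size`) or the oldest queued request has waited `max_latency_ms`. The sketch below is a minimal plain-`asyncio` toy illustrating that behaviour, not BentoML's actual dispatcher; all names in it are invented for illustration.

```python
import asyncio


class MicroBatcher:
    """Toy micro-batcher: flush when the batch is full or when the
    oldest queued request has waited max_latency_ms.

    Illustrative only -- it ignores edge cases a real dispatcher must
    handle (e.g. an item lost if wait_for cancels queue.get mid-flight).
    """

    def __init__(self, batch_fn, max_batch_size=100, max_latency_ms=1000):
        self.batch_fn = batch_fn                  # runs inference on a list of inputs
        self.max_batch_size = max_batch_size
        self.max_latency = max_latency_ms / 1000  # seconds
        self.queue = asyncio.Queue()
        self._worker = None

    async def submit(self, item):
        # Lazily start the background worker on first use.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut                          # resolved when the batch is processed

    async def _run(self):
        while True:
            # Block until at least one request arrives, then start the clock.
            item, fut = await self.queue.get()
            batch, futs = [item], [fut]
            deadline = asyncio.get_running_loop().time() + self.max_latency
            while len(batch) < self.max_batch_size:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break                         # latency budget spent: flush now
                try:
                    item, fut = await asyncio.wait_for(self.queue.get(), timeout)
                except asyncio.TimeoutError:
                    break
                batch.append(item)
                futs.append(fut)
            # Run inference on the whole batch and fan results back out.
            for f, result in zip(futs, self.batch_fn(batch)):
                f.set_result(result)
```

The relevant point for this issue: if the batched call itself regularly takes longer than the latency budget allows, requests pile up in the queue, and a real dispatcher may start rejecting them rather than queueing indefinitely, which would surface to callers as an error like the `ServiceUnavailable` above.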
Discussed in https://github.com/bentoml/BentoML/discussions/3137
Can you try the latest BentoML version, with the new service API?