Open jsuchome opened 1 year ago
I notice that you have a relatively old version of BentoML. IIRC we have shipped a fix for this issue since that version.
Is there any specific reason why you are locked to this version?
Actually this is happening with 1.0.15 as well. When I was creating the issue, I switched to some old container. I'll update the report.
Can you provide your service definition here? You can strip out anything sensitive
bentofile.yaml:

```yaml
service: "service:onnx_models"  # Same as the argument passed to `bentoml serve`
labels:
  owner: ds-team-shipamax
  stage: dev
include:
  - "service.py"  # A pattern for matching which files to include in the bento
  - "configuration.yml"
python:
  packages:  # Additional pip packages required by the service
    - scipy==1.7.3
    - pandas==1.3.5
    - onnxruntime==1.13.1
    - onnx
```
save_models_for_bento.py (snippets):

```python
onnx_model = keras2onnx.convert_keras(model, model.name)
bentoml.onnx.save_model("logo_image_classifier", onnx_model)
```
Oh sorry, I mean your service.py
service.py:

```python
import base64
import logging

import bentoml
import numpy as np
import pandas as pd
from bentoml.io import JSON
from scipy.sparse import isspmatrix_csr

# Assumed logger setup; the original snippet used bentoml_logger without defining it.
bentoml_logger = logging.getLogger("bentoml")


def fix_arg(arg):
    fixed_arg = arg
    if isspmatrix_csr(arg):
        fixed_arg = arg.toarray()
    elif isinstance(arg, list):
        fixed_arg = np.array(arg)
    return fixed_arg


def json_to_ndarray(x: dict) -> np.ndarray:
    if 'buffer' in x:
        buffer = base64.b64decode(x['buffer'].encode('ascii'))
        return np.frombuffer(buffer, x['dtype']).reshape(*x['shape'])
    if 'pandas' in x:
        return pd.read_json(x["pandas"])
    return np.array(x['data'], dtype=x['dtype'])


def ndarray_to_json(x: np.ndarray, binary: bool = True) -> dict:
    if binary:
        return {
            'buffer': base64.b64encode(x.tobytes()).decode('ascii'),
            'dtype': x.dtype.str,
            'shape': list(x.shape)
        }
    return {
        'data': x.tolist(),
        'dtype': x.dtype.str,
    }


model_names = [
    ("logo_image_classifier", bentoml.onnx),
]
models = {m.split("/")[-1]: loader.get(f"{m.split('/')[-1]}:latest").to_runner()
          for m, loader in model_names}

onnx_models = bentoml.Service(
    name="onnx_models",
    runners=list(models.values())
)


def _invoke_runner(runner, name, input: str) -> str:
    bentoml_logger.debug(f"INPUT: type {type(input)}")
    input_npack = [json_to_ndarray(x) for x in input]
    result = getattr(runner, name).run(*input_npack)
    bentoml_logger.debug(f"Result: size: {len(result)} ({type(result)})")
    return ndarray_to_json(fix_arg(result))


@onnx_models.api(
    input=JSON(),
    output=JSON(),
    route="logo_image_classifier/predict"
)
def logo_image_classifier_predict(input):
    return _invoke_runner(models["logo_image_classifier"], "run", input)
```
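For what it's worth, the `buffer`/`dtype`/`shape` round trip used in `ndarray_to_json`/`json_to_ndarray` can be exercised in isolation (a standalone sketch, independent of BentoML; the helpers are restated here so the snippet is self-contained):

```python
import base64
import json

import numpy as np


def ndarray_to_json(x: np.ndarray) -> dict:
    # Encode the raw bytes so the array survives a JSON hop losslessly.
    return {
        "buffer": base64.b64encode(x.tobytes()).decode("ascii"),
        "dtype": x.dtype.str,
        "shape": list(x.shape),
    }


def json_to_ndarray(d: dict) -> np.ndarray:
    buffer = base64.b64decode(d["buffer"].encode("ascii"))
    return np.frombuffer(buffer, d["dtype"]).reshape(*d["shape"])


original = np.arange(6, dtype=np.float32).reshape(2, 3)
payload = json.dumps(ndarray_to_json(original))   # what a client would POST
restored = json_to_ndarray(json.loads(payload))
assert np.array_equal(original, restored)
```

Because the bytes are base64-encoded rather than listed element by element, the payload stays compact and the dtype round-trips exactly.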
This service used to serve more models, which is why the runners are generated in a loop, but we have narrowed the problem down to the ONNX model.
It's also happening on our end, for PyTorch models, and it started when the load got higher. Load is generated via Locust; the issue doesn't happen with ~200 spawned users, but above that we start getting this error. Worst of all, there is no recovery: once it happens, every subsequent request fails until the service is redeployed. Some details on our setup: we have many models running in parallel and in sequence on the same input, following the inference graph approach, and all of them are PyTorch models loaded as custom runners, i.e. each model has an independent runner. Our infrastructure is AWS, with the BentoML service deployed on AWS ECS.
Are you using Yatai or just deploying plain BentoML?
Plain BentoML
I wonder if circus is dying or failing to restart the runners. Do either of you have runner logs available? Maybe run with `--debug`?
Server log with `--debug`:
@sauyon any update on this issue? It seems like a memory issue: adding more memory to the container seems to resolve it, but the error description doesn't indicate a memory issue, and it might not be sustainable to keep adding memory.
Just bumping this issue as we are also experiencing it in production. It leads to silent restarts which are very difficult to detect. Is there a way to force the application to stop completely in case of a problem instead?
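One possible mitigation for the silent-failure mode (a sketch, not an official fix): have the orchestrator kill and replace the container when the HTTP server stops answering. BentoML 1.x exposes health endpoints such as `/healthz`; assuming `curl` is available in the image and the default port 3000 is used, a Docker `HEALTHCHECK` could look like:

```dockerfile
# Assumes curl is present in the bento image and the server listens on 3000.
HEALTHCHECK --interval=15s --timeout=5s --retries=3 \
    CMD curl -fsS http://localhost:3000/healthz || exit 1
```

ECS task definitions support an equivalent `healthCheck` command, so the same probe can be configured there without modifying the image.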
I am also facing this issue, mainly on transformer models. Has anyone had any breakthrough?
We suspect there's a memory leak in aiohttp clients at certain versions. @nicjac @nadimintikrish Could you help us by providing a scenario to reproduce it?
Hi @jianshen92 ! I hope this one I created earlier would help!
https://github.com/bentoml/BentoML/issues/4238
`bentoml serve bento:xx` works fine, but containerizing and running the container causes this issue.

`bentoml serve` works as expected.

`docker run --rm -p 3000:3000 <service:version>` → the error happens once the server gets a request.
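Until the root cause is fixed, a client-side retry with backoff can paper over an occasional dropped connection (though not the stuck state described above, where every request fails). A minimal stdlib sketch; `http.client.RemoteDisconnected` is the rough analogue of aiohttp's `ServerDisconnectedError`, and the URL/payload here are hypothetical:

```python
import time
import urllib.error
import urllib.request
from http.client import RemoteDisconnected


def post_with_retry(url: str, data: bytes, attempts: int = 3, backoff: float = 0.5) -> bytes:
    """POST `data` to `url`, retrying with exponential backoff when the
    server drops the connection or the request otherwise errors out."""
    for i in range(attempts):
        try:
            req = urllib.request.Request(
                url, data=data, headers={"Content-Type": "application/json"}
            )
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.read()
        except (RemoteDisconnected, ConnectionResetError, urllib.error.URLError):
            if i == attempts - 1:
                raise  # out of attempts, surface the error to the caller
            time.sleep(backoff * 2 ** i)  # back off before the next attempt


# Hypothetical usage against the predict route from the service above:
# post_with_retry("http://localhost:3000/logo_image_classifier/predict", b"[...]")
```

This only helps with transient disconnects; if the runner process is dead and never restarted, retries will keep failing and the container still needs to be replaced.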
Describe the bug
I'm getting
`aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected`
with one of my models when I post 3 requests at the same time. Once the error happens, all subsequent requests to the same service fail with the same error. This is not related to any cloud infrastructure, because I can reproduce it in a local Docker container.
Raising the amount of memory (above 4 GiB) seems to help, though the error message does not indicate any memory issue.
This is the error log:
Expected behavior
The service does not crash.
Environment
- bentoml: 1.0.15
- python: 3.7.12
- platform: Linux-5.15.0-1031-aws-x86_64-with-debian-11.2