bentoml / BentoML

bug: aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected #3669

Open jsuchome opened 1 year ago

jsuchome commented 1 year ago

Describe the bug

I'm getting aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected with one of my models when I post 3 requests at the same time. Once the error happens, all subsequent requests to the same service fail with the same error.

This is not related to any cloud infrastructure, because I can reproduce it in a local Docker container.

Raising the amount of memory (above 4 GiB) seems to help, though the error message does not indicate any memory issue.
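
For context, the failure only shows up when requests are posted concurrently; a sketch of the pattern is below. This is only an illustration: the endpoint and route are assumptions based on the service definition posted further down in this thread, and the payload is a placeholder rather than the real model input.

import concurrent.futures

import requests

# Assumed local endpoint (the container is started with -p 3000:3000) and the
# predict route from the service definition shared later in this thread.
URL = "http://localhost:3000/logo_image_classifier/predict"
# Placeholder body in the JSON format the service decodes; the real input
# shape and dtype depend on the logo_image_classifier model.
PAYLOAD = [{"data": [[0.0, 0.0, 0.0]], "dtype": "float32"}]

def call(_):
    resp = requests.post(URL, json=PAYLOAD, timeout=60)
    return resp.status_code

# Posting a few requests at the same time is what triggers the disconnect.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    print(list(pool.map(call, range(3))))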

This is the error log:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/server/http_app.py", line 336, in api_func
    output = await run_in_threadpool(api.func, input_data)
  File "/usr/local/lib/python3.7/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/usr/local/lib/python3.7/site-packages/anyio/to_thread.py", line 32, in run_sync
    func, *args, cancellable=cancellable, limiter=limiter
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/bentoml/bento/src/service.py", line 86, in logo_image_classifier_predict
    return _invoke_runner(models["logo_image_classifier"], "run", input)
  File "/home/bentoml/bento/src/service.py", line 66, in _invoke_runner
    result = getattr(runner, name).run(*input_npack)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner.py", line 52, in run
    return self.runner._runner_handle.run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 266, in run_method
    *args,
  File "/usr/local/lib/python3.7/site-packages/anyio/from_thread.py", line 49, in run
    return asynclib.run_async_from_thread(func, *args)
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 970, in run_async_from_thread
    return f.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 186, in async_run_method
    "Yatai-Bento-Deployment-Namespace": component_context.yatai_bento_deployment_namespace,
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 560, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 899, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/usr/local/lib/python3.7/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected

Expected behavior

The service does not crash.

Environment

bentoml: 1.0.15
python: 3.7.12
platform: Linux-5.15.0-1031-aws-x86_64-with-debian-11.2

aarnphm commented 1 year ago

I notice that you have a relatively old version of BentoML. We have had a fix for this issue since that version (IIRC).

Is there any specific reason why you are locked to this version?

jsuchome commented 1 year ago

Actually this is happening with 1.0.15 as well. When I was creating the issue, I switched to some old container. I'll update the report.

aarnphm commented 1 year ago

Can you provide your service definition here? You can strip out anything sensitive.

jsuchome commented 1 year ago

bentofile.yaml:

service: "service:onnx_models" # Same as the argument passed to `bentoml serve`
labels:
  owner: ds-team-shipamax
  stage: dev
include:
  - "service.py" # A pattern for matching which files to include in the bento
  - "configuration.yml"
python:
  packages: # Additional pip packages required by the service
    - scipy==1.7.3
    - pandas==1.3.5
    - onnxruntime==1.13.1
    - onnx

jsuchome commented 1 year ago

save_models_for_bento.py (snippets):

    onnx_model = keras2onnx.convert_keras(model, model.name)
    bentoml.onnx.save_model("logo_image_classifier", onnx_model)

aarnphm commented 1 year ago

Oh sorry, I meant your service.py.

jsuchome commented 1 year ago

service.py:

import base64
import logging

import numpy as np
import pandas as pd
from scipy.sparse import isspmatrix_csr

import bentoml
from bentoml.io import JSON

# Assumed logger setup; the original snippet does not show how bentoml_logger is defined.
bentoml_logger = logging.getLogger("bentoml")

def fix_arg(arg):
    fixed_arg = arg
    if isspmatrix_csr(arg):
        fixed_arg = arg.toarray()
    elif isinstance(arg, list):
        fixed_arg = np.array(arg)
    return fixed_arg

def json_to_ndarray(x: dict) -> np.ndarray:
    if 'buffer' in x:
        buffer = base64.b64decode(x['buffer'].encode('ascii'))
        return np.frombuffer(buffer, x['dtype']).reshape(*x['shape'])
    if 'pandas' in x:
        return pd.read_json(x["pandas"])
    return np.array(x['data'], dtype=x['dtype'])

def ndarray_to_json(x: np.ndarray, binary: bool = True) -> dict:
    if binary:
        return {
            'buffer': base64.b64encode(x.tobytes()).decode('ascii'),
            'dtype': x.dtype.str,
            'shape': list(x.shape)
        }
    return {
        'data': x.tolist(),
        'dtype': x.dtype.str,
    }

model_names = [
    ("logo_image_classifier", bentoml.onnx),
]

models = {m.split("/")[-1]: loader.get(f"{m.split('/')[-1]}:latest").to_runner() for m, loader in model_names}

onnx_models = bentoml.Service(
    name="onnx_models",
    runners=models.values()
)

def _invoke_runner(runner, name, input: str) -> str:
    bentoml_logger.debug(f"INPUT: type {type(input)}")
    input_npack = [json_to_ndarray(x) for x in input]
    result = getattr(runner, name).run(*input_npack)
    bentoml_logger.debug(f"Result: size: {len(result)} ({type(result)})")
    return ndarray_to_json(fix_arg(result))

@onnx_models.api(
    input=JSON(),
    output=JSON(),
    route="logo_image_classifier/predict"
)
def logo_image_classifier_predict(input):
    return _invoke_runner(models["logo_image_classifier"], "run", input)
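
For reference, a quick round trip through the helpers above illustrates the JSON wire format this API expects. This is a sketch that assumes the functions from the service.py snippet are in scope; the array shape and dtype are placeholders, not the real model input.

import numpy as np

arr = np.random.rand(1, 224, 224, 3).astype(np.float32)  # placeholder tensor

encoded = ndarray_to_json(arr)      # {'buffer': <base64 string>, 'dtype': '<f4', 'shape': [1, 224, 224, 3]}
decoded = json_to_ndarray(encoded)  # what the server reconstructs from the request body

# The binary path is lossless: the decoded array matches the original exactly.
assert decoded.dtype == arr.dtype
assert decoded.shape == arr.shape
assert np.array_equal(decoded, arr)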

jsuchome commented 1 year ago

This service used to offer more models, which is why it generates multiple runners, but we have narrowed the problem down to the ONNX model.

sudeepg545 commented 1 year ago

It's also happening on our end; in our case it happens with PyTorch models and started when the load got higher. The load is generated via Locust, and the issue doesn't happen when the number of spawned users is around 200, but when we spawn more users than that we start to get this error. The worst part is that there is no recovery: once this happens, every other request fails until the service is redeployed. Some details on our setup: we have many models running in parallel and in sequence on the same input, following the inference graph approach, and all the models are PyTorch models loaded as custom runners, i.e. each model has an independent runner. Our infrastructure is AWS, with the BentoML service deployed in AWS ECS.
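
For anyone trying to reproduce this kind of load, a minimal locustfile along these lines should exercise the same pattern. The route and body are placeholders, since the actual deployment described above uses custom PyTorch runners behind an inference graph:

from locust import HttpUser, between, task

class PredictUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task
    def predict(self):
        # Placeholder route and payload; the real input is model specific.
        self.client.post("/predict", json={"data": [0.0, 1.0], "dtype": "float32"})

Running it with something like locust -f locustfile.py --host http://<service-host>:3000 and ramping the user count past ~200 matches the point where the disconnects reportedly start.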

sauyon commented 1 year ago

Are you using Yatai or just deploying plain BentoML?

sudeepg545 commented 1 year ago

Plain BentoML

sauyon commented 1 year ago

I wonder if circus is dying or failing to restart the runners. Do either of you have runner logs available? Maybe run with --debug?
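
As a rough sketch, debug logging could be enabled like this, using the service target from the bentofile above; passing BENTOML_DEBUG into the container for the containerized case is an assumption on my part, not something confirmed in this thread:

bentoml serve service:onnx_models --debug
docker run --rm -p 3000:3000 -e BENTOML_DEBUG=true <service:version>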

jsuchome commented 1 year ago

bentoml.log

server log with --debug

sudeepg545 commented 1 year ago

@sauyon any update on this issue? It seems like a memory issue: adding more memory to the container seems to resolve it, but the error description doesn't indicate a memory issue, and it also might not be sustainable to keep adding memory.

nicjac commented 11 months ago

Just bumping this issue as we are also experiencing it in production; it leads to silent restarts, which are very difficult to detect. Is there a way to force the application to stop completely in case of a problem instead?

nadimintikrish commented 11 months ago

I am also facing this issue, mainly with transformer models. Has anyone had a breakthrough?

jianshen92 commented 11 months ago

We suspect there's a memory leak with aiohttp clients at certain versions. @nicjac @nadimintikrish Could you help us by providing a scenario to reproduce it?

nadimintikrish commented 11 months ago

Hi @jianshen92! I hope this issue I created earlier helps:

https://github.com/bentoml/BentoML/issues/4238

bentoml serve bento:xx works fine, but containerizing it and running the container causes this issue.
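
In other words, the two ways of running the same bento behave differently; roughly (with <name:version> standing in for the actual bento tag):

bentoml serve <name:version>                  # works as expected
bentoml containerize <name:version>
docker run --rm -p 3000:3000 <name:version>   # starts failing once requests arrive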

b-serra commented 10 months ago

Exactly the same problem as here.

bentoml serve works as expected. With docker run --rm -p 3000:3000 <service:version>, the error happens once the server gets a request.