bentoml / Yatai

Model Deployment at Scale on Kubernetes 🦄️
https://bentoml.com

Local Container bento (works) <-> Yatai build Bento (fails with async issue) #238

Closed rishin27 closed 1 year ago

rishin27 commented 2 years ago

Hi Team,

I'm trying to serve my models through the Yatai server, but when I call the endpoint for inference it errors out with the following:

API response - "An error has occurred in BentoML user code when handling this request, find the error details in server logs"

Container logs:

/home/bentoml/bento/src/service.py:23 in predict
    results = bl_model_runner.run(img)

/opt/conda/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py:165 in run
    return self._impl.run(*args, **kwargs)

/opt/conda/lib/python3.9/site-packages/bentoml/_internal/runner/remote.py:152 in run
    return anyio.from_thread.run(self.a…

/opt/conda/lib/python3.9/site-packages/anyio/from_thread.py:35 in run
    return asynclib.run_async_from_thread(f…

/opt/conda/lib/python3.9/site-packages/anyio/_backends/_asyncio.py:847 in run_async_from_thread
    return f.result()

/opt/conda/lib/python3.9/concurrent/futures/_base.py:445 in result
    return self.__get_result()

/opt/conda/lib/python3.9/concurrent/futures/_base.py:390 in __get_result
    raise self._exception

/opt/conda/lib/python3.9/site-packages/bentoml/_internal/runner/remote.py:144 in async_run
    return await self._async_req("run", …

/opt/conda/lib/python3.9/site-packages/bentoml/_internal/runner/remote.py:125 in _async_req
    async with client.post(f"{self._add…

/opt/conda/lib/python3.9/site-packages/aiohttp/client.py:1138 in __aenter__
    self._resp = await self._coro

/opt/conda/lib/python3.9/site-packages/aiohttp/client.py:559 in _request
    await resp…

/opt/conda/lib/python3.9/site-packages/aiohttp/client_reqrep.py:898 in start
    message, payload = awa…

/opt/conda/lib/python3.9/site-packages/aiohttp/streams.py:616 in read
    await self._waiter

ServerDisconnectedError: Server disconnected

As you can see, I have not used any async functionality anywhere in the Bento service code, yet the Bento built on the Yatai server fails with the "async - ServerDisconnectedError: Server disconnected" error above.

from collections import defaultdict

import numpy as np
from PIL.ImageOps import exif_transpose

from bentoml import io

# svc and bl_model_runner are defined earlier in service.py
@svc.api(input=io.Image(), output=io.JSON())
def predict(img):
    img = np.asarray(exif_transpose(img))  # fix EXIF orientation, convert to array
    results = bl_model_runner.run(img)     # the runner call that fails on Yatai
    p = results.pandas().xyxy[0]           # YOLOv5-style detections as a DataFrame
    out = defaultdict(list)

If I simply run the 'bentoml containerize' command and then do a 'docker run' of the resulting image, the service works without any errors, roughly as shown below.
Am I missing something? Please impart some wisdom 🙏
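
For reference, the working local flow looks roughly like this ('bl_service:latest' is a placeholder for the actual Bento tag; 3000 is BentoML's default serving port):

bentoml containerize bl_service:latest
docker run --rm -p 3000:3000 bl_service:latest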

Thanks

parano commented 2 years ago

Hi @rishin27 - BentoML internally uses async everywhere for better performance, so this is likely an internal issue with BentoML on Yatai rather than something in your user code. May I ask which BentoML and Yatai versions you are using?

Note that when a Bento is deployed on Yatai, runners are by default scheduled as their own separate pods so that they can scale independently of the service code. The async code path is used to convert the runner.run function call into an async RPC call to the runner pod, roughly as sketched below. This is probably an issue with how the deployment and runner communication were set up.
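
A minimal sketch of that bridge, reconstructed from the bentoml/_internal/runner frames in the traceback above; the RemoteRunnerClient name, its constructor, and the "/run" endpoint are illustrative assumptions, not BentoML's actual internals:

# Illustrative sketch only -- reconstructed from the traceback frames,
# not BentoML's real implementation.
import aiohttp
import anyio

class RemoteRunnerClient:
    def __init__(self, addr: str) -> None:
        self._addr = addr  # HTTP address of the remote runner pod (assumed)

    def run(self, *args):
        # Sync entry point (cf. remote.py:152): hop from the worker thread
        # onto the event loop. Must be called from a worker thread the event
        # loop started, otherwise anyio raises a RuntimeError
        # (cf. from_thread.py:35 in the traceback).
        return anyio.from_thread.run(self.async_run, *args)

    async def async_run(self, *args):
        # Async path (cf. remote.py:144 and :125): forward the call to the
        # runner pod over HTTP. If that pod drops the connection mid-request
        # (bad address, crash, restart), aiohttp raises
        # ServerDisconnectedError -- the error reported in this issue.
        async with aiohttp.ClientSession() as session:
            async with session.post(f"{self._addr}/run", json={"args": repr(args)}) as resp:
                return await resp.json()

The point is that runner.run looks synchronous to user code, but on Yatai it becomes a network call to a separate runner pod, so an unreachable or unhealthy runner pod surfaces as a ServerDisconnectedError inside this bridge.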

rishin27 commented 2 years ago

Hi @parano Thanks for the info. I get that Yatai is built for autoscaling and Kubernetes workload magic. But if the user has nowhere marked their code as async, is it right to send it down the async code path by default? The typical workflow for a data scientist is to build the model, save it with BentoML, check that the service works using 'bentoml serve', and, if everything works out, push it to the Yatai server for deployment. If Yatai then adds async magic (which was never intended) and things start breaking, that is not a good UX.

Yatai: v0.3.1-c3dab74
BentoML: v1.0.0a7

parano commented 2 years ago

Thanks @rishin27, we will look into this issue more. By design, if bentoml serve works locally, it should definitely work on Yatai as well. Note that Yatai is still in its alpha release, so do expect some rough edges at the moment.

But if the user has nowhere marked their code as async, is it right to send it down the async code path by default?

This is actually fairly common in Python web frameworks such as Sanic or FastAPI: the user can define both sync and async handlers, while the framework uses async internally. The BentoML server uses async even without Yatai. I think the root cause of this issue is likely not async itself, but some settings in the distributed runner setup of the Yatai deployment.
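
To illustrate the pattern with plain FastAPI (shown only as an analogy; this is not BentoML code): both handler styles coexist, and the framework runs the sync one in a worker thread while the server itself stays async.

from fastapi import FastAPI

app = FastAPI()

@app.get("/sync")
def sync_handler():
    # Plain function: FastAPI executes it in a threadpool,
    # keeping the event loop free.
    return {"style": "sync"}

@app.get("/async")
async def async_handler():
    # Coroutine: FastAPI awaits it directly on the event loop.
    return {"style": "async"}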

rishin27 commented 2 years ago

Thanks @parano for your detailed answer. Do let me know if I can help, I'm happy to contribute.

parano commented 2 years ago

Hi @rishin27, could you try again with the latest versions of BentoML and Yatai? This issue should be resolved now.