bentoml / BentoML


Timeout Error - raise asyncio.TimeoutError from None concurrent.futures._base.TimeoutError #1564

Closed: bnapora closed this issue 2 years ago

bnapora commented 3 years ago

Running inference on a large (500 MB) image, using PyTorch, on Windows. At the 4 to 5 minute mark, the BentoML serve process throws a timeout error. It actually keeps running (displaying my print statements from the predict function in the console) but stops logging and does not pass the results to the web service. (See logging output below.)

The output to the web service is a 500 error.

Additionally, the same Bento repository runs without error in an Ubuntu environment.

Environment: Windows 10, dedicated venv for bentoml (python==3.7)
Docker CMD: CMD ["bentoml", "serve-gunicorn", "./", "--timeout", "5000", "--workers=1", "--debug"]

Error Log

(bentoml) C:\Users\bnapora\bentoml\repository\ODRetNet_objectdect_stage1>bentoml serve ODRetNet_objectdect_stage1:latest --port 5002 --debug --enable-microbatch
[2021-04-02 11:36:17,507] INFO - Getting latest version ODRetNet_objectdect_stage1:20210402105755_8D6FAF
[2021-04-02 11:36:17,528] INFO - Starting BentoML API proxy in development mode..
[2021-04-02 11:36:17,633] WARNING - Using BentoML not from official PyPI release. In order to find the same version of BentoML when deploying your BentoService, you must set the 'core/bentoml_deploy_version' config to a http/git location of your BentoML fork, e.g.: 'bentoml_deploy_version = git+https://github.com/{username}/bentoml.git@{branch}'
[2021-04-02 11:36:17,645] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.12.0, but loading from BentoML version 0.12.0+12.g8a6d4a3
[2021-04-02 11:36:17,647] INFO - Your system nofile limit is 1024, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
======== Running on http://0.0.0.0:5002 ========
(Press CTRL+C to quit)
[2021-04-02 11:36:18,440] INFO - Starting BentoML API server in development mode..
[2021-04-02 11:36:18,636] WARNING - Using BentoML not from official PyPI release. In order to find the same version of BentoML when deploying your BentoService, you must set the 'core/bentoml_deploy_version' config to a http/git location of your BentoML fork, e.g.: 'bentoml_deploy_version = git+https://github.com/{username}/bentoml.git@{branch}'
[2021-04-02 11:36:18,654] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.12.0, but loading from BentoML version 0.12.0+12.g8a6d4a3
[2021-04-02 11:36:20,535] INFO - Detected zipimporter <zipimporter object "d:\Development\.virtualenvs\bentoml\Scripts\bentoml.exe">
[2021-04-02 11:36:20,595] WARNING - pip package requirement `object_detection_fastai` not found in current python environment
[2021-04-02 11:36:20,596] WARNING - pip package requirement `imageio` not found in current python environment
[2021-04-02 11:36:20,712] WARNING - BentoML by default does not include spacy and torchvision package when using PytorchModelArtifact. To make sure BentoML bundle those packages if they are required for your model, either import those packages in BentoService definition file or manually add them via `@env(pip_packages=['torchvision'])` when defining a BentoService
 * Serving Flask app "ODRetNet_objectdect_stage1" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:61513/ (Press CTRL+C to quit)

ERROR - Traceback (most recent call last):                                                                    | 12/101 [04:39<35:12, 23.74s/it] 
  File "d:\development\.virtualenvs\bentoml\lib\site-packages\bentoml\marshal\marshal.py", line 100, in request_dispatcher
    resp = await func(request)
  File "d:\development\.virtualenvs\bentoml\lib\site-packages\bentoml\marshal\marshal.py", line 282, in request_dispatcher
    resp = await self.relay_handler(request)
  File "d:\development\.virtualenvs\bentoml\lib\site-packages\bentoml\marshal\marshal.py", line 297, in relay_handler
    request.method, url, data=data, headers=request.headers
  File "d:\development\.virtualenvs\bentoml\lib\site-packages\aiohttp\client.py", line 1117, in __aenter__
    self._resp = await self._coro
  File "d:\development\.virtualenvs\bentoml\lib\site-packages\aiohttp\client.py", line 544, in _request
    await resp.start(conn)
  File "d:\development\.virtualenvs\bentoml\lib\site-packages\aiohttp\client_reqrep.py", line 905, in start
    self._continue = None
  File "d:\development\.virtualenvs\bentoml\lib\site-packages\aiohttp\helpers.py", line 656, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

Patch# 13 - result_boxes_cnt: 6
Patch# 14 - result_boxes_cnt: 6 | 13/101 [05:02<34:52, 23.78s/it]
Patch# 15 - result_boxes_cnt: 6 | 14/101 [05:26<34:35, 23.86s/it]
Patch# 16 - result_boxes_cnt: 6 | 15/101 [05:53<35:22, 24.68s/it]
Patch# 17 - result_boxes_cnt: 7 | 16/101 [06:19<35:37, 25.14s/it]
Patch# 18 - result_boxes_cnt: 7 | 17/101 [06:46<35:39, 25.47s/it]
Patch# 19 - result_boxes_cnt: 7 | 18/101 [07:12<35:39, 25.78s/it]
Patch# 20 - result_boxes_cnt: 7 | 19/101 [07:39<35:32, 26.00s/it]
20% | 20/101 [08:05<32:46, 24.28s/it]
[2021-04-02 11:44:38,269] INFO - {'service_name': 'ODRetNet_objectdect_stage1', 'service_version': '20210402105755_8D6FAF', 'api': 'predict', 'task': {'data': '{"filename":"D:\\\\DataSets\\\\MITOS_WSI_CCMCT\\\\WSI\\\\2efb541724b5c017c503.svs"}', 'task_id': '7f8098e2-6628-48a1-b536-7338c02efbed', 'http_headers': (('Host', 'localhost:5002'), ('Connection', 'keep-alive'), ('Content-Length', '75'), ('Accept', '*/*'), ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36 Edg/89.0.774.63'), ('Content-Type', 'application/json'), ('Origin', 'http://localhost:5002'), ('Sec-Fetch-Site', 'same-origin'), ('Sec-Fetch-Mode', 'cors'), ('Sec-Fetch-Dest', 'empty'), ('Referer', 'http://localhost:5002/'), ('Accept-Encoding', 'gzip, deflate, br'), ('Accept-Language', 'en-US,en;q=0.9'), ('Cookie', '_pk_id.1.1fff=1aef6d922a4cf667.1614357171.; _ga=GA1.1.2126624898.1616688679; csrftoken=h3VUxPXWaJlRdDxoaez4L9OOSlCom4GC6GyD5qJNqg8BKTNv0sM1C9KOkvYUMf0N; sessionid=bwsnoh09gj31efxzyvyqnayry3shft89; _xsrf=2|0d92a214|b5cff3615b227cfcce0abb55cd3d2355|1617143649; username-localhost-8889="2|1:0|10:1617145371|23:username-localhost-8889|44:NjMwYTU3ZjYyMTM0NDllNmE5MWUzZTEzMzUwNDI5NTA=|58a0761abdaa3a4c814a51bfa4d17a42052fa0b768d8ec9b4bde0ff9e18a9c10"; username-localhost-9090="2|1:0|10:1617228019|23:username-localhost-9090|44:ZmVlNzYxY2ViNDAxNGM4MjllOGYzZTAzYTFjZWU1MzM=|2569752bd39e53913113217b13480ad065e48aec48a4c2ae1fbda31fbeae673f"; username-localhost-8888="2|1:0|10:1617387614|23:username-localhost-8888|44:YTI0ODNmOGE0MWFkNDNkZDg5YzVlM2Q4M2VjNTcyZTM=|61d78347a0692c3d7b5c0ce923712db0a41d75284e47b47584987e47c7e3af9a"'))}, 'result': {'data': '{"2efb541724b5c017c503.svs": [[2733.0, 17584.0, 2783.0, 17644.0, 0.0, 0.689713180065155], [2716.0, 17568.0, 2774.0, 17627.0, 0.0, 0.5896230340003967], [2638.0, 17598.0, 2694.0, 17654.0, 0.0, 0.5084918141365051], [2654.0, 17612.0, 2711.0, 17668.0, 0.0, 0.30136606097221375], [2760.0, 17590.0, 2798.0, 17643.0, 0.0, 0.7467142343521118], [2767.0, 17602.0, 2817.0, 17654.0, 0.0, 0.3130497336387634], [3890.0, 70860.0, 3933.0, 70911.0, 0.0, 0.31610095500946045]]}', 'http_status': 200, 'http_headers': (('Content-Type', 'application/json'),)}, 'request_id': '7f8098e2-6628-48a1-b536-7338c02efbed'}
127.0.0.1 - - [02/Apr/2021 11:44:38] "POST /predict HTTP/1.1" 200 -
bnapora commented 3 years ago

I tried the suggestions in #1234. Neither adding --timeout to docker run nor setting an increased value in the environment variable changed the behavior. I still get asyncio.TimeoutError at the 5:00 mark of run time.

bojiang commented 3 years ago

Hi @bnapora

In the log, I find

(bentoml) C:\Users\bnapora\bentoml\repository\ODRetNet_objectdect_stage1>bentoml serve ODRetNet_objectdect_stage1:latest --port 5002 --debug --enable-microbatch

also this warning

WARNING: This is a development server. Do not use it in a production deployment.

It doesn't look like you are starting the BentoML API server from a Docker container. In the official Docker image, the entry point command is bentoml serve-gunicorn rather than bentoml serve.

The command bentoml serve creates a simple API server provided by Flask. It is not production-ready, and its behavior is not guaranteed, especially in special cases such as a large request. For real-world deployment, you should use bentoml serve-gunicorn instead.

Note also that the production server bentoml serve-gunicorn only works on POSIX platforms.
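
For clarity, the two invocations side by side (flags taken from the commands already shown in this thread):

    # development server (Flask-based; not production-ready)
    bentoml serve ODRetNet_objectdect_stage1:latest --port 5002

    # production server (gunicorn-based; POSIX platforms only)
    bentoml serve-gunicorn ./ --workers=1 --timeout 5000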

bnapora commented 3 years ago

@bojiang Thanks for the response. The result is the same with the gunicorn server. As you can see from the CMD line below, I'm setting the "timeout" very high. I've also tried the environment variable BENTOML__API_SERVER__DEFAULT_TIMEOUT with no change.

Below is the Dockerfile command line:

CMD ["bentoml", "serve-gunicorn", "./", "--timeout", "5000", "--workers=1", "--debug"]
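
For clarity, a minimal sketch of the full Dockerfile (the ENV line shows the environment-variable variant I also tried; the base image tag here is illustrative, not necessarily the one we use):

    FROM bentoml/model-server:0.12.1   # illustrative base image tag
    ENV BENTOML__API_SERVER__DEFAULT_TIMEOUT=5000
    CMD ["bentoml", "serve-gunicorn", "./", "--timeout", "5000", "--workers=1", "--debug"]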

The output while running inference on a 500 MB file is below. The container is invoking gunicorn, but it still times out at approximately the 5:00 minute mark. It continues to run, but doesn't pass the results back to the web service.

Docker Container Output:

docker run -it -p 5000:5000 --gpus all --name "odretnet-detect-test01" -v D:\DataSets:/host_Data/ odretnet-mitosis-test01:latest --timeout=5000
[2021-04-07 17:43:30,026] INFO - Starting BentoML proxy in production mode..
[2021-04-07 17:43:30,026] INFO - Starting BentoML API server in production mode..
[2021-04-07 17:43:30 +0000] [21] [INFO] Starting gunicorn 20.0.4
[2021-04-07 17:43:30 +0000] [21] [INFO] Listening at: http://0.0.0.0:47153 (21)
[2021-04-07 17:43:30 +0000] [21] [INFO] Using worker: sync
[2021-04-07 17:43:30 +0000] [22] [INFO] Booting worker with pid: 22
[2021-04-07 17:43:30,242] INFO - Running micro batch service on :5000
[2021-04-07 17:43:30 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2021-04-07 17:43:30 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2021-04-07 17:43:30 +0000] [1] [INFO] Using worker: aiohttp.worker.GunicornWebWorker
[2021-04-07 17:43:30 +0000] [23] [INFO] Booting worker with pid: 23
[2021-04-07 17:43:30,289] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
[2021-04-07 17:43:31,724] WARNING - pip package requirement object_detection_fastai not found in current python environment
[2021-04-07 17:43:39,160] WARNING - BentoML by default does not include spacy and torchvision package when using PytorchModelArtifact. To make sure BentoML bundle those packages if they are required for your model, either import those packages in BentoService definition file or manually add them via @env(pip_packages=['torchvision']) when defining a BentoService
Patch# 1 - result_boxes_cnt: 0 | 0/101 [00:00<?, ?it/s]
Patch# 2 - result_boxes_cnt: 0 | 1/101 [00:08<14:24, 8.65s/it]
Patch# 3 - result_boxes_cnt: 0 | 2/101 [00:18<15:28, 9.38s/it]

......

Patch# 31 - result_boxes_cnt: 30 | 30/101 [04:36<11:09, 9.43s/it]
Patch# 32 - result_boxes_cnt: 34 | 31/101 [04:45<10:50, 9.30s/it]
[2021-04-07 17:48:48,807] ERROR - Traceback (most recent call last): | 32/101 [04:54<10:35, 9.20s/it]
  File "/opt/conda/lib/python3.6/site-packages/bentoml/marshal/marshal.py", line 100, in request_dispatcher
    resp = await func(request)
  File "/opt/conda/lib/python3.6/site-packages/bentoml/marshal/marshal.py", line 282, in request_dispatcher
    resp = await self.relay_handler(request)
  File "/opt/conda/lib/python3.6/site-packages/bentoml/marshal/marshal.py", line 298, in relay_handler
    request.method, url, data=data, headers=request.headers
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/client.py", line 1117, in __aenter__
    self._resp = await self._coro
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/client.py", line 544, in _request
    await resp.start(conn)
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 905, in start
    self._continue = None
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/helpers.py", line 656, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

Patch# 33 - result_boxes_cnt: 35
Patch# 34 - result_boxes_cnt: 35 | 33/101 [05:03<10:19, 9.12s/it]
Patch# 35 - result_boxes_cnt: 36 | 34/101 [05:12<10:14, 9.17s/it]
Patch# 36 - result_boxes_cnt: 39 | 35/101 [05:21<09:59, 9.09s/it]

........

Patch# 50 - result_boxes_cnt: 200 | 49/101 [07:31<07:58, 9.20s/it]
50% | 50/101 [07:41<07:51, 9.24s/it]
[2021-04-07 17:51:30,176] INFO - {'service_name': 'ODRetNet_objectdect_stage1', 'service_version': '20210407053835_5AAD19', 'api': 'predict', 'task': {'data': '{"filename":"./WSI/2efb541724b5c017c503.svs"}', 'task_id': '164d7aa8-c76a-4214-88f0-29888da7d80a', 'http_headers': (('Host', 'localhost:5000'), ('Connection', 'keep-alive'), ('Content-Length', '45'), ('Accept', '*/*'), ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 Edg/89.0.774.68'), ('Content-Type', 'application/json'), ('Origin', 'http://localhost:5000'), ('Sec-Fetch-Site', 'same-origin'), ('Sec-Fetch-Mode', 'cors'), ('Sec-Fetch-Dest', 'empty'), ('Referer', 'http://localhost:5000/'), ('Accept-Encoding', 'gzip, deflate, br'), ('Accept-Language', 'en-US,en;q=0.9'), ('Cookie', '_pk_id.1.1fff=1aef6d922a4cf667.1614357171.; _ga=GA1.1.2126624898.1616688679; csrftoken=h3VUxPXWaJlRdDxoaez4L9OOSlCom4GC6GyD5qJNqg8BKTNv0sM1C9KOkvYUMf0N; sessionid=bwsnoh09gj31efxzyvyqnayry3shft89; _xsrf=2|0d92a214|b5cff3615b227cfcce0abb55cd3d2355|1617143649; username-localhost-8889="2|1:0|10:1617145371|23:username-localhost-8889|44:NjMwYTU3ZjYyMTM0NDllNmE5MWUzZTEzMzUwNDI5NTA=|58a0761abdaa3a4c814a51bfa4d17a42052fa0b768d8ec9b4bde0ff9e18a9c10"; username-localhost-9090="2|1:0|10:1617228019|23:username-localhost-9090|44:ZmVlNzYxY2ViNDAxNGM4MjllOGYzZTAzYTFjZWU1MzM=|2569752bd39e53913113217b13480ad065e48aec48a4c2ae1fbda31fbeae673f"; username-localhost-8888="2|1:0|10:1617393614|23:username-localhost-8888|44:NjRkZmNiZDI3MDFkNDVhOWJlNGVlOTBhODFlYjQ5OGM=|6b479195ad7db10745cc6cbca35ab50913add4f7410061a69f93c7dffdce023b"; _pk_ses.1.1fff=1'))}, 'result': {'data': '{"2efb541724b5c017c503.svs": [[2733.0, 17584.0, 2783.0, 17644.0, 0.0, 0.689713180065155], [2716.0, 17568.0, 2774.0, 17627.0, 0.0, 0.5896227955818176], [2638.0, 17598.0, 2694.0, 17654.0, 0.0, 0.5084916353225708], [2654.0, 17612.0, 2711.0, 17668.0, 0.0, 0.3013659417629242], [2760.0, 17590.0, 2798.0, 17643.0, 0.0, 0.7467134594917297]

bnapora commented 3 years ago

I still haven't been able to fix this. Moving on.

bojiang commented 3 years ago

@bnapora Hi. Would you mind sharing which version of bentoml you are using? I have an idea about what is causing this issue.

bnapora commented 3 years ago

@bojiang We're using v0.12.0, and later 0.12.1.

aravind-sundaresan commented 3 years ago

Has this issue been fixed? I am encountering the same error while using v0.13.1

vishnupriyavr commented 3 years ago

Using the mb_max_latency option in the @bentoml.api decorator, as per this documentation, resolved the timeout error for me. Hope this helps folks who are facing similar issues.
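
For example, a minimal sketch of a 0.13-era service definition (the artifact and class names are placeholders; mb_max_latency and mb_max_batch_size are the actual @bentoml.api parameters, with the latency given in milliseconds):

    import bentoml
    from bentoml.adapters import JsonInput
    from bentoml.frameworks.pytorch import PytorchModelArtifact

    @bentoml.env(infer_pip_packages=True)
    @bentoml.artifacts([PytorchModelArtifact('model')])
    class MyService(bentoml.BentoService):
        # mb_max_latency is in milliseconds; set it well above the
        # longest inference time you expect for large inputs
        @bentoml.api(input=JsonInput(), mb_max_latency=600000, mb_max_batch_size=1)
        def predict(self, parsed_json):
            # placeholder inference call on the loaded PyTorch model
            return self.artifacts.model(parsed_json)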

chris-aeviator commented 3 years ago

I run into situations (GPT-J, lots of tokens, CPU only) where I can trigger timeouts even with mb_max_latency=300000, and the text generation fails after less than 300 s.

stvogel commented 3 years ago

I'm using 0.13.1 as well and find this very annoying. Neither setting mb_max_latency nor --timeout in the Docker settings helped. I'm sending rather large images (~7 MB) to the model via REST. Might this be a cause of the timeouts?

chris-aeviator commented 3 years ago

Same for me: my workload typically takes quite long, and I have no way of keeping BentoML from throwing an error. Even though I can log my result on the server (and the model finishes running), BentoML replies to the HTTP call with an error.

aarnphm commented 3 years ago

We will take a look into this for the 1.0 release. Thank you for the feedback.

davidberenstein1957 commented 2 years ago

@aarnphm Do you see this being fixable, at least temporarily, via a configuration change to the timeout? https://github.com/bentoml/BentoML/blob/b19def038fe9ddd04602acf213ea56647deae67f/bentoml/configuration/default_configuration.yml
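
Something along these lines, perhaps (the key names here are inferred from the BENTOML__API_SERVER__DEFAULT_TIMEOUT environment variable mentioned earlier in this thread, so they would need to be checked against the linked file):

    # hypothetical override; verify the key names against default_configuration.yml
    api_server:
      default_timeout: 600  # seconds; raise above the longest expected inference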

aarnphm commented 2 years ago

@davidberenstein1957 Hi, you can definitely try configuring a longer timeout, but I believe the problem stems from pip packages not being available within the bundle, so changing the timeout might not fix the root of the issue. I hope this will be fixed in our new releases.
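
In the meantime, explicitly bundling the missing packages may help. A minimal sketch (package names taken from the warnings earlier in this thread; the class name is a placeholder):

    import bentoml

    # explicitly bundle the packages the warnings reported as missing
    @bentoml.env(pip_packages=['torchvision', 'object_detection_fastai', 'imageio'])
    class ODRetNetService(bentoml.BentoService):
        ...  # artifact and api definitions unchanged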

parano commented 2 years ago

This is now tracked here for BentoML v1.0: https://github.com/bentoml/BentoML/issues/2147