Closed: bnapora closed this issue 2 years ago
I tried the suggestions in #1234. Neither adding --timeout to docker run nor setting an increased value in the environment variable changed the behavior. I still get asyncio.TimeoutError at the 5:00 minute mark.
Hi @bnapora
In the log, I find
(bentoml) C:\Users\bnapora\bentoml\repository\ODRetNet_objectdect_stage1>bentoml serve ODRetNet_objectdect_stage1:latest --port 5002 --debug --enable-microbatch
also this warning
WARNING: This is a development server. Do not use it in a production deployment.
It doesn't look like the BentoML API server is being started from a docker container.
In the official docker image, the entry point command is bentoml serve-gunicorn rather than bentoml serve.
The command bentoml serve creates a simple API server provided by Flask. It is not production-ready. The behavior is not guaranteed, especially in some special cases like a large request. For real-world deployment, we may want to use bentoml serve-gunicorn instead.
Plus, the production server bentoml serve-gunicorn only works on POSIX platforms.
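For example, something along these lines should work in a POSIX environment or inside the official image (the bento tag is taken from your log; the flags are the same ones used elsewhere in this thread, and the values are only illustrative):

bentoml serve-gunicorn ODRetNet_objectdect_stage1:latest --workers 1 --timeout 600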
@bojiang Thanks for the response. The result is the same with the gunicorn server. As you can see from the CMD line below, I'm setting the "timeout" very high. I've also tried the environment variable BENTOML__API_SERVER__DEFAULT_TIMEOUT with no change.
Below is the Dockerfile command line:
CMD ["bentoml", "serve-gunicorn", "./", "--timeout", "5000", "--workers=1", "--debug"]
The output while running inference on a 500 MB file is below. The container is invoking gunicorn, but it continues to time out at approximately the 5:00 minute mark. It continues to run, but doesn't pass the results back to the webservice.
Docker Container Output:
docker run -it -p 5000:5000 --gpus all --name "odretnet-detect-test01" -v D:\DataSets:/host_Data/ odretnet-mitosis-test01:latest --timeout=5000
[2021-04-07 17:43:30,026] INFO - Starting BentoML proxy in production mode..
[2021-04-07 17:43:30,026] INFO - Starting BentoML API server in production mode..
[2021-04-07 17:43:30 +0000] [21] [INFO] Starting gunicorn 20.0.4
[2021-04-07 17:43:30 +0000] [21] [INFO] Listening at: http://0.0.0.0:47153 (21)
[2021-04-07 17:43:30 +0000] [21] [INFO] Using worker: sync
[2021-04-07 17:43:30 +0000] [22] [INFO] Booting worker with pid: 22
[2021-04-07 17:43:30,242] INFO - Running micro batch service on :5000
[2021-04-07 17:43:30 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2021-04-07 17:43:30 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2021-04-07 17:43:30 +0000] [1] [INFO] Using worker: aiohttp.worker.GunicornWebWorker
[2021-04-07 17:43:30 +0000] [23] [INFO] Booting worker with pid: 23
[2021-04-07 17:43:30,289] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
[2021-04-07 17:43:31,724] WARNING - pip package requirement object_detection_fastai not found in current python environment
[2021-04-07 17:43:39,160] WARNING - BentoML by default does not include spacy and torchvision package when using PytorchModelArtifact. To make sure BentoML bundle those packages if they are required for your model, either import those packages in BentoService definition file or manually add them via @env(pip_packages=['torchvision']) when defining a BentoService
Patch# 1 - result_boxes_cnt: 0 | 0/101 [00:00<?, ?it/s]
Patch# 2 - result_boxes_cnt: 0 | 1/101 [00:08<14:24, 8.65s/it]
Patch# 3 - result_boxes_cnt: 0 | 2/101 [00:18<15:28, 9.38s/it]
......
Patch# 31 - result_boxes_cnt: 30 | 30/101 [04:36<11:09, 9.43s/it]
Patch# 32 - result_boxes_cnt: 34 | 31/101 [04:45<10:50, 9.30s/it]
[2021-04-07 17:48:48,807] ERROR - Traceback (most recent call last): | 32/101 [04:54<10:35, 9.20s/it]
  File "/opt/conda/lib/python3.6/site-packages/bentoml/marshal/marshal.py", line 100, in request_dispatcher
    resp = await func(request)
  File "/opt/conda/lib/python3.6/site-packages/bentoml/marshal/marshal.py", line 282, in request_dispatcher
    resp = await self.relay_handler(request)
  File "/opt/conda/lib/python3.6/site-packages/bentoml/marshal/marshal.py", line 298, in relay_handler
    request.method, url, data=data, headers=request.headers
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/client.py", line 1117, in __aenter__
    self._resp = await self._coro
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/client.py", line 544, in _request
    await resp.start(conn)
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 905, in start
    self._continue = None
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/helpers.py", line 656, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError
Patch# 33 - result_boxes_cnt: 35
Patch# 34 - result_boxes_cnt: 35▍ | 33/101 [05:03<10:19, 9.12s/it]
Patch# 35 - result_boxes_cnt: 36█▎ | 34/101 [05:12<10:14, 9.17s/it]
Patch# 36 - result_boxes_cnt: 39██ | 35/101 [05:21<09:59, 9.09s/it]
........
Patch# 50 - result_boxes_cnt: 200████████████▎ | 49/101 [07:31<07:58, 9.20s/it] 50%|████████████████████████████████████████ | 50/101 [07:41<07:51, 9.24s/it] [2021-04-07 17:51:30,176] INFO - {'service_name': 'ODRetNet_objectdect_stage1', 'service_version': '20210407053835_5AAD19', 'api': 'predict', 'task': {'data': '{"filename":"./WSI/2efb541724b5c017c503.svs"}', 'task_id': '164d7aa8-c76a-4214-88f0-29888da7d80a', 'http_headers': (('Host', 'localhost:5000'), ('Connection', 'keep-alive'), ('Content-Length', '45'), ('Accept', '/'), ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 Edg/89.0.774.68'), ('Content-Type', 'application/json'), ('Origin', 'http://localhost:5000'), ('Sec-Fetch-Site', 'same-origin'), ('Sec-Fetch-Mode', 'cors'), ('Sec-Fetch-Dest', 'empty'), ('Referer', 'http://localhost:5000/'), ('Accept-Encoding', 'gzip, deflate, br'), ('Accept-Language', 'en-US,en;q=0.9'), ('Cookie', '_pk_id.1.1fff=1aef6d922a4cf667.1614357171.; _ga=GA1.1.2126624898.1616688679; csrftoken=h3VUxPXWaJlRdDxoaez4L9OOSlCom4GC6GyD5qJNqg8BKTNv0sM1C9KOkvYUMf0N; sessionid=bwsnoh09gj31efxzyvyqnayry3shft89; _xsrf=2|0d92a214|b5cff3615b227cfcce0abb55cd3d2355|1617143649; username-localhost-8889="2|1:0|10:1617145371|23:username-localhost-8889|44:NjMwYTU3ZjYyMTM0NDllNmE5MWUzZTEzMzUwNDI5NTA=|58a0761abdaa3a4c814a51bfa4d17a42052fa0b768d8ec9b4bde0ff9e18a9c10"; username-localhost-9090="2|1:0|10:1617228019|23:username-localhost-9090|44:ZmVlNzYxY2ViNDAxNGM4MjllOGYzZTAzYTFjZWU1MzM=|2569752bd39e53913113217b13480ad065e48aec48a4c2ae1fbda31fbeae673f"; username-localhost-8888="2|1:0|10:1617393614|23:username-localhost-8888|44:NjRkZmNiZDI3MDFkNDVhOWJlNGVlOTBhODFlYjQ5OGM=|6b479195ad7db10745cc6cbca35ab50913add4f7410061a69f93c7dffdce023b"; _pk_ses.1.1fff=1'))}, 'result': {'data': '{"2efb541724b5c017c503.svs": [[2733.0, 17584.0, 2783.0, 17644.0, 0.0, 0.689713180065155], [2716.0, 17568.0, 2774.0, 17627.0, 0.0, 0.5896227955818176], [2638.0, 17598.0, 2694.0, 17654.0, 0.0, 0.5084916353225708], [2654.0, 17612.0, 2711.0, 17668.0, 0.0, 0.3013659417629242], [2760.0, 17590.0, 2798.0, 17643.0, 0.0, 0.7467134594917297]
I still haven't been able to fix this. Moving on.
@bnapora Hi. Would you mind sharing which version of bentoml you are using? I got an idea about this issue.
@bojiang We're using v0.12.0, and then 0.12.1.
Has this issue been fixed? I am encountering the same error while using v0.13.1
I used the "mb_max_latency" option in the @bentoml.api decorator as per this documentation, and the timeout error was resolved for me. Hope this helps folks who are facing similar issues.
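For anyone wondering where that option goes, here is a minimal sketch against the 0.13-style BentoService API. The service name, artifact name, and predict body are illustrative, not taken from this issue; mb_max_latency is in milliseconds.

import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.pytorch import PytorchModelArtifact

@bentoml.env(infer_pip_packages=True)
@bentoml.artifacts([PytorchModelArtifact("model")])
class MyDetectionService(bentoml.BentoService):
    # mb_max_latency (in milliseconds) bounds how long the micro-batching/marshal
    # layer will wait on a request; raise it for long-running inference calls.
    @bentoml.api(input=JsonInput(), mb_max_latency=300000)
    def predict(self, parsed_json):
        # Illustrative placeholder; replace with real inference on the model artifact.
        return self.artifacts.model(parsed_json)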
I run into situations (GPT-J, lots of tokens, CPU only) where I can trigger timeouts even with mb_max_latency=300000, and the text generation fails after <300s.
I'm using 0.13.1 as well and find this very annoying. Neither setting mb_max_latency nor --timeout in the docker settings helped. I'm sending rather large images (~7 MB) to the model via REST. Might this be a cause of the timeouts?
Same for me: my workload typically takes quite long, but I have no way to stop BentoML from throwing an error. Even though I can log my result on the server (and the model finishes running), BentoML replies to the HTTP call with an error.
We will look into this for the 1.0 releases. Thank you for the feedback.
@aarnphm Do you think this could be fixed temporarily via a configuration change to the timeout? https://github.com/bentoml/BentoML/blob/b19def038fe9ddd04602acf213ea56647deae67f/bentoml/configuration/default_configuration.yml
@davidberenstein1957 Hi, you can definitely try configuring a longer timeout, but I believe the problem stems from pip packages not being available within the bundle, so changing the timeout might not fix the root of the issue. I hope this will be fixed in our new releases.
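If you do want to experiment with a longer timeout on 0.13.x anyway, a sketch of passing the override into the container looks roughly like this. The environment-variable name is the one mentioned earlier in this thread; the image name and value are placeholders, and as reported above a longer timeout may not resolve the underlying problem.

docker run -p 5000:5000 -e BENTOML__API_SERVER__DEFAULT_TIMEOUT=3600 my-bento-image:latest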
This is now tracked here for BentoML v1.0: https://github.com/bentoml/BentoML/issues/2147
Running inference on a large (500 MB) image, using PyTorch, on Windows. At the 4-5 minute mark the bentoml serve process throws a timeout error. It actually keeps running (displaying my print statements from the predict function in the console) but stops logging and does not pass the results to the webservice. (See the logging output below.)
The output to the webservice is a 500 error.
Additionally, the same Bento repo runs without error in an Ubuntu environment.
Environment: Windows 10, dedicated venv for bentoml (python==3.7)
Docker CMD: CMD ["bentoml", "serve-gunicorn", "./", "--timeout", "5000", "--workers=1", "--debug"]
Error Log