bentoml / BentoML

The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
https://bentoml.com
Apache License 2.0

bug: error in deploying in production mode (while it is running okay with --development) #4089

Open flavourabbit opened 1 year ago

flavourabbit commented 1 year ago

Describe the bug

I implemented a Transformers-based project using the Hugging Face `transformers` module. Although it runs okay in development mode (`bentoml serve service:svc --development --reload`), BentoML throws an error when I try to run inference in production mode.
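For reference, the two invocations being compared (the exact production-mode flag is my assumption based on the issue title; the development command is quoted from above):

```shell
# Development mode: works fine
bentoml serve service:svc --development --reload

# Production mode: the 404 from the runner appears here
bentoml serve service:svc --production
```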

To reproduce

As this is a confidential project, I can't share the code snippet.

However, the error log says:

```
bentoml/src/bentoml/_internal/runner/runner_handle/remote.py", line 244, in async_run_method
    raise RemoteException(
bentoml.exceptions.RemoteException: An unexpected exception occurred in remote runner owlvit_runner: [404] Not Found
```

I altered `bentoml/_internal/runner/runner_handle/remote.py` to print out the details:

```python
        try:
            print(resp)
            content_type = resp.headers["Content-Type"]
            print('content_type', content_type)
            assert content_type.lower().startswith("application/vnd.bentoml.")  # <- this raises the exception
```

Then it prints the following:

```
<ClientResponse(http://127.0.0.1:8000/predict) [404 Not Found]>
<CIMultiDictProxy('Date': 'Tue, 01 Aug 2023 09:55:27 GMT', 'Server': 'uvicorn', 'Content-Length': '9', 'Content-Type': 'text/plain; charset=utf-8')>
content_type text/plain; charset=utf-8
```
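The failing assertion can be reproduced in isolation: the runner server replied with a plain-text 404 page, so the `Content-Type` check in `remote.py` necessarily fails. A minimal sketch (the BentoML vendor-type value shown is a hypothetical example; only the prefix matters for the check):

```python
# Content-Type returned by the runner on the 404 response above
content_type = "text/plain; charset=utf-8"

# The check from remote.py: runner responses must use the BentoML vendor media type
is_bentoml_payload = content_type.lower().startswith("application/vnd.bentoml.")

# A 404 error page is plain text, so the check fails and RemoteException is raised
print(is_bentoml_payload)  # False
```

This shows the 404 itself is the real problem; the assertion failure is only a downstream symptom.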

1) I wonder why `ClientResponse` points at http://127.0.0.1:8000/predict even though I sent the POST request to 'http://a.b.c.d:3000/predict_image'.

2) It is odd that the API works in development mode.

I've searched the Internet for a way to debug this issue, but with no luck.

Expected behavior

Runs smoothly, the same way as in development mode.

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.0.post7+g1ed3e6ff.d20230801
python: 3.10.6
platform: Linux-5.19.0-46-generic-x86_64-with-glibc2.35
uid_gid: 1000:1000

pip_packages
``` aiofiles==23.1.0 aiohttp==3.8.4 aiosignal==1.3.1 altair==5.0.1 anyio==3.7.0 appdirs==1.4.4 apturl==0.5.2 asgiref==3.5.0 async-timeout==4.0.2 attrs==23.1.0 bcrypt==3.2.0 -e git+https://github.com/bentoml/bentoml.git@1ed3e6ffd9fcc2daa0097782996f472bf5acb9c3#egg=bentoml blinker==1.4 Brlapi==0.8.3 build==0.10.0 cattrs==23.1.2 certifi==2020.6.20 chardet==4.0.0 charset-normalizer==3.1.0 circus==0.18.0 click==8.0.3 click-option-group==0.5.6 cloudpickle==2.2.1 cmake==3.26.4 colorama==0.4.4 command-not-found==0.3 configobj==5.0.6 contextlib2==21.6.0 contourpy==1.1.0 cryptography==3.4.8 cupshelpers==1.0 cycler==0.11.0 dbus-python==1.2.18 deepmerge==1.1.0 defer==1.0.6 Deprecated==1.2.14 dill==0.3.6 distlib==0.3.7 distro==1.7.0 distro-info===1.1build1 dnspython==2.1.0 duplicity==0.8.21 exceptiongroup==1.1.1 fastapi==0.98.0 fasteners==0.14.1 ffmpy==0.3.0 filelock==3.12.2 fonttools==4.40.0 frozenlist==1.3.3 fs==2.4.16 fsspec==2023.6.0 ftfy==6.1.1 future==0.18.2 gpg===1.16.0-unknown gradio==3.35.2 gradio_client==0.2.7 h11==0.14.0 httpcore==0.17.2 httplib2==0.20.2 httpx==0.24.1 huggingface-hub==0.15.1 idna==3.3 importlib-metadata==6.0.1 iotop==0.6 jeepney==0.7.1 Jinja2==3.1.2 jsonschema==4.17.3 keyring==23.5.0 kiwisolver==1.4.4 language-selector==0.1 launchpadlib==1.10.16 lazr.restfulclient==0.14.4 lazr.uri==1.0.6 linkify-it-py==2.0.2 lit==16.0.6 lockfile==0.12.2 loguru==0.7.0 louis==3.20.0 macaroonbakery==1.3.1 Mako==1.1.3 Markdown==3.3.6 markdown-it-py==2.2.0 MarkupSafe==2.0.1 matplotlib==3.7.1 mdit-py-plugins==0.3.3 mdurl==0.1.2 monotonic==1.6 more-itertools==8.10.0 mpmath==1.3.0 multidict==6.0.4 netifaces==0.11.0 networkx==3.1 numpy==1.25.0 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 
oauthlib==3.2.0 olefile==0.46 opencv-python==4.7.0.72 opentelemetry-api==1.18.0 opentelemetry-instrumentation==0.39b0 opentelemetry-instrumentation-aiohttp-client==0.39b0 opentelemetry-instrumentation-asgi==0.39b0 opentelemetry-sdk==1.18.0 opentelemetry-semantic-conventions==0.39b0 opentelemetry-util-http==0.39b0 orjson==3.9.1 packaging==23.1 pandas==2.0.3 paramiko==2.9.3 pascal-voc-writer==0.1.4 pathspec==0.11.2 pexpect==4.8.0 Pillow==9.0.1 pip-requirements-parser==32.0.1 pip-tools==7.1.0 platformdirs==3.10.0 prometheus-client==0.17.1 protobuf==3.12.4 psutil==5.9.0 ptyprocess==0.7.0 pycairo==1.20.1 pycups==2.0.1 pydantic==1.10.9 pydub==0.25.1 Pygments==2.15.1 PyGObject==3.42.1 PyJWT==2.3.0 pymacaroons==0.13.0 PyNaCl==1.5.0 pynvml==11.5.0 pyparsing==2.4.7 pyproject_hooks==1.0.0 pyRFC3339==1.1 pyrsistent==0.19.3 python-apt==2.4.0+ubuntu1 python-dateutil==2.8.2 python-debian===0.1.43ubuntu1 python-json-logger==2.0.7 python-multipart==0.0.6 pytz==2022.1 pyxdg==0.27 PyYAML==5.4.1 pyzmq==25.1.0 regex==2023.6.3 reportlab==3.6.8 requests==2.25.1 requests-toolbelt==0.9.1 rich==13.5.0 safetensors==0.3.1 schema==0.7.5 scipy==1.11.1 screen-resolution-extra==0.0.0 seaborn==0.12.2 SecretStorage==3.3.1 semantic-version==2.10.0 simple-di==0.1.5 six==1.16.0 sniffio==1.3.0 ssh-import-id==5.11 starlette==0.27.0 sympy==1.12 systemd-python==234 terminator==2.1.1 tokenizers==0.13.3 tomli==2.0.1 toolz==0.12.0 torch==2.0.1 torchvision==0.15.2 tornado==6.3.2 tqdm==4.65.0 transformers @ file://some/path triton==2.0.0 typing_extensions==4.7.0 tzdata==2023.3 ubuntu-advantage-tools==8001 ubuntu-drivers-common==0.0.0 uc-micro-py==1.0.2 ufw==0.36.1 ultralytics==8.0.120 unattended-upgrades==0.1 urllib3==1.26.5 usb-creator==0.3.7 uvicorn==0.22.0 virtualenv==20.24.2 wadllib==1.3.6 watchfiles==0.19.0 wcwidth==0.2.6 websockets==11.0.3 wrapt==1.15.0 wsproto==1.0.0 xdg==5 xkit==0.0.0 yarl==1.9.2 zipp==1.0.0 ```
bojiang commented 1 year ago

@flavourabbit Port 8000 is the runner's port, and `predict` in the path means the API server is calling the runner via `runner.predict.run`.

bojiang commented 1 year ago

The thing is, it seems the runner doesn't have a `predict` method on it. How did you create the runner?