bentoml / OpenLLM

Run any open-source LLMs, such as Llama 3.1, Gemma, as OpenAI compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

bug: microsoft/phi-2 hangs on macos i7 #810

Closed n-sviridenko closed 1 month ago

n-sviridenko commented 8 months ago

Describe the bug

When running inference with microsoft/phi-2, the request hangs for 5 minutes and then fails with an HTTP 500 error.

To reproduce

  1. TRUST_REMOTE_CODE=True DTYPE=float32 openllm start microsoft/phi-2
  2. OPENLLM_ENDPOINT=http://localhost:3000 openllm query --debug 'Explain to me the difference between "further" and "farther"' --timeout 300

Logs

2023-12-23T17:40:30+0100 [ERROR] [api_server:llm-phi-service:12] Exception on /v1/generate [POST] (trace=c6e89c03e313baa951dde98d312b1a9f,span=6b29cc6ed45e3c55,sampled=1,service.name=llm-phi-service)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
    output = await api.func(*args)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openllm/_service.py", line 23, in generate_v1
    return (await llm.generate(**llm_model_class(**input_dict).model_dump())).model_dump()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openllm/_llm.py", line 61, in generate
    if (final_result := result) is None:
                        ^^^^^^
UnboundLocalError: cannot access local variable 'result' where it is not associated with a value
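
For readers unfamiliar with this error: Python raises `UnboundLocalError` when a local name is read before anything has been assigned to it. One common way to hit it is a loop whose iterable yields nothing, so the loop variable is never bound. The sketch below reproduces that pattern in isolation. It is an assumption about the kind of code that could produce the traceback above, not a copy of OpenLLM's `_llm.py`; `empty_stream`, `generate_unsafe`, and `generate_safe` are hypothetical names used only for illustration.

```python
import asyncio


async def empty_stream():
    # Hypothetical stand-in for a generation stream that ends
    # without producing a single item.
    return
    yield  # unreachable, but makes this function an async generator


async def generate_unsafe():
    # `result` is only bound if the loop body runs at least once.
    async for result in empty_stream():
        pass
    # With an empty stream, reading `result` raises UnboundLocalError.
    if (final_result := result) is None:
        raise RuntimeError("No result was returned.")
    return final_result


async def generate_safe():
    # Binding `result` before the loop turns the empty-stream case
    # into an explicit, descriptive error instead.
    result = None
    async for result in empty_stream():
        pass
    if (final_result := result) is None:
        raise RuntimeError("No result was returned.")
    return final_result


def describe(coro_fn):
    # Run a coroutine function and report which exception type it raised.
    try:
        asyncio.run(coro_fn())
    except Exception as exc:
        return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    print("unsafe:", describe(generate_unsafe))
    print("safe:", describe(generate_safe))
```

If the server code follows a similar pattern, the 500 would be a symptom of the stream ending with no output rather than the root cause of the hang.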

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.10
python: 3.11.6
platform: macOS-11.6.8-x86_64-i386-64bit
uid_gid: 501:20

pip_packages
```
abstract_singleton==1.0.1
accelerate==0.25.0
aiofiles==23.2.1
aiohttp==3.9.1
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.1.0
appdirs==1.4.4
asgiref==3.7.2
attrs==23.1.0
auto_gpt_plugin_template==0.0.3
autoflake==2.2.1
AutoGPT-Forge @ git+https://github.com/Significant-Gravitas/AutoGPT.git@f734bdb3142f42e0acd7bc2305e5583ce832e625#subdirectory=autogpts/forge
backoff==2.2.1
bcrypt==4.1.1
beautifulsoup4==4.12.2
bentoml==1.1.10
bitsandbytes==0.41.3.post2
black==23.11.0
blis==0.7.11
boto3==1.33.8
botocore==1.33.8
Brotli==1.1.0
bs4==0.0.1
build==0.10.0
CacheControl==0.13.1
cachetools==5.3.2
catalogue==2.0.10
cattrs==23.1.2
certifi==2023.11.17
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.4.14
circus==0.18.0
cleo==2.1.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
colorama==0.4.6
coloredlogs==15.0.1
colorlog==6.8.0
confection==0.1.4
contextlib2==21.6.0
crashtest==0.4.1
cryptography==41.0.7
cssselect==1.2.0
cymem==2.0.8
datasets==2.16.0
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.7
distro==1.8.0
dnspython==2.4.2
docker==6.1.3
duckduckgo-search==3.8.5
dulwich==0.21.7
einops==0.7.0
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl#sha256=0964370218b7e1672a30ac50d72cdc6b16f7c867496f1d60925691188f4d2510
fastapi==0.99.1
fastcore==1.5.29
fastjsonschema==2.19.0
filelock==3.13.1
filetype==1.2.0
flatbuffers==23.5.26
frozenlist==1.4.0
fs==2.4.16
fsspec==2023.10.0
ftfy==6.1.3
ghapi==1.0.4
google-api-core==2.14.0
google-api-python-client==2.109.0
google-auth==2.25.0
google-auth-httplib2==0.1.1
google-cloud-appengine-logging==1.3.2
google-cloud-audit-log==0.2.5
google-cloud-core==2.3.3
google-cloud-logging==3.8.0
google-cloud-storage==2.13.0
google-crc32c==1.5.0
google-resumable-media==2.6.0
googleapis-common-protos==1.61.0
greenlet==3.0.1
grpc-google-iam-v1==0.12.7
grpcio==1.59.3
grpcio-status==1.59.3
gTTS==2.4.0
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==0.17.3
httplib2==0.22.0
httptools==0.6.1
httpx==0.24.1
huggingface-hub==0.19.4
humanfriendly==10.0
hypercorn==0.14.4
hyperframe==6.0.1
idna==3.6
importlib-metadata==6.11.0
importlib-resources==6.1.1
inflection==0.5.1
installer==0.7.0
isort==5.12.0
jaraco.classes==3.3.0
Jinja2==3.1.2
jmespath==1.0.1
jsonschema==4.20.0
jsonschema-specifications==2023.11.2
keyring==24.3.0
kubernetes==28.1.0
langcodes==3.3.0
litellm==0.1.824
loguru==0.7.2
lxml==4.9.3
Markdown==3.5.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mmh3==4.0.1
monotonic==1.6
more-itertools==10.1.0
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
murmurhash==1.0.10
mypy-extensions==1.0.0
networkx==3.2.1
numpy==1.26.2
nvidia-ml-py==11.525.150
oauthlib==3.2.2
onnxruntime==1.16.3
openai==0.27.10
openapi-python-client==0.14.1
openllm==0.4.41
openllm-client==0.4.41
openllm-core==0.4.41
opentelemetry-api==1.20.0
opentelemetry-exporter-otlp-proto-common==1.21.0
opentelemetry-exporter-otlp-proto-grpc==1.21.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-instrumentation-fastapi==0.42b0
opentelemetry-proto==1.21.0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.16.1
orjson==3.9.10
outcome==1.3.0.post0
overrides==7.4.0
packaging==23.2
pandas==2.1.4
pathspec==0.11.2
pathy==0.10.3
pexpect==4.9.0
Pillow==10.1.0
pinecone-client==2.2.4
pip-requirements-parser==32.0.1
pip-tools==7.3.0
pkginfo==1.9.6
platformdirs==4.1.0
playsound==1.2.2
poetry==1.7.1
poetry-core==1.8.1
poetry-plugin-export==1.6.0
posthog==3.1.0
preshed==3.0.9
priority==2.0.0
prometheus-client==0.19.0
prompt-toolkit==3.0.41
proto-plus==1.22.3
protobuf==4.25.1
psutil==5.9.7
ptyprocess==0.7.0
pulsar-client==3.3.0
pyarrow==14.0.2
pyarrow-hotfix==0.6
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==1.10.13
pydantic_core==2.14.5
pyflakes==3.1.0
PyGithub==2.1.1
Pygments==2.17.2
PyJWT==2.8.0
pylatexenc==2.10
PyNaCl==1.5.0
pyparsing==3.1.1
pypdf==3.17.1
PyPika==0.48.9
pyproject_hooks==1.0.0
PySocks==1.7.1
python-dateutil==2.8.2
python-docx==1.1.0
python-dotenv==1.0.0
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.2
rapidfuzz==3.5.2
readability-lxml==0.8.1
redis==5.0.1
referencing==0.31.1
regex==2023.10.3
requests==2.31.0
requests-oauthlib==1.3.1
requests-toolbelt==1.0.0
rich==13.7.0
rpds-py==0.13.2
rsa==4.9
s3transfer==0.8.2
safetensors==0.4.1
schema==0.7.5
scipy==1.11.4
selenium==4.15.2
sentencepiece==0.1.99
shellingham==1.5.4
simple-di==0.1.5
six==1.16.0
smart-open==6.4.0
sniffio==1.3.0
socksio==1.0.0
sortedcontainers==2.4.0
soupsieve==2.5
spacy==3.5.4
spacy-legacy==3.0.12
spacy-loggers==1.0.5
SQLAlchemy==2.0.23
srsly==2.4.8
starlette==0.27.0
sympy==1.12
tenacity==8.2.3
thinc==8.1.12
tiktoken==0.5.2
tokenizers==0.15.0
toml==0.10.2
tomlkit==0.12.3
torch==2.1.2
tornado==6.4
tqdm==4.66.1
transformers==4.36.2
trio==0.23.1
trio-websocket==0.11.1
trove-classifiers==2023.11.29
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
uritemplate==4.1.1
urllib3==2.0.7
uvicorn==0.23.2
uvloop==0.19.0
virtualenv==20.25.0
wasabi==1.1.2
watchfiles==0.21.0
wcwidth==0.2.12
webdriver-manager==4.0.1
websocket-client==1.7.0
websockets==12.0
wrapt==1.16.0
wsproto==1.2.0
xattr==0.10.1
xxhash==3.4.1
yarl==1.9.3
zipp==3.17.0
```

System information (Optional)

macOS Big Sur Version 11.6.8
MacBook Pro (16-inch, 2019)
Processor: 2.6 GHz 6-Core Intel Core i7
Memory: 16 GB 2667 MHz DDR4
Startup Disk: Macintosh HD
Graphics: Intel UHD Graphics 630 1536 MB

bojiang commented 1 month ago

Closing for OpenLLM 0.6.