bentoml / OpenLLM

Run any open-source LLMs, such as Llama 3.1, Gemma, as OpenAI compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

bug: Not enough data for satisfy transfer length header #960

Closed Cherchercher closed 1 month ago

Cherchercher commented 4 months ago

Describe the bug

environment: Python 3.10

usage: openllm start NousResearch/llama-2-13b-chat-hf

llm = OpenLLMAPI(address="http://some_address:3000/")
llm.complete("What are some hazards crude oil stored in tank?")

error: aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ib.gaga/.local/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/home/ib.gaga/.local/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
    async for chunk in self.body_iterator:
  File "/home/ib.gaga/.local/lib/python3.10/site-packages/openllm/_service.py", line 28, in generate_stream_v1
    async for it in llm.generate_iterator(llm_model_class(input_dict).model_dump()):
  File "/home/ib.gaga/.local/lib/python3.10/site-packages/openllm/_llm.py", line 127, in generate_iterator
    raise RuntimeError(f'Exception caught during generation: {err}') from err
RuntimeError: Exception caught during generation: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
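
For readers unfamiliar with this error: aiohttp appears to raise this `TransferEncodingError` when a response sent with `Transfer-Encoding: chunked` ends before the terminating zero-length chunk arrives, i.e. the sender closed the stream early. The sketch below reproduces that class of failure with the standard library only. It is not the OpenLLM code path, and it does not show where the truncation happens in this issue. The server function and all names are hypothetical, and `http.client` reports the condition as `IncompleteRead` rather than aiohttp's `ClientPayloadError`.

```python
import http.client
import socket
import threading


def serve_truncated_chunked(listener: socket.socket) -> None:
    """Hypothetical server: send one chunk, then close without the final 0-chunk."""
    conn, _ = listener.accept()
    with conn:
        # Read the full request head so closing the socket does not reset the connection.
        request = b""
        while b"\r\n\r\n" not in request:
            data = conn.recv(4096)
            if not data:
                break
            request += data
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Transfer-Encoding: chunked\r\n"
            b"\r\n"
            b"5\r\nhello\r\n"  # one complete 5-byte chunk
        )
        # Leaving the block closes the socket; the terminating "0\r\n\r\n" is never sent.


def read_truncated_stream() -> tuple[str, bytes]:
    """Return ("truncated", partial_body) if the chunked stream ended early."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: pick any free port
    port = listener.getsockname()[1]
    thread = threading.Thread(target=serve_truncated_chunked, args=(listener,))
    thread.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        response = client.getresponse()
        try:
            return "complete", response.read()
        except http.client.IncompleteRead as err:
            # err.partial holds the bytes received before the stream was cut off.
            return "truncated", err.partial
    finally:
        client.close()
        thread.join()
        listener.close()


if __name__ == "__main__":
    print(read_truncated_stream())
```

If the same pattern applies here, the process that produces the generation stream stopped sending mid-response, so its own logs are the place to look for the underlying failure.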

To reproduce

No response

Logs

No response

Environment

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.11
python: 3.10.12
platform: Linux-6.5.0-1017-azure-x86_64-with-glibc2.35
uid_gid: 2206643:100

pip_packages
```
accelerate==0.29.2
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
appdirs==1.4.4
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
auto_gptq==0.7.1
Automat==20.2.0
Babel==2.8.0
bcrypt==3.2.0
bentoml==1.1.11
bitsandbytes==0.41.3.post2
blinker==1.4
build==0.10.0
cattrs==23.1.2
certifi==2020.6.20
chardet==4.0.0
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloud-init==23.4.4
cloudpickle==3.0.0
cmake==3.29.2
colorama==0.4.4
coloredlogs==15.0.1
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
contextlib2==21.6.0
cryptography==3.4.8
cuda-python==12.4.0
datasets==2.18.0
dbus-python==1.2.18
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.8
diskcache==5.6.3
distlib==0.3.8
distro==1.7.0
distro-info==1.1+ubuntu0.2
einops==0.7.0
exceptiongroup==1.2.0
fail2ban==0.11.2
fastapi==0.110.1
fastcore==1.5.29
filelock==3.13.4
filetype==1.2.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2024.2.0
gekko==1.1.1
ghapi==1.0.5
h11==0.14.0
httpcore==1.0.5
httplib2==0.20.2
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.22.2
humanfriendly==10.0
hyperlink==21.0.0
idna==3.3
importlib-metadata==6.11.0
incremental==21.3.0
inflection==0.5.1
interegular==0.3.3
jeepney==0.7.1
Jinja2==3.0.3
joblib==1.4.0
jsonpatch==1.32
jsonpointer==2.0
jsonschema==3.2.0
keyring==23.5.0
lark==1.1.9
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
lit==18.1.3
llvmlite==0.42.0
markdown-it-py==3.0.0
MarkupSafe==2.0.1
mdurl==0.1.2
more-itertools==8.10.0
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
nest-asyncio==1.6.0
netifaces==0.11.0
networkx==3.3
ninja==1.11.1.1
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.0
openllm==0.4.44
openllm-client==0.4.44
openllm-core==0.4.44
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.19.0
orjson==3.10.1
outlines==0.0.34
packaging==24.0
pandas==2.2.2
pathspec==0.12.1
peft==0.10.0
pexpect==4.8.0
pillow==10.3.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.2.0
prometheus_client==0.20.0
protobuf==5.26.1
psutil==5.9.8
ptyprocess==0.7.0
py-cpuinfo==9.0.0
pyarrow==15.0.2
pyarrow-hotfix==0.6
pyasn1==0.4.8
pyasn1-modules==0.2.1
pydantic==2.7.0
pydantic_core==2.18.1
Pygments==2.17.2
PyGObject==3.42.1
PyHamcrest==2.0.2
PyJWT==2.3.0
pynvml==11.5.0
pyOpenSSL==21.0.0
pyparsing==2.4.7
pyparted==3.11.7
pyproject_hooks==1.0.0
pyrsistent==0.18.1
pyserial==3.5
python-apt==2.4.0+ubuntu3
python-dateutil==2.9.0.post0
python-debian==0.1.43+ubuntu1.1
python-dotenv==1.0.1
python-json-logger==2.0.7
python-magic==0.4.24
python-multipart==0.0.9
pytz==2022.1
PyYAML==5.4.1
pyzmq==26.0.0
ray==2.10.0
referencing==0.34.0
regex==2024.4.16
requests==2.31.0
rich==13.7.1
rouge==1.0.1
rpds-py==0.18.0
safetensors==0.4.3
schema==0.7.5
scipy==1.13.0
SecretStorage==3.3.1
sentencepiece==0.2.0
service-identity==18.1.0
simple-di==0.1.5
six==1.16.0
sniffio==1.3.1
sos==4.5.6
ssh-import-id==5.11
starlette==0.37.2
sympy==1.12
systemd-python==234
tiktoken==0.6.0
tokenizers==0.15.2
tomli==2.0.1
torch==2.1.2
tornado==6.4
tqdm==4.66.2
transformers==4.39.3
triton==2.1.0
Twisted==22.1.0
typing_extensions==4.11.0
tzdata==2024.1
ubuntu-pro-client==8001
ufw==0.36.1
unattended-upgrades==0.1
urllib3==1.26.5
uvicorn==0.29.0
uvloop==0.19.0
virtualenv==20.25.2
vllm==0.4.0.post1
wadllib==1.3.6
WALinuxAgent==2.2.46
watchfiles==0.21.0
websockets==12.0
wrapt==1.16.0
xformers==0.0.23.post1
xxhash==3.4.1
yarl==1.9.4
zipp==1.0.0
zope.interface==5.4.0
```

transformers version: 4.39.3
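
For anyone adding a "same here" report, a shorter listing of only the packages named in the traceback and environment above may be easier to compare than a full package dump. A minimal sketch using the standard library's `importlib.metadata`; the `RELEVANT` tuple and the `collect_versions` helper are illustrative choices, not something OpenLLM provides.

```python
from importlib import metadata

# Packages that show up in the traceback or the listing above.
# This selection is an assumption; adjust it to your own setup.
RELEVANT = (
    "openllm",
    "openllm-client",
    "openllm-core",
    "bentoml",
    "aiohttp",
    "starlette",
    "vllm",
    "transformers",
)


def collect_versions(names=RELEVANT) -> dict[str, str]:
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}=={version}")
```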

System information (Optional)

No response

sergejzr commented 3 months ago

Hello, I get the same error with the model microsoft--phi-2. Best regards, Sergej

EddyJens commented 2 months ago

Same here with: meta-llama/Meta-Llama-3-8B-Instruct

bojiang commented 1 month ago

Closing for openllm 0.6.