bentoml / OpenLLM

Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

Failed to load dolly_v2 #310

Closed: npuichigo closed this issue 1 year ago

npuichigo commented 1 year ago

Describe the bug

```
openllm start dolly-v2
```

```
2023-09-09T14:31:35+0800 [ERROR] [runner:llm-dolly-v2-runner:1] Traceback (most recent call last):
  File "/home/ichigo/miniconda3/envs/bento/lib/python3.10/site-packages/openllm/_llm.py", line 758, in model
    model = model.to('cuda')
AttributeError: 'InstructionTextGenerationPipeline' object has no attribute 'to'
```
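
For context, transformers `Pipeline` objects do not expose a `.to()` method; only the underlying model does. Below is a minimal sketch of the failure mode, assuming dolly-v2's `trust_remote_code` path returns its custom `InstructionTextGenerationPipeline` rather than a bare model (a hypothetical reproduction, not OpenLLM's code):

```python
# Hypothetical reproduction: dolly-v2 loads as a custom transformers
# Pipeline, and Pipeline objects have no .to() method.
from transformers import pipeline

pipe = pipeline(
    model="databricks/dolly-v2-3b",
    trust_remote_code=True,  # pulls in InstructionTextGenerationPipeline
    torch_dtype="auto",
)
# pipe.to("cuda")        # AttributeError, matching the traceback above
pipe.model.to("cuda")    # the wrapped model can be moved instead
```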

To reproduce

No response

Logs

No response

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.6
python: 3.10.12
platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
uid_gid: 1000:1000
conda: 23.3.1
in_conda_env: True

conda_packages
```yaml name: bento channels: - defaults dependencies: - _libgcc_mutex=0.1=main - _openmp_mutex=5.1=1_gnu - bzip2=1.0.8=h7b6447c_0 - ca-certificates=2023.08.22=h06a4308_0 - ld_impl_linux-64=2.38=h1181459_1 - libffi=3.4.4=h6a678d5_0 - libgcc-ng=11.2.0=h1234567_1 - libgomp=11.2.0=h1234567_1 - libstdcxx-ng=11.2.0=h1234567_1 - libuuid=1.41.5=h5eee18b_0 - ncurses=6.4=h6a678d5_0 - openssl=3.0.10=h7f8727e_2 - pip=23.2.1=py310h06a4308_0 - python=3.10.12=h955ad1f_0 - readline=8.2=h5eee18b_0 - setuptools=68.0.0=py310h06a4308_0 - sqlite=3.41.2=h5eee18b_0 - tk=8.6.12=h1ccaba5_0 - tzdata=2023c=h04d1e81_0 - wheel=0.38.4=py310h06a4308_0 - xz=5.4.2=h5eee18b_0 - zlib=1.2.13=h5eee18b_0 - pip: - accelerate==0.22.0 - anyio==3.7.1 - appdirs==1.4.4 - asgiref==3.7.2 - attrs==23.1.0 - backoff==2.2.1 - bentoml==1.1.6 - bitsandbytes==0.41.1 - boto3==1.28.43 - botocore==1.31.43 - build==1.0.3 - cattrs==23.1.2 - circus==0.18.0 - click==8.1.7 - click-option-group==0.5.6 - cloudpickle==2.2.1 - cmake==3.27.4.1 - coloredlogs==15.0.1 - contextlib2==21.6.0 - cpm-kernels==1.0.11 - cuda-python==12.2.0 - cython==3.0.2 - deepmerge==1.1.0 - deprecated==1.2.14 - exceptiongroup==1.1.3 - fastapi==0.103.1 - fastcore==1.5.29 - filetype==1.2.0 - fs==2.4.16 - fs-s3fs==1.1.1 - ghapi==1.0.4 - googleapis-common-protos==1.56.2 - grpcio==1.58.0 - grpcio-channelz==1.48.2 - grpcio-health-checking==1.48.2 - grpcio-reflection==1.48.2 - h11==0.14.0 - httpcore==0.17.3 - httpx==0.24.1 - huggingface-hub==0.16.4 - humanfriendly==10.0 - importlib-metadata==6.0.1 - inflection==0.5.1 - jinja2==3.1.2 - jmespath==1.0.1 - jsonschema==4.19.0 - jsonschema-specifications==2023.7.1 - lit==16.0.6 - markdown-it-py==3.0.0 - markupsafe==2.1.3 - mdurl==0.1.2 - mpmath==1.3.0 - mypy-extensions==1.0.0 - networkx==3.1 - ninja==1.11.1 - numba==0.57.1 - numpy==1.24.4 - nvidia-cublas-cu11==11.10.3.66 - nvidia-cuda-cupti-cu11==11.7.101 - nvidia-cuda-nvrtc-cu11==11.7.99 - nvidia-cuda-runtime-cu11==11.7.99 - nvidia-cudnn-cu11==8.5.0.96 - nvidia-cufft-cu11==10.9.0.58 - nvidia-curand-cu11==10.2.10.91 - nvidia-cusolver-cu11==11.4.0.1 - nvidia-cusparse-cu11==11.7.4.91 - nvidia-nccl-cu11==2.14.3 - nvidia-nvtx-cu11==11.7.91 - openllm==0.3.3 - openllm-client==0.3.3 - openllm-core==0.3.3 - opentelemetry-api==1.18.0 - opentelemetry-exporter-jaeger==1.18.0 - opentelemetry-exporter-jaeger-proto-grpc==1.18.0 - opentelemetry-exporter-jaeger-thrift==1.18.0 - opentelemetry-exporter-otlp==1.18.0 - opentelemetry-exporter-otlp-proto-common==1.18.0 - opentelemetry-exporter-otlp-proto-grpc==1.18.0 - opentelemetry-exporter-otlp-proto-http==1.18.0 - opentelemetry-exporter-zipkin==1.18.0 - opentelemetry-exporter-zipkin-json==1.18.0 - opentelemetry-exporter-zipkin-proto-http==1.18.0 - opentelemetry-instrumentation==0.39b0 - opentelemetry-instrumentation-aiohttp-client==0.39b0 - opentelemetry-instrumentation-asgi==0.39b0 - opentelemetry-instrumentation-grpc==0.39b0 - opentelemetry-proto==1.18.0 - opentelemetry-sdk==1.18.0 - opentelemetry-semantic-conventions==0.39b0 - opentelemetry-util-http==0.39b0 - optimum==1.13.0 - orjson==3.9.6 - packaging==23.1 - pathspec==0.11.2 - pillow==10.0.0 - pip-requirements-parser==32.0.1 - pip-tools==7.3.0 - prometheus-client==0.17.1 - protobuf==3.20.3 - psutil==5.9.5 - pydantic==1.10.12 - pynvml==11.5.0 - pyparsing==3.1.1 - pyproject-hooks==1.0.0 - python-json-logger==2.0.7 - python-multipart==0.0.6 - pytz==2023.3.post1 - pyyaml==6.0.1 - pyzmq==25.1.1 - ray==2.6.3 - referencing==0.30.2 - regex==2023.8.8 - rich==13.5.2 - rpds-py==0.10.2 - s3transfer==0.6.2 - 
safetensors==0.3.3 - schema==0.7.5 - scipy==1.11.2 - sentencepiece==0.1.99 - simple-di==0.1.5 - six==1.16.0 - sniffio==1.3.0 - starlette==0.27.0 - sympy==1.12 - tabulate==0.9.0 - thrift==0.16.0 - tokenizers==0.13.3 - tomli==2.0.1 - torch==2.0.1 - tornado==6.3.3 - transformers==4.33.1 - triton==2.0.0 - urllib3==1.26.16 - uvicorn==0.23.2 - vllm==0.1.6 - watchfiles==0.20.0 - wrapt==1.15.0 - xformers==0.0.21 - zipp==3.16.2 prefix: /home/ichigo/miniconda3/envs/bento ```
pip_packages
``` accelerate==0.22.0 aiohttp==3.8.4 aiosignal==1.3.1 anyio==3.7.1 appdirs==1.4.4 asgiref==3.7.2 asttokens==2.2.1 async-timeout==4.0.2 attrs==23.1.0 audioread==3.0.0 backcall==0.2.0 backoff==2.2.1 bentoml==1.1.6 bitsandbytes==0.41.1 boto3==1.28.43 botocore==1.31.43 build==1.0.3 cattrs==23.1.2 certifi==2023.5.7 cffi==1.15.1 charset-normalizer==3.1.0 circus==0.18.0 click==8.1.7 click-option-group==0.5.6 cloudpickle==2.2.1 cmake==3.27.4.1 coloredlogs==15.0.1 contextlib2==21.6.0 cpm-kernels==1.0.11 cuda-python==12.2.0 Cython==3.0.2 datasets==2.12.0 decorator==5.1.1 deepmerge==1.1.0 Deprecated==1.2.14 dill==0.3.6 exceptiongroup==1.1.3 executing==1.2.0 fastapi==0.103.1 fastcore==1.5.29 filelock==3.12.0 filetype==1.2.0 frozenlist==1.3.3 fs==2.4.16 fs-s3fs==1.1.1 fsspec==2023.5.0 ghapi==1.0.4 googleapis-common-protos==1.56.2 grpcio==1.58.0 grpcio-channelz==1.48.2 grpcio-health-checking==1.48.2 grpcio-reflection==1.48.2 h11==0.14.0 httpcore==0.17.3 httpx==0.24.1 huggingface-hub==0.16.4 humanfriendly==10.0 idna==3.4 importlib-metadata==6.0.1 inflection==0.5.1 jedi==0.18.2 Jinja2==3.1.2 jmespath==1.0.1 joblib==1.2.0 jsonschema==4.19.0 jsonschema-specifications==2023.7.1 lazy_loader==0.2 librosa==0.10.0.post2 lit==16.0.6 llvmlite==0.40.0 markdown-it-py==3.0.0 MarkupSafe==2.1.3 matplotlib-inline==0.1.6 mdurl==0.1.2 mpmath==1.3.0 msgpack==1.0.5 multidict==6.0.4 multiprocess==0.70.14 mypy-extensions==1.0.0 networkx==3.1 ninja==1.11.1 numba==0.57.1 numpy==1.24.4 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 openllm==0.3.3 openllm-client==0.3.3 openllm-core==0.3.3 opentelemetry-api==1.18.0 opentelemetry-exporter-jaeger==1.18.0 opentelemetry-exporter-jaeger-proto-grpc==1.18.0 opentelemetry-exporter-jaeger-thrift==1.18.0 opentelemetry-exporter-otlp==1.18.0 opentelemetry-exporter-otlp-proto-common==1.18.0 opentelemetry-exporter-otlp-proto-grpc==1.18.0 opentelemetry-exporter-otlp-proto-http==1.18.0 opentelemetry-exporter-zipkin==1.18.0 opentelemetry-exporter-zipkin-json==1.18.0 opentelemetry-exporter-zipkin-proto-http==1.18.0 opentelemetry-instrumentation==0.39b0 opentelemetry-instrumentation-aiohttp-client==0.39b0 opentelemetry-instrumentation-asgi==0.39b0 opentelemetry-instrumentation-grpc==0.39b0 opentelemetry-proto==1.18.0 opentelemetry-sdk==1.18.0 opentelemetry-semantic-conventions==0.39b0 opentelemetry-util-http==0.39b0 optimum==1.13.0 orjson==3.9.6 packaging==23.1 pandas==2.0.1 parso==0.8.3 pathspec==0.11.2 pexpect==4.8.0 pickleshare==0.7.5 Pillow==10.0.0 pip-requirements-parser==32.0.1 pip-tools==7.3.0 pooch==1.6.0 prometheus-client==0.17.1 prompt-toolkit==3.0.38 protobuf==3.20.3 psutil==5.9.5 ptyprocess==0.7.0 pure-eval==0.2.2 pyarrow==12.0.0 pycparser==2.21 pydantic==1.10.12 Pygments==2.15.1 pynvml==11.5.0 pyparsing==3.1.1 pyproject_hooks==1.0.0 python-dateutil==2.8.2 python-json-logger==2.0.7 python-multipart==0.0.6 pytz==2023.3.post1 PyYAML==6.0.1 pyzmq==25.1.1 ray==2.6.3 referencing==0.30.2 regex==2023.8.8 requests==2.30.0 responses==0.18.0 rich==13.5.2 rpds-py==0.10.2 s3transfer==0.6.2 safetensors==0.3.3 schema==0.7.5 scikit-learn==1.2.2 scipy==1.11.2 sentencepiece==0.1.99 simple-di==0.1.5 six==1.16.0 sniffio==1.3.0 soundfile==0.12.1 soxr==0.3.5 stack-data==0.6.2 starlette==0.27.0 sympy==1.12 tabulate==0.9.0 
threadpoolctl==3.1.0 thrift==0.16.0 tokenizers==0.13.3 tomli==2.0.1 torch==2.0.1 tornado==6.3.3 tqdm==4.65.0 traitlets==5.9.0 transformers==4.33.1 triton==2.0.0 typing_extensions==4.5.0 tzdata==2023.3 urllib3==1.26.16 uvicorn==0.23.2 vllm==0.1.6 watchfiles==0.20.0 wcwidth==0.2.6 wrapt==1.15.0 xformers==0.0.21 xmltodict==0.13.0 xxhash==3.2.0 yarl==1.9.2 zipp==3.16.2 ```

System information (Optional)

No response

aarnphm commented 1 year ago

Hey, thanks for reporting this.

Can you try out `openllm start gpt-neox --model-id databricks/dolly-v2-3b --serialisation legacy`?
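
If it helps, my understanding is that `--serialisation legacy` selects the pickle-based `pytorch_model.bin` checkpoint instead of safetensors. A rough transformers-level equivalent (an assumption about the mapping, not OpenLLM's actual loading code):

```python
# Sketch: force the legacy .bin checkpoint rather than safetensors,
# roughly what --serialisation legacy is assumed to select here.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    use_safetensors=False,  # legacy pickle-based weights
)
```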

npuichigo commented 1 year ago

I got this error for gpt-neox:

```
  File "/home/ichigo/miniconda3/envs/bento/lib/python3.10/site-packages/openllm/_llm.py", line 472, in from_pretrained
    raise OpenLLMException(f"Failed to generate a valid tag for {cfg_cls.__openllm_start_name__} with 'model_id={_model_id}' (lookup to see its traceback):\n{err}") from err
openllm_core.exceptions.OpenLLMException: Failed to generate a valid tag for gpt-neox with 'model_id=databricks/dolly-v2-3b' (lookup to see its traceback):
We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like databricks/dolly-v2-3b is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
```
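
This looks like the Hub lookup failing before any weights are fetched. One possible workaround sketch: pre-download the repository with `huggingface_hub` (already in the environment above) so the tag-generation step can resolve `config.json` from the local cache:

```python
# Pre-fetch the repo, then point --model-id at the returned local path.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("databricks/dolly-v2-3b")
print(local_dir)
```
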
aarnphm commented 1 year ago

hmm, weird.

andreamaestri1999 commented 1 year ago

I had the same problem, and the command you sent, `openllm start gpt-neox --model-id databricks/dolly-v2-3b --serialisation legacy`, gets stuck on "Fetching 12 files" after downloading. I also tried `databricks/dolly-v2-7b` and the result was the same.

aarnphm commented 1 year ago

There has been a lot of iteration since this. Can you try the new API with a Mistral model to see if you are still running into the issue? Thanks.
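
For completeness, a minimal client-side check against a locally running OpenLLM server, which exposes an OpenAI-compatible endpoint; the port and model id below are assumptions, so adjust them to whatever you started:

```python
# Assumes the server listens on localhost:3000 (OpenLLM's default)
# and serves a Mistral model; the model id is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```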

Feel free to reopen if you still run into issues.