bentoml / OpenLLM

Run any open-source LLMs, such as Llama 3.1 and Gemma, as OpenAI-compatible API endpoints in the cloud.
https://bentoml.com
Apache License 2.0

bug: error while installing vLLM using pip install "openllm[vllm]" #967

Closed Developer-atomic-amardeep closed 1 month ago

Developer-atomic-amardeep commented 4 months ago

Describe the bug

```
(codellama) amardeep.yadav@fintricity.com@codellamamachine:~$ pip install "openllm[vllm]"
Requirement already satisfied: openllm[vllm] in ./miniconda3/envs/codellama/lib/python3.12/site-packages (0.4.44)
Requirement already satisfied: accelerate in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.29.3)
Requirement already satisfied: bentoml<1.2,>=1.1.11 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from bentoml[io]<1.2,>=1.1.11->openllm[vllm]) (1.1.11)
Requirement already satisfied: bitsandbytes<0.42 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.41.3.post2)
Requirement already satisfied: build<1 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from build[virtualenv]<1->openllm[vllm]) (0.10.0)
Requirement already satisfied: click>=8.1.3 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (8.1.7)
Requirement already satisfied: cuda-python in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (12.4.0)
Requirement already satisfied: einops in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.7.0)
Requirement already satisfied: ghapi in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (1.0.5)
Requirement already satisfied: openllm-client>=0.4.44 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.4.44)
Requirement already satisfied: openllm-core>=0.4.44 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.4.44)
Requirement already satisfied: optimum>=1.12.0 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (1.19.1)
Requirement already satisfied: safetensors in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.4.3)
Requirement already satisfied: scipy in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (1.13.0)
Requirement already satisfied: sentencepiece in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.2.0)
Requirement already satisfied: transformers>=4.36.0 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from transformers[tokenizers,torch]>=4.36.0->openllm[vllm]) (4.40.1)
INFO: pip is looking at multiple versions of openllm[vllm] to determine which version is compatible with other requirements. This could take a while.
Collecting openllm[vllm]
  Using cached openllm-0.4.43-py3-none-any.whl.metadata (62 kB)
  Using cached openllm-0.4.42-py3-none-any.whl.metadata (62 kB)
  Using cached openllm-0.4.41-py3-none-any.whl.metadata (62 kB)
  Using cached openllm-0.4.40-py3-none-any.whl.metadata (62 kB)
  Using cached openllm-0.4.39-py3-none-any.whl.metadata (62 kB)
Collecting megablocks (from openllm[vllm])
  Using cached megablocks-0.5.1.tar.gz (49 kB)
  Preparing metadata (setup.py) ... done
Collecting openllm[vllm]
  Using cached openllm-0.4.38-py3-none-any.whl.metadata (62 kB)
  Using cached openllm-0.4.37-py3-none-any.whl.metadata (62 kB)
INFO: pip is still looking at multiple versions of openllm[vllm] to determine which version is compatible with other requirements. This could take a while.
  Using cached openllm-0.4.36-py3-none-any.whl.metadata (60 kB)
  Using cached openllm-0.4.35-py3-none-any.whl.metadata (60 kB)
Collecting vllm>=0.2.2 (from openllm[vllm])
  Using cached vllm-0.3.3.tar.gz (315 kB)
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Collecting ninja
        Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
      Collecting packaging
        Using cached packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
      Collecting setuptools>=49.4.0
        Using cached setuptools-69.5.1-py3-none-any.whl.metadata (6.2 kB)
      ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0)
      ERROR: No matching distribution found for torch==2.1.2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
```
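
The key lines are the two `ERROR:` messages: the vllm-0.3.3 source build pins torch==2.1.2, but only torch 2.2.0 through 2.3.0 wheels are offered for this interpreter (Python 3.12). A possible workaround, untested here and assuming the Python version is the blocker, is to recreate the conda environment on Python 3.11, where torch 2.1.2 wheels are published (the environment name below is just an example):

```
# Hypothetical workaround: rebuild on Python 3.11 so vllm's pinned
# build dependency (torch==2.1.2) can be resolved by pip.
conda create -n codellama-py311 python=3.11 -y
conda activate codellama-py311
pip install "openllm[vllm]"
```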

To reproduce

Step 1: Create a normal setup for OpenLLM with a conda env.
Step 2: Run `TRUST_REMOTE_CODE=True openllm start codellama/CodeLlama-34b-Instruct-hf --backend vllm`.

The following error might be visible to you:

```
(codellama) amardeep.yadav@fintricity.com@codellamamachine:~$ TRUST_REMOTE_CODE=True openllm start codellama/CodeLlama-34b-Instruct-hf --backend vllm
config.json: 100%|██████████| 588/588 [00:00<00:00, 7.36MB/s]
tokenizer_config.json: 100%|██████████| 1.59k/1.59k [00:00<00:00, 20.3MB/s]
tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 91.4MB/s]
tokenizer.json: 100%|██████████| 1.84M/1.84M [00:00<00:00, 60.7MB/s]
special_tokens_map.json: 100%|██████████| 411/411 [00:00<00:00, 5.58MB/s]
generation_config.json: 100%|██████████| 116/116 [00:00<00:00, 1.54MB/s]
model.safetensors.index.json: 100%|██████████| 37.6k/37.6k [00:00<00:00, 116MB/s]
pytorch_model.bin.index.json: 100%|██████████| 35.8k/35.8k [00:00<00:00, 207MB/s]
model-00007-of-00007.safetensors: 100%|██████████| 9.19G/9.19G [00:50<00:00, 180MB/s]
model-00001-of-00007.safetensors: 100%|██████████| 9.85G/9.85G [00:52<00:00, 188MB/s]
model-00002-of-00007.safetensors: 100%|██████████| 9.69G/9.69G [00:52<00:00, 183MB/s]
model-00003-of-00007.safetensors: 100%|██████████| 9.69G/9.69G [00:52<00:00, 183MB/s]
model-00006-of-00007.safetensors: 100%|██████████| 9.69G/9.69G [00:53<00:00, 180MB/s]
model-00005-of-00007.safetensors: 100%|██████████| 9.69G/9.69G [00:54<00:00, 179MB/s]
model-00004-of-00007.safetensors: 100%|██████████| 9.69G/9.69G [00:54<00:00, 179MB/s]
Fetching 15 files: 100%|██████████| 15/15 [00:54<00:00, 3.63s/it]
🚀Tip: run 'openllm build codellama/CodeLlama-34b-Instruct-hf --backend vllm --serialization safetensors' to create a BentoLLM for 'codellama/CodeLlama-34b-Instruct-hf'
2024-04-25T18:34:00+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service:svc" can be accessed at http://localhost:3000/metrics.
2024-04-25T18:34:01+0000 [INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] An exception occurred while instantiating runner 'llm-llama-runner', see details below:
2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
    self._set_handle(LocalRunnerRef)
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/openllm/_runners.py", line 121, in __init__
    raise openllm.exceptions.OpenLLMException('vLLM is not installed. Do pip install "openllm[vllm]".')
openllm_core.exceptions.OpenLLMException: vLLM is not installed. Do pip install "openllm[vllm]".

2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
    on_startup()
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
    raise e
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
    self._set_handle(LocalRunnerRef)
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/openllm/_runners.py", line 121, in __init__
    raise openllm.exceptions.OpenLLMException('vLLM is not installed. Do pip install "openllm[vllm]".')
openllm_core.exceptions.OpenLLMException: vLLM is not installed. Do pip install "openllm[vllm]".

2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] Application startup failed. Exiting.
```
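
Note: the runner failure above ("vLLM is not installed") is a downstream symptom of the failed `pip install "openllm[vllm]"`. A quick sanity check, using only standard Python and pip commands, confirms whether vLLM is actually importable in the active environment:

```
# Run inside the codellama env; if the import fails, the openllm[vllm]
# extra was never installed because the earlier dependency resolution aborted.
python -c "import vllm; print(vllm.__version__)"
pip show vllm
```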

Logs

Mentioned everything above.

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.11
python: 3.12.2
platform: Linux-5.15.0-1050-azure-x86_64-with-glibc2.31
uid_gid: 14830125:14830125
conda: 24.3.0
in_conda_env: True

conda_packages
```yaml
name: codellama channels: - defaults dependencies: - _libgcc_mutex=0.1=main - _openmp_mutex=5.1=1_gnu - bzip2=1.0.8=h5eee18b_5 - ca-certificates=2024.3.11=h06a4308_0 - expat=2.6.2=h6a678d5_0 - ld_impl_linux-64=2.38=h1181459_1 - libffi=3.4.4=h6a678d5_0 - libgcc-ng=11.2.0=h1234567_1 - libgomp=11.2.0=h1234567_1 - libstdcxx-ng=11.2.0=h1234567_1 - libuuid=1.41.5=h5eee18b_0 - ncurses=6.4=h6a678d5_0 - openssl=3.0.13=h7f8727e_0 - pip=23.3.1=py312h06a4308_0 - python=3.12.2=h996f2a0_0 - readline=8.2=h5eee18b_0 - setuptools=68.2.2=py312h06a4308_0 - sqlite=3.41.2=h5eee18b_0 - tk=8.6.12=h1ccaba5_0 - wheel=0.41.2=py312h06a4308_0 - xz=5.4.6=h5eee18b_0 - zlib=1.2.13=h5eee18b_0 - pip: - accelerate==0.29.3 - aiohttp==3.9.5 - aiosignal==1.3.1 - anyio==4.3.0 - appdirs==1.4.4 - asgiref==3.8.1 - attrs==23.2.0 - bentoml==1.1.11 - bitsandbytes==0.41.3.post2 - build==0.10.0 - cattrs==23.1.2 - certifi==2024.2.2 - charset-normalizer==3.3.2 - circus==0.18.0 - click==8.1.7 - click-option-group==0.5.6 - cloudpickle==3.0.0 - coloredlogs==15.0.1 - contextlib2==21.6.0 - cuda-python==12.4.0 - datasets==2.19.0 - deepmerge==1.1.1 - deprecated==1.2.14 - dill==0.3.8 - distlib==0.3.8 - distro==1.9.0 - einops==0.7.0 - fastcore==1.5.29 - filelock==3.13.4 - filetype==1.2.0 - frozenlist==1.4.1 - fs==2.4.16 - fsspec==2024.3.1 - ghapi==1.0.5 - h11==0.14.0 - httpcore==1.0.5 - httpx==0.27.0 - huggingface-hub==0.22.2 - humanfriendly==10.0 - idna==3.7 - importlib-metadata==6.11.0 - inflection==0.5.1 - jinja2==3.1.3 - markdown-it-py==3.0.0 - markupsafe==2.1.5 - mdurl==0.1.2 - mpmath==1.3.0 - multidict==6.0.5 - multiprocess==0.70.16 - mypy-extensions==1.0.0 - networkx==3.3 - numpy==1.26.4 - nvidia-cublas-cu12==12.1.3.1 - nvidia-cuda-cupti-cu12==12.1.105 - nvidia-cuda-nvrtc-cu12==12.1.105 - nvidia-cuda-runtime-cu12==12.1.105 - nvidia-cudnn-cu12==8.9.2.26 - nvidia-cufft-cu12==11.0.2.54 - nvidia-curand-cu12==10.3.2.106 - nvidia-cusolver-cu12==11.4.5.107 - nvidia-cusparse-cu12==12.1.0.106 - nvidia-ml-py==11.525.150 - nvidia-nccl-cu12==2.20.5 - nvidia-nvjitlink-cu12==12.4.127 - nvidia-nvtx-cu12==12.1.105 - openllm==0.4.44 - openllm-client==0.4.44 - openllm-core==0.4.44 - opentelemetry-api==1.20.0 - opentelemetry-instrumentation==0.41b0 - opentelemetry-instrumentation-aiohttp-client==0.41b0 - opentelemetry-instrumentation-asgi==0.41b0 - opentelemetry-sdk==1.20.0 - opentelemetry-semantic-conventions==0.41b0 - opentelemetry-util-http==0.41b0 - optimum==1.19.1 - orjson==3.10.1 - packaging==24.0 - pandas==2.2.2 - pathspec==0.12.1 - pillow==10.3.0 - pip-requirements-parser==32.0.1 - pip-tools==7.3.0 - platformdirs==4.2.1 - prometheus-client==0.20.0 - protobuf==5.26.1 - psutil==5.9.8 - pyarrow==16.0.0 - pyarrow-hotfix==0.6 - pydantic==1.10.15 - pygments==2.17.2 - pyparsing==3.1.2 - pyproject-hooks==1.0.0 - python-dateutil==2.9.0.post0 - python-json-logger==2.0.7 - python-multipart==0.0.9 - pytz==2024.1 - pyyaml==6.0.1 - pyzmq==26.0.2 - regex==2024.4.16 - requests==2.31.0 - rich==13.7.1 - safetensors==0.4.3 - schema==0.7.5 - scipy==1.13.0 - sentencepiece==0.2.0 - simple-di==0.1.5 - six==1.16.0 - sniffio==1.3.1 - starlette==0.37.2 - sympy==1.12 - tokenizers==0.19.1 - torch==2.3.0 - tornado==6.4 - tqdm==4.66.2 - transformers==4.40.1 - typing-extensions==4.11.0 - tzdata==2024.1 - urllib3==2.2.1 - uvicorn==0.29.0 - virtualenv==20.26.0 - watchfiles==0.21.0 - wrapt==1.16.0 - xxhash==3.4.1 - yarl==1.9.4 - zipp==3.18.1 prefix: /home/amardeep.yadav/miniconda3/envs/codellama
```
pip_packages
```
accelerate==0.29.3 aiohttp==3.9.5 aiosignal==1.3.1 anyio==4.3.0 appdirs==1.4.4 asgiref==3.8.1 attrs==23.2.0 bentoml==1.1.11 bitsandbytes==0.41.3.post2 build==0.10.0 cattrs==23.1.2 certifi==2024.2.2 charset-normalizer==3.3.2 circus==0.18.0 click==8.1.7 click-option-group==0.5.6 cloudpickle==3.0.0 coloredlogs==15.0.1 contextlib2==21.6.0 cuda-python==12.4.0 datasets==2.19.0 deepmerge==1.1.1 Deprecated==1.2.14 dill==0.3.8 distlib==0.3.8 distro==1.9.0 einops==0.7.0 fastcore==1.5.29 filelock==3.13.4 filetype==1.2.0 frozenlist==1.4.1 fs==2.4.16 fsspec==2024.3.1 ghapi==1.0.5 h11==0.14.0 httpcore==1.0.5 httpx==0.27.0 huggingface-hub==0.22.2 humanfriendly==10.0 idna==3.7 importlib-metadata==6.11.0 inflection==0.5.1 Jinja2==3.1.3 markdown-it-py==3.0.0 MarkupSafe==2.1.5 mdurl==0.1.2 mpmath==1.3.0 multidict==6.0.5 multiprocess==0.70.16 mypy-extensions==1.0.0 networkx==3.3 numpy==1.26.4 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-ml-py==11.525.150 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu12==12.1.105 openllm==0.4.44 openllm-client==0.4.44 openllm-core==0.4.44 opentelemetry-api==1.20.0 opentelemetry-instrumentation==0.41b0 opentelemetry-instrumentation-aiohttp-client==0.41b0 opentelemetry-instrumentation-asgi==0.41b0 opentelemetry-sdk==1.20.0 opentelemetry-semantic-conventions==0.41b0 opentelemetry-util-http==0.41b0 optimum==1.19.1 orjson==3.10.1 packaging==24.0 pandas==2.2.2 pathspec==0.12.1 pillow==10.3.0 pip-requirements-parser==32.0.1 pip-tools==7.3.0 platformdirs==4.2.1 prometheus_client==0.20.0 protobuf==5.26.1 psutil==5.9.8 pyarrow==16.0.0 pyarrow-hotfix==0.6 pydantic==1.10.15 Pygments==2.17.2 pyparsing==3.1.2 pyproject_hooks==1.0.0 python-dateutil==2.9.0.post0 python-json-logger==2.0.7 python-multipart==0.0.9 pytz==2024.1 PyYAML==6.0.1 pyzmq==26.0.2 regex==2024.4.16 requests==2.31.0 rich==13.7.1 safetensors==0.4.3 schema==0.7.5 scipy==1.13.0 sentencepiece==0.2.0 setuptools==68.2.2 simple-di==0.1.5 six==1.16.0 sniffio==1.3.1 starlette==0.37.2 sympy==1.12 tokenizers==0.19.1 torch==2.3.0 tornado==6.4 tqdm==4.66.2 transformers==4.40.1 typing_extensions==4.11.0 tzdata==2024.1 urllib3==2.2.1 uvicorn==0.29.0 virtualenv==20.26.0 watchfiles==0.21.0 wheel==0.41.2 wrapt==1.16.0 xxhash==3.4.1 yarl==1.9.4 zipp==3.18.1
```

System information (Optional)

No response

bojiang commented 1 month ago

Closing for OpenLLM 0.6.
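
For anyone landing here later, a minimal upgrade sketch, assuming the reworked 0.6-style CLI (`openllm serve`) and treating the model tag below as an illustrative placeholder rather than a verified identifier:

```
# Assumption: OpenLLM >= 0.6 exposes `openllm serve <model>`; the model tag
# is a placeholder, consult the OpenLLM 0.6 README for supported tags.
pip install -U openllm
openllm serve llama3.1:8b
```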