### Describe the bug

After updating OpenLLM from 0.1.20 to 0.2.0, I tried to load the Baichuan-13B-Chat model as follows:

```
openllm start baichuan --model-id /home/user/.cache/modelscope/hub/baichuan-inc/Baichuan-13B-Chat/ --device 0
```

Then several problems occurred. Startup output:

```
Make sure to have the following dependencies available: ['cpm-kernels']
Converting '/home/user/.cache/modelscope/hub/baichuan-inc/Baichuan-13B-Chat/' to lowercase: '/home/user/.cache/modelscope/hub/baichuan-inc/baichuan-13b-chat/'.
Converting 'pt-Baichuan-13B-Chat' to lowercase: 'pt-baichuan-13b-chat'.
__tag__:pt-baichuan-13b-chat:10e955477599362428d4e089e8ad6138256c784f
```

First, a `circus` arbiter error appears in the CLI log:

```
2023-07-20T14:19:28+0800 [ERROR] [cli] Exception in callback <bound method Arbiter.manage_watchers of <circus.arbiter.Arbiter object at 0x7f9de09c8e80>>
Traceback (most recent call last):
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/tornado/ioloop.py", line 919, in _run
val = self.callback()
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/circus/util.py", line 1038, in wrapper
raise ConflictError("arbiter is already running %s command"
circus.exc.ConflictError: arbiter is already running arbiter_start_watchers command
```

Next, the input and the model were detected not to be on the same device:

```
2023-07-20T14:24:42+0800 [ERROR] [runner:llm-baichuan-runner:1] Exception in ASGI application
Traceback (most recent call last):
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/server/http/traffic.py", line 26, in __call__
await self.app(scope, receive, send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 580, in __call__
await self.app(scope, otel_receive, otel_send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/server/http/instruments.py", line 252, in __call__
await self.app(scope, receive, wrapped_send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/_exception_handler.py", line 57, in wrapped_app
raise exc
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/_exception_handler.py", line 46, in wrapped_app
await app(scope, receive, sender)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/routing.py", line 727, in __call__
await route.handle(scope, receive, send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/routing.py", line 285, in handle
await self.app(scope, receive, send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/_exception_handler.py", line 57, in wrapped_app
raise exc
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/_exception_handler.py", line 46, in wrapped_app
await app(scope, receive, sender)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/starlette/routing.py", line 69, in app
response = await func(request)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/server/runner_app.py", line 273, in _request_handler
payload = await infer(params)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/marshal/dispatcher.py", line 182, in _func
raise r
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/marshal/dispatcher.py", line 377, in outbound_call
outputs = await self.callback(
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/server/runner_app.py", line 253, in infer_single
ret = await runner_method.async_run(*params.args, **params.kwargs)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 59, in async_run_method
return await anyio.to_thread.run_sync(
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/runner/runnable.py", line 140, in method
return self.func(obj, *args, **kwargs)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/openllm/_llm.py", line 1429, in generate
return self.generate(prompt, **attrs)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/openllm/models/baichuan/modeling_baichuan.py", line 82, in generate
outputs = self.model.generate(
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/transformers/generation/utils.py", line 1538, in generate
return self.greedy_search(
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/transformers/generation/utils.py", line 2362, in greedy_search
outputs = self(
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/10e955477599362428d4e089e8ad6138256c784f/modeling_baichuan.py", line 400, in forward
outputs = self.model(
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/10e955477599362428d4e089e8ad6138256c784f/modeling_baichuan.py", line 284, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
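For context, this traceback is the classic symptom of tokenized inputs staying on the CPU while the embedding weights sit on `cuda:0`: `F.embedding` ends up indexing a CUDA weight tensor with CPU `input_ids`. A minimal standalone sketch of the pattern that avoids it (hypothetical paths and code, not OpenLLM's actual implementation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical local path; trust_remote_code is needed for Baichuan's custom
# modeling code, and its sentencepiece tokenizer has no fast variant.
model_id = "/path/to/baichuan-inc/Baichuan-13B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda:0")

encoded = tokenizer("hello", return_tensors="pt")  # tensors default to CPU
# Without this move, torch.embedding raises exactly the RuntimeError above.
encoded = {k: v.to(model.device) for k, v in encoded.items()}
output_ids = model.generate(**encoded, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```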
The API server then returns an Internal Server Error to the client:

```
2023-07-20T14:24:42+0800 [ERROR] [api_server:llm-baichuan-service:77] Exception on /v1/generate [POST] (trace=bfb570799cdf3681882093b96cd13353,span=fd186b183a773696,sampled=1,service.name=llm-baichuan-service)
Traceback (most recent call last):
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
output = await api.func(*args)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/openllm/_service.py", line 88, in generate_v1
responses = await runner.generate.async_run(qa_inputs.prompt, **config)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
File "/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 242, in async_run_method
raise RemoteException(
bentoml.exceptions.RemoteException: An unexpected exception occurred in remote runner llm-baichuan-runner: [500] Internal Server Error
```
Running `nvidia-smi`, it seems that the model has not been loaded onto the target device yet:

```
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1003374 C ...s/openllm_test/bin/python 845MiB |
+-----------------------------------------------------------------------------+
```
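845 MiB is far less than the roughly 26 GiB a 13B-parameter model needs in fp16, which supports the suspicion that the weights never reached `cuda:0`. One way to confirm where the weights actually live is to tally parameter bytes per device (a hypothetical diagnostic, not part of OpenLLM):

```python
import torch
import torch.nn as nn

def summarize_devices(model: nn.Module) -> None:
    """Print how many MiB of parameters sit on each device."""
    bytes_per_device: dict = {}
    for p in model.parameters():
        dev = str(p.device)
        bytes_per_device[dev] = bytes_per_device.get(dev, 0) + p.numel() * p.element_size()
    for dev, nbytes in bytes_per_device.items():
        print(f"{dev}: {nbytes / 2**20:.0f} MiB of weights")

# Demo on a toy module; against the failing setup you would call this on the
# loaded Baichuan model object instead. A healthy fp16 load of a 13B model
# should report roughly 25,000 MiB on cuda:0, not a few hundred.
toy = nn.Embedding(10, 4).to("cuda:0" if torch.cuda.is_available() else "cpu")
summarize_devices(toy)
```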
Request:

```
curl -X 'POST' 'http://localhost:3000/v1/generate' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "hello",
  "llm_config": {
    "max_new_tokens": 2048,
    "min_length": 0,
    "early_stopping": false,
    "num_beams": 1,
    "num_beam_groups": 1,
    "use_cache": true,
    "temperature": 0.95,
    "top_k": 50,
    "top_p": 0.7,
    "typical_p": 1,
    "epsilon_cutoff": 0,
    "eta_cutoff": 0,
    "diversity_penalty": 0,
    "repetition_penalty": 1,
    "encoder_repetition_penalty": 1,
    "length_penalty": 1,
    "no_repeat_ngram_size": 0,
    "renormalize_logits": false,
    "remove_invalid_values": false,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "encoder_no_repeat_ngram_size": 0,
    "n": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "use_beam_search": false,
    "ignore_eos": false
  }
}'
```

### To reproduce

1. Run `openllm start baichuan --model-id /path/to/baichuan-inc/Baichuan-13B-Chat/ --device 0`
2. Send the same `curl` request as above to `http://localhost:3000/v1/generate`

### Logs

_No response_

### Environment

`transformers-cli env`

```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/user/Downloads/enter/envs/openllm_test did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/user/Downloads/enter/envs/openllm_test/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.
```

- `transformers` version: 4.31.0
- Platform: Linux-4.18.0-348.7.1.el8_5.x86_64-x86_64-with-glibc2.28
- Python version: 3.9.17
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3.1
- Accelerate version: 0.21.0
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: DEEPSPEED
  - mixed_precision: fp16
  - use_cpu: False
  - num_processes: 4
  - machine_rank: 0
  - num_machines: 1
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - deepspeed_config: {'gradient_accumulation_steps': 4, 'gradient_clipping': 1.0, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': False, 'zero_stage': 2}
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
  - dynamo_config: {'dynamo_backend': 'INDUCTOR', 'dynamo_mode': 'default', 'dynamo_use_dynamic': True, 'dynamo_use_fullgraph': False}
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:

`bentoml env`

#### Environment variable

```bash
BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''
```

#### System information

`bentoml`: 1.0.24
`python`: 3.9.17
`platform`: Linux-4.18.0-348.7.1.el8_5.x86_64-x86_64-with-glibc2.28
`uid_gid`: 1001:1001
`conda`: 4.5.11 (`/home/user/Downloads/enter/bin/conda`)
`in_conda_env`: True
`conda_packages`
```yaml
name: openllm_test
channels:
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_kmp_llvm
  - libgcc-ng=12.2.0=h65d4601_19
  - libstdcxx-ng=12.2.0=h46fd767_19
  - ca-certificates=2023.05.30=h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.4=h6a678d5_0
  - llvm-openmp=14.0.6=h9e868ea_0
  - ncurses=6.4=h6a678d5_0
  - openssl=3.0.9=h7f8727e_0
  - pip=23.1.2=py39h06a4308_0
  - python=3.9.17=h955ad1f_0
  - readline=8.2=h5eee18b_0
  - setuptools=67.8.0=py39h06a4308_0
  - sqlite=3.41.2=h5eee18b_0
  - tk=8.6.12=h1ccaba5_0
  - tzdata=2023c=h04d1e81_0
  - wheel=0.38.4=py39h06a4308_0
  - xz=5.4.2=h5eee18b_0
  - zlib=1.2.13=h5eee18b_0
  - pip:
      - accelerate==0.21.0
      - aiohttp==3.8.5
      - aiosignal==1.3.1
      - anyio==3.7.1
      - appdirs==1.4.4
      - asgiref==3.7.2
      - async-timeout==4.0.2
      - attrs==23.1.0
      - bentoml==1.0.24
      - bitsandbytes==0.39.1
      - build==0.10.0
      - cattrs==23.1.2
      - certifi==2023.5.7
      - charset-normalizer==3.2.0
      - circus==0.18.0
      - click==8.1.6
      - click-option-group==0.5.6
      - cloudpickle==2.2.1
      - cmake==3.27.0
      - coloredlogs==15.0.1
      - contextlib2==21.6.0
      - cpm-kernels==1.0.11
      - cuda-python==12.2.0
      - cython==3.0.0
      - datasets==2.13.1
      - deepmerge==1.1.0
      - deprecated==1.2.14
      - dill==0.3.6
      - exceptiongroup==1.1.2
      - filelock==3.12.2
      - filetype==1.2.0
      - frozenlist==1.4.0
      - fs==2.4.16
      - fsspec==2023.6.0
      - grpcio==1.56.2
      - grpcio-health-checking==1.56.2
      - h11==0.14.0
      - httpcore==0.17.3
      - httpx==0.24.1
      - huggingface-hub==0.16.4
      - humanfriendly==10.0
      - idna==3.4
      - importlib-metadata==6.0.1
      - inflection==0.5.1
      - jinja2==3.1.2
      - lit==16.0.6
      - markdown-it-py==3.0.0
      - markupsafe==2.1.3
      - mdurl==0.1.2
      - mpmath==1.3.0
      - multidict==6.0.4
      - multiprocess==0.70.14
      - networkx==3.1
      - numpy==1.25.1
      - nvidia-cublas-cu11==11.10.3.66
      - nvidia-cuda-cupti-cu11==11.7.101
      - nvidia-cuda-nvrtc-cu11==11.7.99
      - nvidia-cuda-runtime-cu11==11.7.99
      - nvidia-cudnn-cu11==8.5.0.96
      - nvidia-cufft-cu11==10.9.0.58
      - nvidia-curand-cu11==10.2.10.91
      - nvidia-cusolver-cu11==11.4.0.1
      - nvidia-cusparse-cu11==11.7.4.91
      - nvidia-nccl-cu11==2.14.3
      - nvidia-nvtx-cu11==11.7.91
      - openllm==0.2.0
      - opentelemetry-api==1.18.0
      - opentelemetry-instrumentation==0.39b0
      - opentelemetry-instrumentation-aiohttp-client==0.39b0
      - opentelemetry-instrumentation-asgi==0.39b0
      - opentelemetry-instrumentation-grpc==0.39b0
      - opentelemetry-sdk==1.18.0
      - opentelemetry-semantic-conventions==0.39b0
      - opentelemetry-util-http==0.39b0
      - optimum==1.9.1
      - orjson==3.9.2
      - packaging==23.1
      - pandas==2.0.3
      - pathspec==0.11.1
      - pillow==10.0.0
      - pip-requirements-parser==32.0.1
      - pip-tools==7.1.0
      - prometheus-client==0.17.1
      - protobuf==4.23.4
      - psutil==5.9.5
      - pyarrow==12.0.1
      - pydantic==1.10.11
      - pygments==2.15.1
      - pynvml==11.5.0
      - pyparsing==3.1.0
      - pyproject_hooks==1.0.0
      - python-dateutil==2.8.2
      - python-json-logger==2.0.7
      - python-multipart==0.0.6
      - pytz==2023.3
      - pyyaml==6.0.1
      - pyzmq==25.1.0
      - regex==2023.6.3
      - requests==2.31.0
      - rich==13.4.2
      - safetensors==0.3.1
      - schema==0.7.5
      - scipy==1.11.1
      - sentencepiece==0.1.99
      - simple-di==0.1.5
      - six==1.16.0
      - sniffio==1.3.0
      - starlette==0.28.0
      - sympy==1.12
      - tabulate==0.9.0
      - tokenizers==0.13.3
      - tomli==2.0.1
      - torch==2.0.1
      - tornado==6.3.2
      - tqdm==4.65.0
      - transformers==4.31.0
      - transformers-stream-generator==0.0.4
      - triton==2.0.0
      - typing_extensions==4.7.1
      - urllib3==2.0.4
      - uvicorn==0.23.1
      - watchfiles==0.19.0
      - wcwidth==0.2.6
      - wrapt==1.15.0
      - xxhash==3.2.0
      - yarl==1.9.2
      - zipp==3.16.2
prefix: /home/user/Downloads/enter/envs/openllm_test
```
`pip_packages`
```
accelerate==0.21.0
aiohttp==3.8.5
aiosignal==1.3.1
anyio==3.7.1
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.2
attrs==23.1.0
bentoml==1.0.24
bitsandbytes==0.39.1
build==0.10.0
cattrs==23.1.2
certifi==2023.5.7
charset-normalizer==3.2.0
circus==0.18.0
click==8.1.6
click-option-group==0.5.6
cloudpickle==2.2.1
cmake==3.27.0
coloredlogs==15.0.1
contextlib2==21.6.0
cpm-kernels==1.0.11
cuda-python==12.2.0
Cython==3.0.0
datasets==2.13.1
deepmerge==1.1.0
Deprecated==1.2.14
dill==0.3.6
exceptiongroup==1.1.2
filelock==3.12.2
filetype==1.2.0
frozenlist==1.4.0
fs==2.4.16
fsspec==2023.6.0
grpcio==1.56.2
grpcio-health-checking==1.56.2
h11==0.14.0
httpcore==0.17.3
httpx==0.24.1
huggingface-hub==0.16.4
humanfriendly==10.0
idna==3.4
importlib-metadata==6.0.1
inflection==0.5.1
Jinja2==3.1.2
lit==16.0.6
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.14
networkx==3.1
numpy==1.25.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
openllm==0.2.0
opentelemetry-api==1.18.0
opentelemetry-instrumentation==0.39b0
opentelemetry-instrumentation-aiohttp-client==0.39b0
opentelemetry-instrumentation-asgi==0.39b0
opentelemetry-instrumentation-grpc==0.39b0
opentelemetry-sdk==1.18.0
opentelemetry-semantic-conventions==0.39b0
opentelemetry-util-http==0.39b0
optimum==1.9.1
orjson==3.9.2
packaging==23.1
pandas==2.0.3
pathspec==0.11.1
Pillow==10.0.0
pip-requirements-parser==32.0.1
pip-tools==7.1.0
prometheus-client==0.17.1
protobuf==4.23.4
psutil==5.9.5
pyarrow==12.0.1
pydantic==1.10.11
Pygments==2.15.1
pynvml==11.5.0
pyparsing==3.1.0
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3
PyYAML==6.0.1
pyzmq==25.1.0
regex==2023.6.3
requests==2.31.0
rich==13.4.2
safetensors==0.3.1
schema==0.7.5
scipy==1.11.1
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.28.0
sympy==1.12
tabulate==0.9.0
tokenizers==0.13.3
tomli==2.0.1
torch==2.0.1
tornado==6.3.2
tqdm==4.65.0
transformers==4.31.0
transformers-stream-generator==0.0.4
triton==2.0.0
typing_extensions==4.7.1
tzdata==2023.3
urllib3==2.0.4
uvicorn==0.23.1
watchfiles==0.19.0
wcwidth==0.2.6
wrapt==1.15.0
xxhash==3.2.0
yarl==1.9.2
zipp==3.16.2
```