I also ran into this problem, with chatglm. Did you solve it?
Also having this issue. Any feedback would be great!

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.67                 Driver Version: 536.67       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090      WDDM  | 00000000:01:00.0  On |                  N/A |
| 53%   32C    P2              49W / 420W |  1668MiB /  24576MiB |      6%      Default |
|                                         |                      |                  N/A |
```

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
```
Same here, with both chatglm and baichuan.
openllm, 0.3.3 (compiled: no)
My command:
```
openllm start baichuan --model-id="/data/models/Baichuan-13B-Chat/" --device CUDA --backend pt
```
error keyword: only supports running with GPU (None available).
Detailed error:
```
2023-09-07T13:56:07+0800 [ERROR] [runner:llm-baichuan-runner:1] An exception occurred while instantiating runner 'llm-baichuan-runner', see details below:
2023-09-07T13:56:07+0800 [ERROR] [runner:llm-baichuan-runner:1] Traceback (most recent call last):
File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
self._set_handle(LocalRunnerRef)
File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
runner_handle = handle_class(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
self._runnable = runner.runnable_class(**runner.runnable_init_params) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/openllm/_llm.py", line 1162, in __init__
if not self.model: raise RuntimeError('Failed to load the model correctly (See traceback above)')
^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/openllm/_llm.py", line 749, in model
raise GpuNotAvailableError(f'{self} only supports running with GPU (None available).') from None
openllm_core.exceptions.GpuNotAvailableError: tag=Tag(name='pt-baichuan-13b-chat', version='ede15aa8c8a46fbde4d2eccb99389d3c00efe9fc') runner_name='llm-baichuan-runner' model_id='/data/models/Baichuan-13B-Chat' config={'max_new_tokens': 2048, 'min_length': 0, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'use_cache': True, 'temperature': 0.95, 'top_k': 50, 'top_p': 0.7, 'typical_p': 1.0, 'epsilon_cutoff': 0.0, 'eta_cutoff': 0.0, 'diversity_penalty': 0.0, 'repetition_penalty': 1.0, 'encoder_repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'renormalize_logits': False, 'remove_invalid_values': False, 'num_return_sequences': 1, 'output_attentions': False, 'output_hidden_states': False, 'output_scores': False, 'encoder_no_repeat_ngram_size': 0, 'n': 1, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'use_beam_search': False, 'ignore_eos': False} adapters_mapping=None backend='pt' only supports running with GPU (None available).
2023-09-07T13:56:07+0800 [ERROR] [runner:llm-baichuan-runner:1] Traceback (most recent call last):
File "/root/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 677, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/root/anaconda3/lib/python3.11/contextlib.py", line 204, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
on_startup()
File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
raise e
File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
self._set_handle(LocalRunnerRef)
File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
runner_handle = handle_class(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
self._runnable = runner.runnable_class(**runner.runnable_init_params) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/openllm/_llm.py", line 1162, in __init__
if not self.model: raise RuntimeError('Failed to load the model correctly (See traceback above)')
^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/openllm/_llm.py", line 749, in model
raise GpuNotAvailableError(f'{self} only supports running with GPU (None available).') from None
openllm_core.exceptions.GpuNotAvailableError: tag=Tag(name='pt-baichuan-13b-chat', version='ede15aa8c8a46fbde4d2eccb99389d3c00efe9fc') runner_name='llm-baichuan-runner' model_id='/data/models/Baichuan-13B-Chat' config={'max_new_tokens': 2048, 'min_length': 0, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'use_cache': True, 'temperature': 0.95, 'top_k': 50, 'top_p': 0.7, 'typical_p': 1.0, 'epsilon_cutoff': 0.0, 'eta_cutoff': 0.0, 'diversity_penalty': 0.0, 'repetition_penalty': 1.0, 'encoder_repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'renormalize_logits': False, 'remove_invalid_values': False, 'num_return_sequences': 1, 'output_attentions': False, 'output_hidden_states': False, 'output_scores': False, 'encoder_no_repeat_ngram_size': 0, 'n': 1, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'use_beam_search': False, 'ignore_eos': False} adapters_mapping=None backend='pt' only supports running with GPU (None available).
2023-09-07T13:56:07+0800 [ERROR] [runner:llm-baichuan-runner:1] Application startup failed. Exiting.
```
Can you send the output of the following?
```
nvidia-smi
uname -a
pip freeze | grep openllm
```
openllm_core/utils/__init__.py says:
```python
def available_devices() -> tuple[str, ...]:
  '''Return available GPU under system. Currently only supports NVIDIA GPUs.'''
```
Though it actually just looks for CUDA devices (i.e., non-CUDA NVIDIA GPUs are not supported either). So the message should be: "No CUDA GPU available, therefore this command is disabled".
```
> python -c "import torch; print('CUDA is available' if torch.cuda.is_available() else 'CUDA is not available')"
CUDA is not available
```
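For illustration, a minimal sketch (not the actual OpenLLM code) of a torch-based gate that fails with the clearer wording suggested above:
```python
# Minimal sketch, assuming torch is installed; not the actual OpenLLM implementation.
import torch

def require_cuda_gpu() -> None:
    # Raise the clearer message suggested above when no CUDA device is visible.
    if not torch.cuda.is_available():
        raise RuntimeError('No CUDA GPU available, therefore this command is disabled')
```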
We don't currently rely fully on torch. This is because torch takes a while to import, so some of its functionality is mirrored from PyTorch rather than imported from the library.
However, since this seems to cause a lot of issues, depending on torch may be a temporary solution; we have plans to address this around Q1~Q2 2024.
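For what it's worth, a hypothetical sketch of enumerating CUDA-capable NVIDIA GPUs without importing torch, using the pynvml bindings (an assumption for illustration, not how OpenLLM implements it):
```python
# Hypothetical sketch: detect NVIDIA GPUs via NVML instead of importing torch.
# Not the actual OpenLLM implementation.
import pynvml

def available_devices() -> tuple[str, ...]:
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return ()  # driver/NVML unavailable -> treat as no CUDA GPUs
    try:
        return tuple(str(i) for i in range(pynvml.nvmlDeviceGetCount()))
    finally:
        pynvml.nvmlShutdown()
```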
This is now fixed.
Describe the bug
But I think my GPU works well with pytorch 😟 This is the third computer I've tried, and none of them are working 😩
To reproduce
No response
Logs
Environment
bentoml: 1.1.1
openllm: 0.2.26
pytorch: 2.0.1+cu118
transformers: 4.31.0
python: 3.8.10
CUDA: 11.8
System information (Optional)
No response