bentoml / OpenLLM

Run any open-source LLM, such as Llama or Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

bug: "No GPU available, therefore this command is disabled" #247

Closed Lukikay closed 1 year ago

Lukikay commented 1 year ago

Describe the bug

No GPU available, therefore this command is disabled

But I think my GPU works well with pytorch 😟 This is the third computer I've tried, and none of them are working 😩

[Screenshot 2023-08-22 030700]

To reproduce

No response

Logs

>>> openllm start baichuan --model-id baichuan-inc/Baichuan-13B-Chat
No GPU available, therefore this command is disabled

Environment

bentoml: 1.1.1
openllm: 0.2.26
pytorch: 2.0.1+cu118
transformers: 4.31.0
python: 3.8.10
CUDA: 11.8
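
For reference, a quick check that this torch build is actually a CUDA build and can see the card (just a diagnostic sketch, independent of openllm):

import torch

print(torch.__version__)          # expect something like 2.0.1+cu118
print(torch.version.cuda)         # None would indicate a CPU-only wheel
print(torch.cuda.is_available())  # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))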

System information (Optional)

No response

765144989 commented 1 year ago

I also ran into this problem with chatglm. Did you solve it?

alopez34 commented 1 year ago

Also having this issue. Any feedback would be great!

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.67                 Driver Version: 536.67       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090      WDDM  | 00000000:01:00.0  On |                  N/A |
| 53%   32C    P2              49W / 420W |   1668MiB / 24576MiB |      6%      Default |
|                                         |                      |                  N/A |

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

panpan0000 commented 1 year ago

Same here, with both chatglm and baichuan. openllm, 0.3.3 (compiled: no)

My command: openllm start baichuan --model-id="/data/models/Baichuan-13B-Chat/" --device CUDA --backend pt

error keyword: only supports running with GPU (None available).

detail error:

2023-09-07T13:56:07+0800 [ERROR] [runner:llm-baichuan-runner:1] An exception occurred while instantiating runner 'llm-baichuan-runner', see details below:
2023-09-07T13:56:07+0800 [ERROR] [runner:llm-baichuan-runner:1] Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
    self._set_handle(LocalRunnerRef)
  File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openllm/_llm.py", line 1162, in __init__
    if not self.model: raise RuntimeError('Failed to load the model correctly (See traceback above)')
           ^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openllm/_llm.py", line 749, in model
    raise GpuNotAvailableError(f'{self} only supports running with GPU (None available).') from None
openllm_core.exceptions.GpuNotAvailableError: tag=Tag(name='pt-baichuan-13b-chat', version='ede15aa8c8a46fbde4d2eccb99389d3c00efe9fc') runner_name='llm-baichuan-runner' model_id='/data/models/Baichuan-13B-Chat' config={'max_new_tokens': 2048, 'min_length': 0, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'use_cache': True, 'temperature': 0.95, 'top_k': 50, 'top_p': 0.7, 'typical_p': 1.0, 'epsilon_cutoff': 0.0, 'eta_cutoff': 0.0, 'diversity_penalty': 0.0, 'repetition_penalty': 1.0, 'encoder_repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'renormalize_logits': False, 'remove_invalid_values': False, 'num_return_sequences': 1, 'output_attentions': False, 'output_hidden_states': False, 'output_scores': False, 'encoder_no_repeat_ngram_size': 0, 'n': 1, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'use_beam_search': False, 'ignore_eos': False} adapters_mapping=None backend='pt' only supports running with GPU (None available).

2023-09-07T13:56:07+0800 [ERROR] [runner:llm-baichuan-runner:1] Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/starlette/routing.py", line 677, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/root/anaconda3/lib/python3.11/contextlib.py", line 204, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
    on_startup()
  File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
    raise e
  File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
    self._set_handle(LocalRunnerRef)
  File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openllm/_llm.py", line 1162, in __init__
    if not self.model: raise RuntimeError('Failed to load the model correctly (See traceback above)')
           ^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openllm/_llm.py", line 749, in model
    raise GpuNotAvailableError(f'{self} only supports running with GPU (None available).') from None
openllm_core.exceptions.GpuNotAvailableError: tag=Tag(name='pt-baichuan-13b-chat', version='ede15aa8c8a46fbde4d2eccb99389d3c00efe9fc') runner_name='llm-baichuan-runner' model_id='/data/models/Baichuan-13B-Chat' config={'max_new_tokens': 2048, 'min_length': 0, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'use_cache': True, 'temperature': 0.95, 'top_k': 50, 'top_p': 0.7, 'typical_p': 1.0, 'epsilon_cutoff': 0.0, 'eta_cutoff': 0.0, 'diversity_penalty': 0.0, 'repetition_penalty': 1.0, 'encoder_repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'renormalize_logits': False, 'remove_invalid_values': False, 'num_return_sequences': 1, 'output_attentions': False, 'output_hidden_states': False, 'output_scores': False, 'encoder_no_repeat_ngram_size': 0, 'n': 1, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'use_beam_search': False, 'ignore_eos': False} adapters_mapping=None backend='pt' only supports running with GPU (None available).

2023-09-07T13:56:07+0800 [ERROR] [runner:llm-baichuan-runner:1] Application startup failed. Exiting.

aarnphm commented 1 year ago

can you send the output of the following?

nvidia-smi

uname -a

pip freeze | grep openllm

arnon-weinberg commented 1 year ago

openllm_core/utils/__init__.py says:

def available_devices() -> tuple[str, ...]:
  '''Return available GPU under system. Currently only supports NVIDIA GPUs.'''

Though it actually just looks for CUDA devices (i.e., non-CUDA NVIDIA GPUs are not supported either). So the message should really be: "No CUDA GPU available, therefore this command is disabled"

>python -c "import torch; print('CUDA is available' if torch.cuda.is_available() else 'CUDA is not available')"
CUDA is not available
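
A way to see the same distinction without importing torch at all is to ask the driver directly and check CUDA_VISIBLE_DEVICES. This is only a diagnostic sketch of that idea, not openllm's actual detection code, and the helper name is made up:

from __future__ import annotations

import os
import subprocess

def visible_cuda_gpus() -> list[str]:
    '''Hypothetical helper: list GPU UUIDs reported by the NVIDIA driver.
    An empty CUDA_VISIBLE_DEVICES hides every GPU from CUDA (and from torch),
    even when nvidia-smi itself looks healthy.'''
    if os.environ.get('CUDA_VISIBLE_DEVICES') == '':
        return []
    try:
        out = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=uuid', '--format=csv,noheader'],
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []  # driver not installed or not reachable from this environment
    return [line.strip() for line in out.splitlines() if line.strip()]

print(visible_cuda_gpus() or 'No CUDA GPU visible')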

aarnphm commented 1 year ago

We do not fully rely on torch at the moment. This is because importing torch takes a while, so some of the functionality is mirrored from PyTorch rather than importing the library.

However, it seems like we are running into a lot of issues, so depending on torch may be a temporary solution; we have plans to address this around Q1~Q2 2024.
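
As a rough illustration of that temporary workaround (a sketch only, under the assumption that torch-based detection is acceptable, and not the actual openllm implementation), the torch import can be deferred into the detection helper so the slow import is only paid when detection runs:

from __future__ import annotations

import functools

@functools.lru_cache(maxsize=1)
def available_devices() -> tuple[str, ...]:
  '''Sketch of a torch-backed variant of the helper quoted above: the import
  happens lazily, inside the function, and the result is cached.'''
  import torch  # deferred so module import stays fast when detection is not needed
  if not torch.cuda.is_available():
    return ()
  return tuple(str(i) for i in range(torch.cuda.device_count()))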

aarnphm commented 1 year ago

This is now fixed