verymiao commented 11 months ago

硬件：GTX 3080 * 2 环境：nvidia-docker, Python 3.10.12, cuda 11.8

examples/chat_gradio.py 添加 os.environ["CUDA_VISIBLE_DEVICES"] = "0" 正常运行，可以回答问题； os.environ["CUDA_VISIBLE_DEVICES"] = "0,1" 运行正常，回答问题报错如下：

root@dfa41c8cd008:~/Llama2-Chinese# python examples/chat_gradio.py --model_name_or_path /root/download/Atom-7B-Chat/
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

issues

bin /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so /usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')} warn(msg) /usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths... warn(msg) CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths... /usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward. Either way, this might cause trouble in the future: If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env. warn(msg) CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 8.6 CUDA SETUP: Detected CUDA version 118 CUDA SETUP: Loading binary /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so... Loading checkpoint shards: 100%|██████████████████████████████████████████| 2/2 [00:08<00:00, 4.35s/it] ------model.eval------ Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch(). Human: 你好 Assistant: ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1,0,0], thread: [32,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. ----重复内容(略)---- -sizes[i] <= index && index < sizes[i] && "index out of bounds"failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2,0,0], thread: [31,0,0] Assertion-sizes[i] <= index && index < sizes[i] && "index out of bounds"failed. Exception in thread Thread-8 (generate): Traceback (most recent call last): File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() File "/usr/local/lib/python3.10/threading.py", line 953, in run self._target(*self._args, **self._kwargs) File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1719, in generate return self.sample( File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2801, in sample outputs = self( File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward output = module._old_forward(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward outputs = self.model( File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 922, in forward layer_outputs = decoder_layer( File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward output = module._old_forward(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 672, in forward hidden_states, self_attn_weights, present_key_value = self.self_attn( File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward output = module._old_forward(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 424, in forward attn_output = self.o_proj(attn_output) File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward output = module._old_forward(*args, **kwargs) File "/usr/local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 388, in forward out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state) File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 559, in matmul return MatMul8bitLt.apply(A, B, out, bias, state) File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply return super().apply(*args, **kwargs) # type: ignore[misc] File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 323, in forward CA, CAt, SCA, SCAt, coo_tensorA = F.double_quant(A.to(torch.float16), threshold=state.threshold) File "/usr/local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2005, in double_quant nnz = nnz_row_ptr[-1].item() RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile withTORCH_USE_CUDA_DSA` to enable device-side assertions.

Xiaozl11 commented 11 months ago

我也和你的问题刚好相反，我一个4090的显卡运行出这样的错误。以下是报错信息： RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

verymiao commented 11 months ago

我也和你的问题刚好相反，我一个4090的显卡运行出这样的错误。以下是报错信息： RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

显存不够用，爆显存了。

你跑的模型是多大的，不要超过你的显存；
上面是不是运行了其他程序，显存被占用了；

Xiaozl11 commented 11 months ago

我也和你的问题刚好相反，我一个4090的显卡运行出这样的错误。以下是报错信息： RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

显存不够用，爆显存了。

你跑的模型是多大的，不要超过你的显存；

上面是不是运行了其他程序，显存被占用了；

1.用的FlagAlpha/Llama2-Chinese-7b-Chat这个模型 2.我把其他的程序都关了，大概系统和docker用了500M左右,但是我的显卡是24G的，就很奇怪

verymiao commented 11 months ago

我也和你的问题刚好相反，我一个4090的显卡运行出这样的错误。以下是报错信息： RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

显存不够用，爆显存了。

你跑的模型是多大的，不要超过你的显存；

上面是不是运行了其他程序，显存被占用了；

1.用的FlagAlpha/Llama2-Chinese-7b-Chat这个模型 2.我把其他的程序都关了，大概系统和docker用了500M左右,但是我的显卡是24G的，就很奇怪

你在本机上面跑的，还是nvidia docker? nvidia docker 需要 --gpus 参数，负责程序无法调用gpu。你试试直接在 python 命令行中运行下面的命令

import torch
torch.cuda.is_available()     # 正常会返回True

chopin1998 commented 9 months ago

硬件：GTX 3080 * 2 环境：nvidia-docker, Python 3.10.12, cuda 11.8

examples/chat_gradio.py 添加 os.environ["CUDA_VISIBLE_DEVICES"] = "0" 正常运行，可以回答问题； os.environ["CUDA_VISIBLE_DEVICES"] = "0,1" 运行正常，回答问题报错如下：

root@dfa41c8cd008:~/Llama2-Chinese# python examples/chat_gradio.py --model_name_or_path /root/download/Atom-7B-Chat/

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
  warn(msg)
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
Loading checkpoint shards: 100%|██████████████████████████████████████████| 2/2 [00:08<00:00,  4.35s/it]
------model.eval------
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Human: 你好
Assistant: ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
----重复内容(略)----
-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Exception in thread Thread-8 (generate):
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1719, in generate
    return self.sample(
  File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2801, in sample
    outputs = self(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 922, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 672, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 424, in forward
    attn_output = self.o_proj(attn_output)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 388, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 559, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 323, in forward
    CA, CAt, SCA, SCAt, coo_tensorA = F.double_quant(A.to(torch.float16), threshold=state.threshold)
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2005, in double_quant
    nnz = nnz_row_ptr[-1].item()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

一样的问题，多卡3090, 请问你解决了吗

LlamaFamily / Llama-Chinese

单卡运行正常,双卡报错 `device-side assert triggered` #262

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues