bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

NVIDIA TX2+jetpack 5+Ubuntu20.4: CUDA Setup failed despite GPU being available #1154

qxpBlog closed this issue 7 months ago

qxpBlog commented 7 months ago

System Info

cuda 11.4
_openmp_mutex 4.5 2_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
accelerate 0.28.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
altair 5.2.0
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.3.0
arrow 1.3.0
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.42.0
bzip2 1.0.8 hf897c2e_4 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ca-certificates 2022.9.24 h4fd8a4c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
certifi 2024.2.2
chardet 5.2.0
charset-normalizer 3.3.2
click 8.1.7
codecarbon 2.3.4
colorama 0.4.6
cycler 0.12.1
DataProperty 1.0.1
datasets 2.18.0
dill 0.3.8
docker-pycreds 0.4.0
exceptiongroup 1.2.0
fastapi 0.110.0
ffmpy 0.3.2
filelock 3.13.1
frozenlist 1.4.1
fsspec 2024.2.0
gitdb 4.0.11
GitPython 3.1.42
gradio 4.23.0
gradio_client 0.14.0
h11 0.14.0
httpcore 1.0.4
httpx 0.27.0
huggingface-hub 0.21.4
idna 3.6
importlib-resources 5.13.0
Jinja2 3.1.3
joblib 1.3.2
jsonlines 4.0.0
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
ld_impl_linux-aarch64 2.39 ha75b1e8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libffi 3.4.2 h3557bc0_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgcc-ng 12.2.0 h607ecd0_19 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgomp 12.2.0 h607ecd0_19 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libnsl 2.0.0 hf897c2e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libuuid 2.32.1 hf897c2e_1000 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libzlib 1.2.13 h4e544f5_4 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
markdown-it-py 3.0.0
mbstrdecoder 1.1.3
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
ncurses 6.4 h419075a_0
networkx 3.1
nltk 3.8.1
numexpr 2.8.6
numpy 1.24.4
omegaconf 2.3.0
openssl 3.0.7 h4e544f5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
orjson 3.9.15
packaging 24.0
pandas 2.0.3
pathvalidate 3.2.0
peft 0.10.0
pillow 10.2.0
pip 22.3.1 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pkgutil_resolve_name 1.3.10
portalocker 2.8.2
prometheus_client 0.20.0
psutil 5.9.8
ptflops 0.7.2.2
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyarrow-hotfix 0.6
pycountry 23.12.11
pydantic 2.6.4
pydantic_core 2.16.3
pydub 0.25.1
Pygments 2.17.2
pynvml 11.5.0
pyparsing 3.1.2
pytablewriter 1.2.0
python 3.8.13 h92ab765_0_cpython https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
python-dateutil 2.9.0.post0
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
rapidfuzz 3.7.0
readline 8.1.2 h38e3740_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
rich 13.7.1
rouge-score 0.1.2
rpds-py 0.18.0
ruff 0.3.4
sacrebleu 1.5.0
safetensors 0.4.2
scikit-learn 1.3.2
semantic-version 2.10.0
sentencepiece 0.2.0
sentry-sdk 1.43.0
setproctitle 1.3.3
setuptools 65.5.1 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
shellingham 1.5.4
six 1.16.0
smmap 5.0.1
smoothquant 0.0.1.dev0
sniffio 1.3.1
sqlite 3.41.2 h998d150_0
sqlitedict 2.1.0
starlette 0.36.3
sympy 1.12
tabledata 1.3.3
tcolorpy 0.1.4
threadpoolctl 3.4.0
tiktoken 0.6.0
tk 8.6.12 hd8af866_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
tokenizers 0.15.2
tomlkit 0.12.0
toolz 0.12.1
torch 2.0.0+nv23.5
transformers 4.38.2
typepy 1.3.2
typer 0.10.0
types-python-dateutil 2.9.0.20240316
typing_extensions 4.10.0
tzdata 2024.1
tzdata 2022f h191b570_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
urllib3 2.2.1
uvicorn 0.29.0
wandb 0.16.5
websockets 11.0.3
wheel 0.38.4 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xxhash 3.4.1
xz 5.4.6 h998d150_0
yarl 1.9.4
zlib 1.2.13 h4e544f5_4 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge

Reproduction

When I run the following code:

import os
import sys
import argparse
import accelerate
from accelerate.utils import BnbQuantizationConfig
import torch
import numpy as np
import time
import transformers
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer, AutoModel, AutoTokenizer, AutoModelForCausalLM, GPTQConfig
from codecarbon import track_emissions, EmissionsTracker
from LLMPruner.utils.logger import LoggerWithDepth
from transformers.models.opt.modeling_opt import OPTAttention, OPTDecoderLayer, OPTForCausalLM
from ptflops import get_model_complexity_info
from ptflops.pytorch_ops import bn_flops_counter_hook, pool_flops_counter_hook
from LLMPruner.evaluator.ppl import PPLMetric, test_latency_energy
from LLMPruner.models.hf_llama.modeling_llama import LlamaForCausalLM, LlamaRMSNorm, LlamaAttention, LlamaMLP
from LLMPruner.peft import PeftModel
if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
torch_version = int(torch.__version__.split('.')[1])

def LlamaAttention_counter_hook(module, input, output):
    # (1) Ignore past-key values
    # (2) Assume there is no attention mask
    # Input can be empty in some PyTorch versions; use output here, since input.shape == output.shape
    flops = 0
    q_len = output[0].shape[1]
    linear_dim = output[0].shape[-1]
    num_heads = module.num_heads
    head_dim = module.head_dim

    rotary_flops = 2 * (q_len * num_heads * head_dim) * 2
    attention_flops = num_heads * (q_len * q_len * head_dim + q_len * q_len + q_len * q_len * head_dim) #QK^T + softmax + AttentionV
    linear_flops = 4 * (q_len * linear_dim * num_heads * head_dim) # 4 for q, k, v, o. 
    flops += rotary_flops + attention_flops + linear_flops
    module.__flops__ += int(flops)

def rmsnorm_flops_counter_hook(module, input, output):
    input = input[0]

    batch_flops = np.prod(input.shape)
    batch_flops *= 2
    module.__flops__ += int(batch_flops)

# @track_emissions()
def main(args):

    if args.test_mod == 'tuned':
        # Evaluate the latency and energy consumption of the fine-tuned model
        pruned_dict = torch.load(args.ckpt, map_location='cpu')
        tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']
        model = PeftModel.from_pretrained(
            model,
            args.lora_ckpt,
            torch_dtype=torch.float16,
        )
    elif args.test_mod == 'pruned':
        # Evaluate the latency and energy consumption of the pruned model
        pruned_dict = torch.load(args.ckpt, map_location='cpu')
        tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']
    elif args.test_mod == 'base':
        model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B", torch_dtype="auto", trust_remote_code=True)
        tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B", trust_remote_code=True)

    model.to(device)      
    # torch.save({
    #     'model': model, 
    #     'tokenizer': tokenizer,
    # }, "/home/iotsc01/xinpengq/LLM-Pruner-main/prune_log/quant/pytorch_model.bin")    

    print(model.device)
    # model.config.pad_token_id = tokenizer.pad_token_id = 0 
    # model.config.bos_token_id = 1
    # model.config.eos_token_id = 2

    model.eval()

    after_pruning_parameters = sum(p.numel() for p in model.parameters())
    print("#parameters: {}".format(after_pruning_parameters))

    ppl = test_latency_energy(model, tokenizer, ['wikitext2', 'ptb'], args.max_seq_len, device=device)
    print("PPL after pruning: {}".format(ppl))
    print("Memory Requirement: {} MiB\n".format(torch.cuda.memory_allocated() / 1024 / 1024))

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Tuning Pruned LLaMA (huggingface version)')

    parser.add_argument('--base_model', type=str, default="llama-7b-hf", help='base model name')
    parser.add_argument('--ckpt', type=str, default=None)
    parser.add_argument('--lora_ckpt', type=str, default=None)
    parser.add_argument('--max_seq_len', type=int, default=128, help='max sequence length')
    parser.add_argument('--test_mod', type=str, default="tuned", help='choose from [pruned, tuned, base]')
    args = parser.parse_args()

    main(args)
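For reference, the script (test_latency_energy.py, per the traceback below) takes its mode from --test_mod: the base mode needs no checkpoint, while pruned and tuned additionally require --ckpt and, for tuned, --lora_ckpt. The failing run below was presumably launched as `python test_latency_energy.py --test_mod base`.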

I set the argument test_mod to base, but the following issues occurred:

/home/jetson/.local/lib/python3.8/site-packages/torchvision-0.13.0-py3.8-linux-aarch64.egg/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /home/jetson/.local/lib/python3.8/site-packages/torchvision-0.13.0-py3.8-linux-aarch64.egg/torchvision/image.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE
  warn(f"Failed to load image Python extension: {e}")
/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

  warn(msg)
/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: /home/jetson/archiconda3/envs/llm did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda-11.4/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda-11.4/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: /opt/ros/noetic/lib:/usr/local/cuda-11.4/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
False

===================================BUG REPORT===================================
================================================================================
The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//hf-mirror.com')}
The following directories listed in your path were found to be non-existent: {PosixPath('//localhost'), PosixPath('http'), PosixPath('11311')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=114, Highest Compute Capability: 8.7.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary /home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda114.so...
/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda114.so: cannot open shared object file: No such file or directory
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=114 make cuda11x
python setup.py install
Traceback (most recent call last):
  File "/home/jetson/llm-mian/LLM-Pruner-main/test_latency_energy.py", line 18, in <module>
    from LLMPruner.peft import PeftModel
  File "/home/jetson/llm-mian/LLM-Pruner-main/LLMPruner/peft/__init__.py", line 22, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
  File "/home/jetson/llm-mian/LLM-Pruner-main/LLMPruner/peft/mapping.py", line 16, in <module>
    from .peft_model import (
  File "/home/jetson/llm-mian/LLM-Pruner-main/LLMPruner/peft/peft_model.py", line 31, in <module>
    from .tuners import AdaLoraModel, LoraModel, PrefixEncoder, PromptEmbedding, PromptEncoder
  File "/home/jetson/llm-mian/LLM-Pruner-main/LLMPruner/peft/tuners/__init__.py", line 20, in <module>
    from .lora import LoraConfig, LoraModel
  File "/home/jetson/llm-mian/LLM-Pruner-main/LLMPruner/peft/tuners/lora.py", line 40, in <module>
    import bitsandbytes as bnb
  File "/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/jetson/archiconda3/envs/llm/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
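As the warnings above suggest, the CUDA version lookup can be overridden with BNB_CUDA_VERSION. Below is a minimal sketch of setting the override from Python, assuming the JetPack 5 default CUDA location (/usr/local/cuda-11.4) and that it runs before the first bitsandbytes import; note this cannot fix the real failure here, a missing libbitsandbytes_cuda114.so binary:

import os

# Assumption: JetPack 5 keeps the CUDA 11.4 runtime under /usr/local/cuda-11.4.
# bitsandbytes 0.42.x reads these variables during its CUDA setup, so they
# must be set before the first `import bitsandbytes`.
os.environ["BNB_CUDA_VERSION"] = "114"
os.environ["LD_LIBRARY_PATH"] = (
    os.environ.get("LD_LIBRARY_PATH", "") + ":/usr/local/cuda-11.4/lib64"
)

import bitsandbytes as bnb  # noqa: E402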

Expected behavior

@kashif @stephenroller @akx @jbn I want to know how to solve this problem, and whether bitsandbytes supports the TX2. Looking forward to your reply.

matthewdouglas commented 7 months ago

Duplicate of #1151. There has not been a bitsandbytes release built for aarch64 yet.
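Until an aarch64 wheel exists, a possible stopgap besides building from source on the device (as the log above suggests) is to make the import optional in the code that pulls bitsandbytes in; here that is LLMPruner/peft/tuners/lora.py, which imports it unconditionally. A minimal sketch, where HAS_BNB is a hypothetical flag and call sites would need to fall back to plain torch.nn.Linear when it is False:

# Hypothetical guard for LLMPruner/peft/tuners/lora.py: treat bitsandbytes as
# optional so the evaluation script can still run on platforms without a
# prebuilt binary. bitsandbytes raises RuntimeError when its CUDA setup fails.
try:
    import bitsandbytes as bnb
    HAS_BNB = True
except (ImportError, RuntimeError):
    bnb = None
    HAS_BNB = False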

qxpBlog commented 7 months ago

Thanks.