(temp_test) [root@localhost basic_demo]# python cli_demo.py
/root/miniconda3/envs/temp_test/lib/python3.10/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
/root/miniconda3/envs/temp_test/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
/root/miniconda3/envs/temp_test/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
/root/miniconda3/envs/temp_test/lib/python3.10/site-packages/transformers/quantizers/auto.py:174: UserWarning: You passed quantization_config or equivalent parameters to from_pretrained but the model you're loading already has a quantization_config attribute. The quantization_config from the model will be used.
warnings.warn(warning_msg)
Traceback (most recent call last):
File "/root/ai/ask/CogVLM2-main/basic_demo/cli_demo.py", line 37, in
model = AutoModelForCausalLM.from_pretrained(
File "/root/miniconda3/envs/temp_test/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
return model_class.from_pretrained(
File "/root/miniconda3/envs/temp_test/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3354, in from_pretrained
hf_quantizer.validate_environment(
File "/root/miniconda3/envs/temp_test/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
raise RuntimeError("No GPU found. A GPU is needed for quantization.")
RuntimeError: No GPU found. A GPU is needed for quantization.
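The failure chain is visible in the log itself: the driver only supports CUDA 11.7 (reported as 11070), while the installed torch 2.4.0 wheel targets CUDA 12.1 (see the nvidia-*-cu12 packages below), so CUDA initialization fails, torch.cuda.is_available() returns False, and the bitsandbytes 4-bit quantizer then raises "No GPU found". A quick check in the same environment makes the mismatch explicit:

```python
import torch

# The wheel's CUDA build vs. what the driver can actually support:
print("torch version:", torch.__version__)        # 2.4.0 in this env
print("built for CUDA:", torch.version.cuda)      # 12.1 for the default PyPI wheel
print("GPU visible:", torch.cuda.is_available())  # False here: driver is CUDA 11.7
```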
System Info / 系統信息

(temp_test) [root@localhost basic_demo]# pip list
accelerate 0.33.0
aiofiles 23.2.1
annotated-types 0.7.0
anyio 3.7.1
asyncer 0.0.2
bidict 0.23.1
bitsandbytes 0.42.0
certifi 2024.7.4
chainlit 1.1.400
charset-normalizer 3.3.2
chevron 0.14.0
click 8.1.7
dataclasses-json 0.5.14
Deprecated 1.2.14
distro 1.9.0
einops 0.8.0
exceptiongroup 1.2.2
fastapi 0.110.3
filelock 3.15.4
filetype 1.2.0
fsspec 2024.6.1
gevent 24.2.1
googleapis-common-protos 1.63.2
greenlet 3.0.3
grpcio 1.65.1
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
huggingface-hub 0.24.3
idna 3.7
importlib_metadata 8.0.0
Jinja2 3.1.4
Lazify 0.4.0
literalai 0.0.607
loguru 0.7.2
MarkupSafe 2.1.5
marshmallow 3.21.3
mpmath 1.3.0
mypy-extensions 1.0.0
nest-asyncio 1.6.0
networkx 3.3
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.82
nvidia-nvtx-cu12 12.1.105
openai 1.37.1
opentelemetry-api 1.26.0
opentelemetry-exporter-otlp 1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc 1.26.0
opentelemetry-exporter-otlp-proto-http 1.26.0
opentelemetry-instrumentation 0.47b0
opentelemetry-proto 1.26.0
opentelemetry-sdk 1.26.0
opentelemetry-semantic-conventions 0.47b0
packaging 23.2
pillow 10.4.0
pip 23.3.1
protobuf 4.25.4
psutil 6.0.0
pydantic 2.8.2
pydantic_core 2.20.1
PyJWT 2.8.0
python-dotenv 1.0.1
python-engineio 4.9.1
python-multipart 0.0.9
python-socketio 5.11.3
PyYAML 6.0.1
regex 2024.7.24
requests 2.32.3
safetensors 0.4.3
scipy 1.14.0
setuptools 68.2.2
simple-websocket 1.0.0
sniffio 1.3.1
sse-starlette 2.1.2
starlette 0.37.2
sympy 1.13.1
syncer 2.0.3
timm 1.0.8
tokenizers 0.19.1
tomli 2.0.1
torch 2.4.0
torchvision 0.19.0
tqdm 4.66.4
transformers 4.43.3
triton 3.0.0
typing_extensions 4.12.2
typing-inspect 0.9.0
uptrace 1.26.0
urllib3 2.2.2
uvicorn 0.25.0
watchfiles 0.20.0
websocket 0.2.1
websocket-client 1.7.0
websockets 12.0
wheel 0.41.2
wrapt 1.16.0
wsproto 1.2.0
xformers 0.0.27.post2
zipp 3.19.2
zope.event 5.0
zope.interface 6.2
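Two ways out, matching the advice in the warning itself: upgrade the NVIDIA driver to one that supports CUDA 12.x, or install a PyTorch build closer to the driver. The sketch below assumes the cu118 wheels are acceptable; whether a CUDA 11.7 driver runs them depends on CUDA minor-version compatibility, so upgrading the driver is the more reliable fix (xformers and bitsandbytes may also need builds matching the chosen CUDA version):

```
# Option 1 (preferred): update the NVIDIA driver, then confirm with:
nvidia-smi

# Option 2 (sketch, not guaranteed on an 11.7 driver): install cu118 wheels
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu118
```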
Who can help? / 谁可以帮助到您?
Information / 问题信息
Reproduction / 复现过程
Running python cli_demo.py fails during model loading with the errors shown in the session above. Are the environment requirements really this strict? Other LLMs run fine on this machine.
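For context, the hard requirement comes from the 4-bit BitsAndBytesConfig path in cli_demo.py: bitsandbytes quantization needs a visible CUDA device, which is why other, unquantized LLMs load fine on the same box. A minimal sketch of a guarded load (the model id and dtypes are illustrative, not necessarily the demo's exact values) that skips quantization when no GPU is present:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_PATH = "THUDM/cogvlm2-llama3-chat-19B"  # illustrative; use your local path

if torch.cuda.is_available():
    # 4-bit quantized load, as in the demo; requires a working CUDA setup
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    )
else:
    # CPU fallback: no bitsandbytes quantization (slow, needs full-size weights in RAM)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.float32,
        trust_remote_code=True,
    )
```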
Expected behavior / 期待表现
cli_demo.py should load the model and start the demo normally, as other LLMs do in this environment.