THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model
Apache License 2.0
4.08k stars, 415 forks

cuDNN error: CUDNN_STATUS_NOT_INITIALIZED #24

Closed aka99 closed 3 months ago

aka99 commented 1 year ago

[2023-05-19 14:50:31,777] [INFO] [RANK 0] > successfully loaded /home/tony/.sat_models/visualglm-6b/1/mp_rank_00_model_states.pt
Welcome to the VisualGLM-6B model. Enter an image URL or local path to load an image, then keep typing to chat; "clear" starts over, "stop" exits the program.
Please enter an image path or URL (press Enter for text-only chat): https://img.caixin.com/2023-05-13/168394947268597_480_320.jpg
cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

$ nvidia-smi
Fri May 19 14:52:34 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090 L...    On | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P8                7W / 150W|      1MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

Sleepychord commented 1 year ago

This error can have many causes. Please check that you are using a suitable PyTorch version, e.g. 1.13.1. Do other PyTorch programs run normally?
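One way to answer that question is a standalone script that pushes a single convolution through cuDNN with VisualGLM-6B out of the picture. This is a sketch, not something from the repository; the helper name `cudnn_smoke_test` and the tensor sizes are arbitrary choices. If it also fails with `CUDNN_STATUS_NOT_INITIALIZED`, the problem most likely lies in the PyTorch/CUDA installation rather than in the model code.

```python
import importlib.util


def cudnn_smoke_test() -> str:
    """Run one small convolution on the GPU and return a short status string.

    Hypothetical diagnostic helper; it does not depend on VisualGLM-6B.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    import torch.nn.functional as F

    if not torch.cuda.is_available():
        return "CUDA not available"
    torch.backends.cudnn.enabled = True  # default is True; set explicitly so conv2d goes through cuDNN
    try:
        x = torch.randn(1, 3, 32, 32, device="cuda")
        w = torch.randn(8, 3, 3, 3, device="cuda")
        y = F.conv2d(x, w, padding=1)
        torch.cuda.synchronize()  # surface asynchronous CUDA/cuDNN errors at this point
    except RuntimeError as err:  # cuDNN errors such as CUDNN_STATUS_NOT_INITIALIZED are RuntimeErrors
        return f"cuDNN failure: {err}"
    return f"ok, output shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(cudnn_smoke_test())
```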

aka99 commented 1 year ago

I tried several ways of installing PyTorch, in both conda and plain Python environments, and always got the same error.

Method 1: conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Method 2: pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
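With installs attempted in several conda and pip environments, it is worth confirming which PyTorch build the failing interpreter actually imports. PyTorch binaries bring their own CUDA runtime and cuDNN libraries, so the system `nvcc` (12.1 above) does not decide which cuDNN PyTorch loads; the numbers that matter are the ones the installed build reports. A minimal sketch assuming only a standard `torch` install; `torch_build_report` is a hypothetical helper name:

```python
import importlib.util


def torch_build_report() -> dict:
    """Describe the PyTorch build that `import torch` resolves to in this interpreter."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}
    import torch

    return {
        "torch": torch.__version__,               # e.g. "1.13.1+cu117" for the method-2 wheel
        "built_with_cuda": torch.version.cuda,    # CUDA version this build was compiled against
        "cudnn": torch.backends.cudnn.version(),  # cuDNN version PyTorch loads, or None
        "cuda_available": torch.cuda.is_available(),
        "torch_path": torch.__file__,             # shows which environment the module came from
    }


if __name__ == "__main__":
    for key, value in torch_build_report().items():
        print(f"{key}: {value}")
```

Run in the failing environment, this should show whether `built_with_cuda` is 11.8 (method 1) or 11.7 (method 2) as intended, and whether the interpreter is picking up the environment you expect.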

LuneZ99 commented 1 year ago

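Addendum: with torch==2.0.0 and the transformers usage shown in the README, the first run works fine, but after exiting the program and running it again, this error appears. I'm not sure why.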

LuneZ99 commented 1 year ago


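The cause of the above: in the environment that originally worked (without TensorRT installed), inference prints one warning line and TensorRT is simply skipped:
2023-05-21 16:42:41.921443: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT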

Once TensorRT is installed, however, the run fails with cuDNN error: CUDNN_STATUS_NOT_INITIALIZED.

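Could this be an incompatibility? What I don't understand is why transformers would call into TensorFlow at all (TensorFlow is indeed installed in this environment as well).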
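The thread does not settle why TensorFlow ends up in the process; the TF-TRT warning only shows that something imported it. One plausible explanation, not confirmed here: when TensorFlow is installed, transformers detects it and some code paths may import it next to PyTorch, so TensorFlow's own CUDA/cuDNN setup (and TF-TRT, once TensorRT is present) runs in the same process. The sketch below is a hedged diagnostic, assuming a transformers release that reads the `USE_TF` and `USE_TORCH` environment variables at import time. It first reports which packages are installed, so the working and failing environments can be compared, then asks transformers to enable only PyTorch. All three function names are hypothetical helpers.

```python
import importlib.util
import os
import sys


def installed(names=("tensorflow", "tensorrt", "transformers")) -> dict:
    """Map each top-level package name to True if it can be found on sys.path."""
    # find_spec only locates the package; it does not import it,
    # so calling this does not initialise CUDA or load any GPU libraries.
    return {name: importlib.util.find_spec(name) is not None for name in names}


def keep_transformers_off_tensorflow() -> None:
    """Ask transformers to enable only its PyTorch backend (assumption: USE_TF/USE_TORCH honoured).

    transformers reads these variables when it is first imported,
    so this must run before the first `import transformers` in the process.
    """
    os.environ["USE_TF"] = "0"
    os.environ["USE_TORCH"] = "1"


def tensorflow_loaded() -> bool:
    """True if some module in this process has already imported TensorFlow."""
    return "tensorflow" in sys.modules


if __name__ == "__main__":
    print("installed:", installed())
    keep_transformers_off_tensorflow()
    if importlib.util.find_spec("transformers") is not None:
        import transformers  # imported only after the environment variables are set

        print("transformers", transformers.__version__)
    # Load the model here as usual, then check whether TensorFlow was pulled in anyway.
    print("tensorflow loaded:", tensorflow_loaded())
```

If `tensorflow loaded` still prints `True` after loading the model this way, something other than transformers is importing TensorFlow; uninstalling TensorFlow from the environment used for VisualGLM-6B would be the blunter way to test the same hypothesis.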