TsinghuaAI / CPM-2-Finetune

Finetune CPM-2
MIT License

RuntimeError: Unable to proceed, no GPU resources available #33

Open louxingrui opened 2 years ago

louxingrui commented 2 years ago

After running bash scripts/full_model/finetune_cpm2_math.sh, I get RuntimeError: Unable to proceed, no GPU resources available. My GPU is an RTX 2080 Ti with CUDA 10.2 installed, and the program runs without problems outside the Docker environment. Is this caused by a mismatch between the CUDA version on the host and the one inside the Docker environment? Here is the main error output from the terminal:

[2022-01-31 14:07:27,900] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "/opt/conda/bin/deepspeed", line 6, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/launcher/runner.py", line 264, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available

I hope to hear back from you!

XiaoqingNLP commented 2 years ago

Check the hostfile, then check whether the GPU and its driver are correctly installed and visible.
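The GPU check suggested above could be sketched in Python roughly as follows, run inside the same environment (for example, inside the container) where the deepspeed launcher fails. This is only a sketch: `gpu_diagnostics` is a hypothetical helper name and is not part of DeepSpeed or CPM-2-Finetune. It assumes that `nvidia-smi` is the driver tool to look for and that PyTorch may or may not be importable.

```python
import shutil
import subprocess


def gpu_diagnostics():
    """Collect basic facts about GPU visibility in the current environment.

    Hypothetical helper for illustration; not part of DeepSpeed.
    """
    report = {
        "nvidia_smi_found": False,      # is the nvidia-smi binary on PATH?
        "nvidia_smi_ok": False,         # did `nvidia-smi -L` exit successfully?
        "torch_installed": False,       # can PyTorch be imported?
        "torch_cuda_available": False,  # does PyTorch see a usable CUDA device?
    }

    smi = shutil.which("nvidia-smi")
    if smi is not None:
        report["nvidia_smi_found"] = True
        try:
            # `nvidia-smi -L` lists the GPUs the driver can see.
            result = subprocess.run(
                [smi, "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.SubprocessError):
            pass

    try:
        import torch

        report["torch_installed"] = True
        report["torch_cuda_available"] = bool(torch.cuda.is_available())
    except (ImportError, OSError):
        # PyTorch is missing or its native libraries failed to load.
        pass

    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_found` is False inside the container but the same check succeeds on the host, the container probably was not given access to the GPU. One common cause, offered here as a guess rather than a confirmed diagnosis of this issue, is starting the container without GPU support (for example, without `docker run --gpus all` and the NVIDIA Container Toolkit).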

t1101675 commented 2 years ago

The hostfile needs to contain the hostname or IP address that is used when connecting to the host over ssh.
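As a rough illustration of that advice, a DeepSpeed hostfile lists one machine per line as `<hostname-or-ip> slots=<number of GPUs>`. The sketch below validates such a file before launching. `parse_hostfile` is a hypothetical helper, and the host names and slot counts are placeholders that would need to be replaced with the name or IP you actually ssh to and the number of GPUs on that machine.

```python
def parse_hostfile(text):
    """Parse hostfile text into a {host: slots} dict.

    Hypothetical helper for illustration; it accepts only lines of the
    form `<hostname-or-ip> slots=<int>` and skips blank lines.
    """
    hosts = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2 or not parts[1].startswith("slots="):
            raise ValueError(f"line {lineno}: expected '<host> slots=<n>', got {line!r}")
        host = parts[0]
        slots = parts[1][len("slots="):]
        if not slots.isdigit() or int(slots) < 1:
            raise ValueError(f"line {lineno}: slot count must be a positive integer")
        if host in hosts:
            raise ValueError(f"line {lineno}: host {host!r} listed more than once")
        hosts[host] = int(slots)
    return hosts


if __name__ == "__main__":
    # Placeholder entries: an ssh host name and an IP address, one GPU each.
    example = "worker-1 slots=1\n192.168.0.12 slots=1\n"
    print(parse_hostfile(example))
```

A check like this only confirms the file is well formed. It does not verify that the listed names are reachable over ssh or that the GPUs are visible on those machines.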