BR-IDL / PaddleViT

:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
https://github.com/BR-IDL/PaddleViT
Apache License 2.0

I got some warnings and an error when running DETR model evaluation on COCO2017 with a single GPU #126

Closed: Atlantisming closed this 2 years ago

Atlantisming commented 2 years ago

I used the AIstudio GPU environment and tried the DETR project, but got a CUDA error. I set up the config like this:

```python
!pip install yacs
import paddle
from config import get_config
from detr import build_detr

config = get_config('./configs/detr_resnet50.yaml')
model, criterion, postprocessors = build_detr(config)
model_state_dict = paddle.load('detr_resnet50.pdparams')
model.set_dict(model_state_dict)
```

and this is the output:

```
W1216 20:21:43.708824 167 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W1216 20:21:43.712436 167 device_context.cc:422] device: 0, cuDNN Version: 7.6.
100%|██████████| 151272/151272 [00:02<00:00, 69020.62it/s]
```

But when I run `sh run_eval.sh`, I get the following warnings and error:

```
W1216 20:22:31.344815 398 init.cc:141] Compiled with WITH_GPU, but no GPU found in runtime.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py:301: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
  "You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default."

Traceback (most recent call last):
  File "main_single_gpu.py", line 321, in <module>
    main()
  File "main_single_gpu.py", line 174, in main
    paddle.seed(seed)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/framework/random.py", line 46, in seed
    for i in range(core.get_cuda_device_count()):
OSError: (External) Cuda error(100), no CUDA-capable device is detected.
  [Advise: Please search for the error code(100) on website( https://docs.nvidia.com/cuda/archive/10.0/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038 ) to get Nvidia's official solution about CUDA Error.]
  (at /paddle/paddle/fluid/platform/gpu_info.cc:99)
```
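
The traceback shows `paddle.seed` failing because no CUDA device is detected in the process started by `run_eval.sh`, even though the notebook session above does see GPU 0. A quick way to check what that shell actually sees (a sketch; the `paddle.device` calls assume PaddlePaddle 2.1+):

```shell
# Sketch: run these from the same shell that launches run_eval.sh.
nvidia-smi                                          # driver-level view of the GPUs on the machine
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"   # an empty or wrong value hides the GPU from CUDA
python -c "import paddle; print(paddle.device.is_compiled_with_cuda(), paddle.device.cuda.device_count())"
```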

DDXDaniel commented 2 years ago

Hello @Atlantisming, you may check the following steps:

  1. Make sure you have installed CUDA and cuDNN successfully.
  2. Make sure the CUDA version is compatible with your NVIDIA driver version.
  3. Install the GPU version of PaddlePaddle (a quick check is sketched below the list).
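
For step 3, a minimal sketch (the PyPI package for the GPU build is `paddlepaddle-gpu`; the exact wheel has to match your CUDA/cuDNN versions per the official install guide):

```shell
# Sketch: install the GPU build of PaddlePaddle (pick the wheel matching your CUDA/cuDNN
# versions from the official install guide), then run Paddle's built-in self-check,
# which reports whether a GPU device is actually usable.
pip install paddlepaddle-gpu
python -c "import paddle; paddle.utils.run_check()"
```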
Atlantisming commented 2 years ago

@DDXDaniel Thanks for your reply. But as you can see at the top, I am sure that I have installed CUDA and cuDNN, given this output:

```
W1216 20:21:43.708824 167 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W1216 20:21:43.712436 167 device_context.cc:422] device: 0, cuDNN Version: 7.6.
100%|██████████| 151272/151272 [00:02<00:00, 69020.62it/s]
```

And this is the `nvidia-smi` output of my environment:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:05:00.0 Off |                    0 |
| N/A   36C    P0    52W / 300W |   1423MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```

The minimum compatible driver version for CUDA 10.1 needs to be >= 418.39, so I think step 2 is satisfied. And I was using the AIstudio GPU version, which installs the GPU version of Paddle by default. The error still exists. Do you have any proposal I can try?
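
For reference, the driver version can also be read directly with standard `nvidia-smi` query flags:

```shell
# CUDA 10.1 requires a Linux driver >= 418.39; this prints the installed driver version.
nvidia-smi --query-gpu=name,driver_version --format=csv
```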

DDXDaniel commented 2 years ago

@Atlantisming Have you set `CUDA_VISIBLE_DEVICES=0` in `run_eval.sh`?
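
For reference, a minimal sketch of what the top of `run_eval.sh` could look like; the argument names below are only illustrative, so keep whatever the repo's script already passes to `main_single_gpu.py`:

```shell
# run_eval.sh (sketch): expose GPU 0 to the Python process, then launch the
# single-GPU evaluation. Only the CUDA_VISIBLE_DEVICES line is the relevant change;
# the evaluation arguments themselves should stay as shipped with the repo.
export CUDA_VISIBLE_DEVICES=0
python main_single_gpu.py \
    -cfg='./configs/detr_resnet50.yaml' \
    -eval \
    -pretrained='./detr_resnet50'   # illustrative arguments; keep the repo's own
```

Exporting the variable inside the script (or in the shell before `sh run_eval.sh`) is what makes GPU 0 visible to the process in which `paddle.seed` runs.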

Atlantisming commented 2 years ago

> @Atlantisming Have you set `CUDA_VISIBLE_DEVICES=0` in `run_eval.sh`?

Thanks! I set it and that fixed the bug!