Pose-Group / DCPose

This is an official implementation of our CVPR 2021 paper "Deep Dual Consecutive Network for Human Pose Estimation" (https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Deep_Dual_Consecutive_Network_for_Human_Pose_Estimation_CVPR_2021_paper.pdf)

error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device #29

Open desires19 opened 2 years ago

desires19 commented 2 years ago

Hello, I ran into a problem while reproducing your excellent research, DCPose. My environment: Python 3.6.12, CUDA 11.0, RTX 3080.

error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_col2im_coord_cuda: no kernel image is available for execution on the device
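This message generally means that the compiled CUDA code contains no binary for the GPU's compute capability. The following is a minimal sketch of how one might check for such a mismatch. The helper names `parse_arch` and `has_kernel_image` are hypothetical, and the check ignores PTX entries (`compute_XY`) that the driver could JIT-compile. Note also that `torch.cuda.get_arch_list()` reports the architectures PyTorch itself was built for, which is only a proxy for what the DCPose deformable-convolution extension was compiled for.

```python
import re


def parse_arch(tag):
    """Parse a tag such as 'sm_86' into (8, 6); return None for other tags."""
    match = re.match(r"^sm_(\d+)(\d)[a-z]?$", tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_kernel_image(capability, compiled_archs):
    """Return True if some compiled binary can run on the given device.

    A binary built for compute capability X.y runs on devices X.z with z >= y.
    PTX entries ('compute_XY') are ignored in this simplified check.
    """
    major, minor = capability
    for tag in compiled_archs:
        parsed = parse_arch(tag)
        if parsed is not None and parsed[0] == major and parsed[1] <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    # get_arch_list is only present in more recent PyTorch releases.
    if torch is not None and torch.cuda.is_available() and hasattr(torch.cuda, "get_arch_list"):
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("PyTorch built for:", archs)
        print("matching binary found:", has_kernel_image(cap, archs))
```

If the check reports no matching binary, rebuilding the extension with a toolkit and architecture list that cover the device is the usual remedy.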

Whj-cv commented 2 years ago

I have the same problem. Have you solved it?

chenhaomingbob commented 2 years ago

Hi @desires19 @Whj-cv, thank you for your interest in our work. We developed DCPose with CUDA 10.0 and have not tested it with CUDA 11.0. You could try running DCPose with CUDA 10.0; we hope this solves the problem.

desires19 commented 2 years ago

I solved it by switching to a GTX 1080Ti with CUDA 10.1 and PyTorch 1.6.0.

peteruhrig commented 2 years ago

"We develop DCPose using CUDA 10.0 and do not test in the environment of CUDA 11.0. Maybe you can try to run DCPose in the environment of CUDA 10.0. Hope this will solve this problem." (quoting chenhaomingbob)

@chenhaomingbob All recent Nvidia cards using the Ampere architecture (RTX 3060/3070/3080, A40, A100, and many more; full list here) require CUDA 11: the A100 needs at least CUDA 11.0, and the RTX 30 series and A40 (compute capability 8.6) need at least CUDA 11.1. I fear that staying with CUDA 10.0 will seriously reduce the adoption of DCPose by researchers.
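To make the version requirement concrete, here is a small sketch of a lookup from compute capability to the first CUDA toolkit release that can compile for it. The table covers only a few architectures relevant to this thread, and the names `MIN_CUDA_FOR_CAPABILITY` and `toolkit_supports` are hypothetical, chosen for illustration.

```python
# First CUDA toolkit release able to compile for each compute capability.
# Only a handful of architectures are listed; extend as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080 Ti
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere, A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3080, A40
}


def toolkit_supports(capability, cuda_version):
    """Return True if the toolkit version can target the given capability.

    Both arguments are (major, minor) tuples, so tuple comparison applies.
    """
    required = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if required is None:
        raise KeyError(f"compute capability {capability} is not in the table")
    return cuda_version >= required


if __name__ == "__main__":
    # The two setups reported in this thread.
    print("RTX 3080 with CUDA 11.0:", toolkit_supports((8, 6), (11, 0)))
    print("GTX 1080 Ti with CUDA 10.1:", toolkit_supports((6, 1), (10, 1)))
```

Under these assumptions, the original setup (RTX 3080 with CUDA 11.0) falls just short of the required toolkit version, while the GTX 1080Ti with CUDA 10.1 is covered.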

peteruhrig commented 2 years ago

@chenhaomingbob Compatibility with CUDA 11 and the corresponding PyTorch 1.11 (with CUDA 11 support) requires only minimal changes (use TORCH_CHECK instead of AT_CHECK in the extension sources). See here: https://github.com/Pose-Group/DCPose/issues/34#issuecomment-1133448799
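The macro rename could be applied by hand or with a short script. The sketch below is one possible approach, not the change from the linked comment: `migrate_at_check` and `migrate_tree` are hypothetical helpers, and the directory passed to `migrate_tree` is whatever folder holds the extension's C++/CUDA sources in your checkout.

```python
import re
from pathlib import Path

# Match AT_CHECK only as a whole identifier, so names such as
# MY_AT_CHECK or AT_CHECK_FOO are left alone.
_AT_CHECK = re.compile(r"\bAT_CHECK\b")


def migrate_at_check(source):
    """Replace the removed AT_CHECK macro with TORCH_CHECK in source text."""
    return _AT_CHECK.sub("TORCH_CHECK", source)


def migrate_tree(root, suffixes=(".cpp", ".cu", ".h", ".cuh")):
    """Rewrite matching files under root in place; return the changed paths."""
    changed = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            original = path.read_text()
            updated = migrate_at_check(original)
            if updated != original:
                path.write_text(updated)
                changed.append(path)
    return changed
```

After rewriting the sources, the extension has to be rebuilt against the new PyTorch and CUDA versions for the change to take effect.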

peteruhrig commented 2 years ago

I should add that we run into this problem only SOMETIMES: when processing a video with roughly 36,000 frames, only around 33,000 are actually processed. For the remaining frames, we get the error message:

error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device

This is rather surprising, because a kernel image is clearly available for most of the frames, so at the moment we think the error message is misleading. I would appreciate any help with this!