cuDNN Execution Plan Failure Warning in PyTorch 2.3.0 with CUDA 11.8 and cuDNN 8.7

Flllllying commented 4 months ago

Description

When running certain operations in PyTorch, I'm encountering the following warnings:

UserWarning: Plan failed with a cudnnException: 
CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at 
../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)

This occurs during both 2D and 3D convolution operations.

Environment

PyTorch Version: 2.3.0
CUDA Version (PyTorch): 11.8
cuDNN Version (PyTorch): 8.7.0
OS: Ubuntu (please specify version)
GPU: NVIDIA A10G
NVIDIA Driver Version: 550.90.07
System CUDA Version: 12.4
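
For anyone comparing setups, the PyTorch-side versions above can be read from the running interpreter rather than from the system toolkit. This is a minimal sketch: `describe_torch_build` is a hypothetical helper name, not part of PyTorch or LivePortrait, and it returns None when torch is not installed. PyTorch pip wheels generally bundle their own CUDA runtime and cuDNN, so the system CUDA version reported by the driver tools can differ from what this prints.

```python
def describe_torch_build():
    """Return the versions this PyTorch build reports, or None if torch is missing.

    Hypothetical helper for illustration only.
    """
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # CUDA version the wheel was built against (None on CPU-only builds)
        "cuda_build": torch.version.cuda,
        # cuDNN version as an integer, or None if cuDNN is unavailable
        "cudnn": torch.backends.cudnn.version(),
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

Pasting this output next to the `nvidia-smi` header makes it easier to tell a real mismatch from the usual case of a newer driver running an older bundled runtime.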

Steps to Reproduce

Run python inference.py.

Additional Information

The warning appears for both nn.Conv2d and nn.Conv3d operations.
Despite the warning, the model seems to run, but I'm concerned about potential performance impacts or hidden issues.
This occurs even though the GPU (NVIDIA A10G) should be fully capable of handling these operations.

Questions

Is this warning indicative of a serious problem, or can it be safely ignored?
Could this be related to the mismatch between the system CUDA version (12.4) and the PyTorch CUDA version (11.8)?
Are there any workarounds or fixes available for this issue?
Will it impact performance much? I am using ComfyUI custom nodes, and it processed a 20s video in 30s, which is awesome.

Logs

python inference.py
[08:08:55] Load appearance_feature_extractor done.                                                                                               live_portrait_wrapper.py:29
           Load motion_extractor done.                                                                                                           live_portrait_wrapper.py:32
[08:08:56] Load warping_module done.                                                                                                             live_portrait_wrapper.py:35
           Load spade_generator done.                                                                                                            live_portrait_wrapper.py:38
           Load stitching_retargeting_module done.                                                                                               live_portrait_wrapper.py:42
[08:08:57] LandmarkRunner warmup time: 0.813s                                                                                                          landmark_runner.py:89
[08:08:58] FaceAnalysisDIY warmup time: 0.797s                                                                                                       face_analysis_diy.py:79
           Load source image from /home/ubuntu/LivePortrait/src/config/../../assets/examples/source/s6.jpg                                      live_portrait_pipeline.py:46
[08:08:59] Load from video file (mp4 mov avi etc...): /home/ubuntu/LivePortrait/src/config/../../assets/examples/driving/d0.mp4                 live_portrait_pipeline.py:72
/home/ubuntu/LivePortrait/LivePortrait/lib/python3.9/site-packages/torch/nn/modules/conv.py:605: UserWarning: Plan failed with a cudnnException: 
CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at 
../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv3d(
/home/ubuntu/LivePortrait/LivePortrait/lib/python3.9/site-packages/torch/nn/modules/conv.py:456: UserWarning: Plan failed with a cudnnException: 
CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at 
../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv2d(input, weight, bias, self.stride,
Animating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:05
Concatenating result... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Dump to animations/s6--d0_concat.mp4

writing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--[swscaler @ 0x581ebc0] Warning: data is not aligned! This can lead to a speed loss
Dump to animations/s6--d0.mp4

Rolandjg commented 4 months ago

Having the same problem on Fedora 40 with an RTX 3060. It doesn't seem to affect the video output, though I assume it must slow things down.

rickt commented 4 months ago

these are just warnings. if you add:

torch.backends.cudnn.benchmark = True

immediately below

import torch

at the beginning of src/live_portrait_wrapper.py, the CUDNN_STATUS_NOT_SUPPORTED warnings go away.

what does this do? https://discuss.pytorch.org/t/what-does-torch-backends-cudnn-benchmark-do/5936
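
A minimal sketch of that change, written so it can be tried outside the project. `enable_cudnn_benchmark` is a hypothetical helper name, not something in LivePortrait; in the actual file the two lines `import torch` and `torch.backends.cudnn.benchmark = True` are all that is needed. With the flag on, cuDNN times the available convolution algorithms for each new input shape and caches the fastest one, so the first call per shape is typically slower and later calls reuse the choice.

```python
def enable_cudnn_benchmark():
    """Turn on the cuDNN autotuner; return False if torch is not installed.

    Hypothetical helper for illustration only.
    """
    try:
        import torch
    except ImportError:
        return False
    # Must be set before the first convolution runs to affect algorithm selection.
    torch.backends.cudnn.benchmark = True
    return torch.backends.cudnn.benchmark


if __name__ == "__main__":
    print("cudnn.benchmark enabled:", enable_cudnn_benchmark())
```

This tends to help most when input sizes stay fixed across frames; with frequently changing shapes the repeated tuning can cost more than it saves.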

FurkanGozukara commented 4 months ago

it works perfectly with

cuda 11.8, python 3.10.11 and c++ tools, after generating a venv and installing inside it

here is my full tutorials thread : https://github.com/KwaiVGI/LivePortrait/issues/78

you can watch this tutorial to learn how to install cuda 11.8, python 3.10.11 and c++ tools :

https://youtu.be/-NjNy7afOQ0

Essential AI Tools and Libraries: A Guide to Python, Git, C++ Compile Tools, FFmpeg, CUDA, PyTorch

cleardusk commented 4 months ago

@rickt Thanks! The fix lies in https://github.com/KwaiVGI/LivePortrait/blob/d8036cffdeecd395f8bf02e70e481c4207842cf1/src/live_portrait_pipeline.py#L7-L8

cleardusk commented 4 months ago

@FurkanGozukara You are very warm-hearted!
