YapengTian / TDAN-VSR-CVPR-2020

TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution, CVPR 2020
MIT License

RuntimeError: CUDNN_STATUS_EXECUTION_FAILED #51

Open mehranjeelani opened 3 years ago

mehranjeelani commented 3 years ago

I get the following error when I use your trained model to test on the Vid4 dataset. I was able to compile the deformable convolution and have torch version = 0.3.1 and python = 3.6 with cuda = 9. Kindly help!

Traceback (most recent call last):
  File "eval.py", line 117, in <module>
    output, _ = model(lr)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data2/superresolution/video_sr/TDAN-VSR/model.py", line 225, in forward
    out = self.relu(self.conv_first(y))
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
    self.padding, self.dilation, self.groups)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

YapengTian commented 3 years ago

Did you change any code, or did you run the given test examples as-is? It seems the issue comes from GPU data parallelism. Sorry for the very late response.
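If the checkpoint was saved while the model was wrapped in nn.DataParallel, one way to rule out the parallel path is to unwrap it and run on a single device. A minimal sketch, assuming (as in the eval.py below) that model.pt stores the whole module:

    import torch
    import torch.nn as nn

    model = torch.load('model/model.pt')
    # If the checkpoint holds a DataParallel wrapper, strip it so the
    # forward pass does not go through parallel_apply.
    if isinstance(model, nn.DataParallel):
        model = model.module
    model = model.cuda().eval()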

mehranjeelani commented 3 years ago

Yes, I changed the code a bit. I am actually testing on my own dataset: my test directory is just the path to a folder containing all the frames, and I changed the code accordingly. Here is my eval.py:

import argparse
import os
# Restrict the process to a single GPU; set this before the first CUDA call
# so that torch only initializes the visible device.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

import numpy as np
import torch
from PIL import Image
from skimage import io
from torch.autograd import Variable
import time

from model import ModelFactory

description = 'Video Super Resolution pytorch implementation'

def forward_x8(lr, forward_function=None):
    # Self-ensemble inference: average the model output over flipped
    # copies of the input.
    def _transform(v, op):
        # v: (B, T, C, H, W) Variable; returns a flipped/transposed copy.
        v = v.float()
        v2np = v.data.cpu().numpy()
        if op == 'v':      # reverse the width axis
            tfnp = v2np[:, :, :, :, ::-1].copy()
        elif op == 'h':    # reverse the height axis
            tfnp = v2np[:, :, :, ::-1, :].copy()
        elif op == 't':    # swap height and width
            tfnp = v2np.transpose((0, 1, 2, 4, 3)).copy()

        return Variable(torch.Tensor(tfnp).cuda())

    def _transform_back(v, op):
        # Undo the corresponding transform on the (B, C, H, W) numpy output.
        if op == 'v':
            tfnp = v[:, :, :, ::-1].copy()
        elif op == 'h':
            tfnp = v[:, :, ::-1, :].copy()
        elif op == 't':
            tfnp = v.transpose((0, 1, 3, 2)).copy()

        return tfnp

    # Build the ensemble: original, width-flipped, height-flipped, and both.
    x = [lr]
    for tf in 'v', 'h':
        x.extend([_transform(_x, tf) for _x in x])

    list_r = []
    for k in range(len(x)):
        r, _ = forward_function(x[k])
        r = r.data.cpu().numpy()
        # Undo the flips before averaging the four predictions.
        if k % 4 > 1:
            r = _transform_back(r, 'h')
        if (k % 4) % 2 == 1:
            r = _transform_back(r, 'v')
        list_r.append(r)
    y = np.sum(list_r, axis=0) / 4.0

    y = Variable(torch.Tensor(y).cuda())
    if len(y) == 1:
        y = y[0]
    return y
def quantize(img, rgb_range):
    # Map the network output onto valid 8-bit pixel values.
    return img.mul(255 / rgb_range).clamp(0, 255).round()

parser = argparse.ArgumentParser(description=description)

parser.add_argument('-m', '--model', metavar='M', type=str, default='TDAN',
                    help='network architecture.')
parser.add_argument('-s', '--scale', metavar='S', type=int, default=4, 
                    help='interpolation scale. Default 4')
parser.add_argument('-t', '--test-set', metavar='NAME', type=str, default='../datasets/KLE_1519',
                    help='dataset for testing.')
parser.add_argument('-mp', '--model-path', metavar='MP', type=str, default='model',
                    help='model path.')
parser.add_argument('-sp', '--save-path', metavar='SP', type=str, default='res/KLE_1519_sr',
                    help='saving directory path.')
args = parser.parse_args()

model_factory = ModelFactory()
model = model_factory.create_model(args.model)
dir_LR = args.test_set
model_path = os.path.join(args.model_path, 'model.pt')
if not os.path.exists(model_path):
    raise Exception('Cannot find %s.' % model_path)
# The checkpoint stores the full module, so it replaces the freshly built model.
model = torch.load(model_path)
model.eval()
path = args.save_path
if not os.path.exists(path):
    os.makedirs(path)

# The original script looped over the sub-folders of a test set; here the
# test directory itself holds the frames, so there is a single sequence.
LR = dir_LR
ims = sorted(os.listdir(LR))
num = len(ims)  # number of frames in the sequence
image = io.imread(os.path.join(LR, ims[0]))
row, col, ch = image.shape
frames_lr = np.zeros((5, int(row), int(col), ch))
for j in range(num):
    # Gather a 5-frame temporal window centered on frame j; out-of-range
    # indices are crudely reflected back into the sequence.
    for k in range(j - 2, j + 3):
        idx = k - j + 2
        if k < 0:
            k = -k
        if k >= num:
            k = num - 3
        frames_lr[idx, :, :, :] = io.imread(os.path.join(LR, ims[k]))
    start = time.time()
    # Normalize to [-0.5, 0.5] and reshape to (1, T, C, H, W).
    frames_lr = frames_lr / 255.0 - 0.5
    lr = torch.from_numpy(frames_lr).float().permute(0, 3, 1, 2)
    lr = Variable(lr.cuda()).unsqueeze(0).contiguous()
    output, _ = model(lr)
    #output = forward_x8(lr, model)
    output = (output.data + 0.5) * 255
    output = quantize(output, 255)
    output = output.squeeze(dim=0)
    elapsed_time = time.time() - start
    print(elapsed_time)
    img_name = os.path.join(path, ims[j])
    Image.fromarray(np.around(output.cpu().numpy().transpose(1, 2, 0)).astype(np.uint8)).save(img_name)
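
With the defaults above, the script can be invoked as, for example (paths are illustrative):

    python eval.py -m TDAN -s 4 -t ../datasets/KLE_1519 -mp model -sp res/KLE_1519_sr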
Jin-97 commented 3 years ago

I have the same problem. Did you solve it?

mehranjeelani commented 3 years ago

Hi. No, I actually used another model, which gave better results, so I didn't bother to fix this.

YapengTian commented 3 years ago

Sorry, I missed it. @Jin-97 Do you still have the problem? It is pretty strange to see a parallel issue, since only one GPU is used.
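A quick sanity check on how many devices the process actually sees (a hypothetical snippet; it assumes CUDA_VISIBLE_DEVICES is set as in the script above):

    import torch
    print(torch.cuda.device_count())      # should be 1 when a single GPU is exposed
    print(torch.cuda.get_device_name(0))  # which physical card that is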

Jin-97 commented 3 years ago

I reconfigured the dependencies (python=3.6.6, torch=0.3.1, cuda=9.1), and that seems to have solved the problem, because a new one has emerged: RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58. I am just running the test code; my GPU is a GTX 1080. Where can I change the batch size? Or do you have any suggestions?

YapengTian commented 3 years ago

If you are training the model, using a smaller batch size is a good choice. If you are running testing, I would suggest using the chop_forward function in the solver (https://github.com/YapengTian/TDAN-VSR-CVPR-2020/blob/master/solver.py), which splits the whole video frames into smaller patches.
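For reference, the idea behind chop_forward is roughly the following recursive tiling; this is a simplified sketch of the pattern, not the repository's exact function (the precise signature and the shave/min_size values live in solver.py):

    import torch
    from torch.autograd import Variable

    def chop_forward(x, model, scale, shave=8, min_size=160000):
        # x: (B, T, C, H, W) low-res frames. Split into four overlapping
        # spatial quadrants, super-resolve each one (recursing while a
        # patch is still too large for GPU memory), then stitch the
        # upscaled quadrants back together.
        b, t, c, h, w = x.size()
        h_half, w_half = h // 2, w // 2
        h_size, w_size = h_half + shave, w_half + shave
        inputs = [
            x[:, :, :, 0:h_size, 0:w_size],
            x[:, :, :, 0:h_size, (w - w_size):w],
            x[:, :, :, (h - h_size):h, 0:w_size],
            x[:, :, :, (h - h_size):h, (w - w_size):w]]
        if h_size * w_size < min_size:
            outputs = [model(patch)[0] for patch in inputs]
        else:
            outputs = [chop_forward(patch, model, scale, shave, min_size)
                       for patch in inputs]
        h, w = scale * h, scale * w
        h_half, w_half = scale * h_half, scale * w_half
        h_size, w_size = scale * h_size, scale * w_size
        output = Variable(x.data.new(b, c, h, w))
        output[:, :, 0:h_half, 0:w_half] \
            = outputs[0][:, :, 0:h_half, 0:w_half]
        output[:, :, 0:h_half, w_half:w] \
            = outputs[1][:, :, 0:h_half, (w_size - w + w_half):w_size]
        output[:, :, h_half:h, 0:w_half] \
            = outputs[2][:, :, (h_size - h + h_half):h_size, 0:w_half]
        output[:, :, h_half:h, w_half:w] \
            = outputs[3][:, :, (h_size - h + h_half):h_size, (w_size - w + w_half):w_size]
        return output

The overlap of shave pixels between quadrants hides boundary artifacts: each quadrant is super-resolved with some extra context, and only the interior of each result is copied into the stitched output.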

Jin-97 commented 3 years ago

Thanks~~~