Open danielmimimi opened 1 year ago
I switched to version 3.1 and got the following error :
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([8, 32, 256, 256], dtype=torch.half, device='cuda', requires_grad=True).to(memory_format=torch.channels_last)
net = torch.nn.Conv2d(32, 12, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half().to(memory_format=torch.channels_last)
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
    data_type = CUDNN_DATA_HALF
    padding = [0, 0, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x55636ac53f60
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 8, 32, 256, 256,
    strideA = 2097152, 1, 8192, 32,
output: TensorDescriptor 0x55636ac4a9a0
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 8, 12, 256, 256,
    strideA = 786432, 1, 3072, 12,
weight: FilterDescriptor 0x55636ac62780
    type = CUDNN_DATA_HALF
    tensor_format = CUDNN_TENSOR_NHWC
    nbDims = 4
    dimA = 12, 32, 1, 1,
Pointer addresses:
    input: 0xe4e000000
    output: 0xe49000000
    weight: 0xe055ffa00
I tried your code snippet to reproduce the error, but it ran without problems. I will attach my Dockerfile and code: Downloads.zip
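When the generated snippet runs cleanly but the full training code still fails, the difference may lie in the environment rather than in the convolution itself. That is only an assumption here, not a confirmed diagnosis. A minimal sketch for collecting version details to post alongside the Dockerfile could look like this; `collect_env_info` is a hypothetical helper name, not part of any library, and it falls back gracefully if PyTorch is not installed:

```python
import importlib.util
import platform


def collect_env_info():
    """Gather version details that are useful in a cuDNN bug report.

    Hypothetical helper for illustration; not part of PyTorch or this project.
    """
    info = {"python": platform.python_version(), "torch": None}

    # Skip the PyTorch queries if the package is not installed.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    # cuDNN version as an integer, if cuDNN is usable.
    if torch.backends.cudnn.is_available():
        info["cudnn"] = torch.backends.cudnn.version()
    else:
        info["cudnn"] = None
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


info = collect_env_info()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this both inside and outside the Docker container makes it easy to compare the two setups.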
Same problem. Did you fix it?
Hi, it should work now with the newest version from GitHub. This notebook had already been fixed; the others should now be up to date as well.
For context: A while ago we included the data normalization in the networks. Since then, all models expect input data to be in the range [0, 1]. To modify the normalization, you can adapt inputs_mean and inputs_std.
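To illustrate the expected input range, here is a minimal NumPy sketch. It assumes 8-bit image data in channels-first layout and assumes that the in-network normalization is a per-channel `(x - mean) / std`; the thread does not confirm the exact formula. The helper names and the 0.5 values are made up for the example, and only `inputs_mean` and `inputs_std` come from the comment above.

```python
import numpy as np


def to_unit_range(image_uint8):
    """Scale an 8-bit image to float32 values in [0, 1]."""
    return image_uint8.astype(np.float32) / 255.0


def normalize(x, inputs_mean, inputs_std):
    """Per-channel standardization of a (C, H, W) array.

    Assumed to resemble what the networks now do internally,
    so user code would normally NOT apply this step itself.
    """
    mean = np.asarray(inputs_mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(inputs_std, dtype=np.float32).reshape(-1, 1, 1)
    return (x - mean) / std


# Dummy 3-channel image with random 8-bit pixel values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(3, 4, 4), dtype=np.uint8)

x = to_unit_range(image)  # this is what the models expect as input
y = normalize(x, inputs_mean=[0.5, 0.5, 0.5], inputs_std=[0.5, 0.5, 0.5])

print(x.min(), x.max())
print(y.min(), y.max())
```

In practice only the `to_unit_range` step would be needed before calling a model. Data that has already been standardized, or that is still in the 0 to 255 range, would end up normalized twice or on the wrong scale.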
Hello,
I just downloaded the code and ran the Jupyter notebook inside Docker. Unfortunately, an error occurred during the first training step; see the following description.
I used version 0.4.
I thought I'd let you know. Thanks,
Daniel