ShivamShrirao / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0

Recent change to train_dreambooth causing CUDNN_STATUS_INTERNAL_ERROR (WSL install) #121

Open astrobread opened 1 year ago

astrobread commented 1 year ago

Describe the bug

When using DreamBooth in WSL (Ubuntu 20.04), a recent change causes training to fail. Reverting this single change on the most recent commit (f94be89) fixes the issue.

  input_ids = tokenizer.pad(
      {"input_ids": input_ids},
-     padding="max_length",
-     max_length=tokenizer.model_max_length,
+     padding=True,
      return_tensors="pt",
  ).input_ids

I can use this workaround, and I understand my particular setup may not be supported, but I wanted to share it in case it is affecting others or there is something I can do to fix it.
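For readers unfamiliar with the two padding modes in the diff, here is a minimal pure-Python sketch of how they differ in output shape. `pad_batch` is a hypothetical stand-in, not the transformers implementation; `MODEL_MAX_LENGTH = 77` assumes a CLIP-style tokenizer, and `PAD_ID` and the token ids are made up. It only illustrates the shape difference and does not establish why cuDNN fails.

```python
# Hypothetical stand-in for the two padding modes in the diff.
# Plain Python so it runs without downloading a tokenizer.

MODEL_MAX_LENGTH = 77  # assumption: CLIP-style tokenizer limit
PAD_ID = 0             # assumption: placeholder pad token id


def pad_batch(batch, padding, max_length=None, pad_id=PAD_ID):
    """Pad lists of token ids.

    padding="max_length": every row is padded to `max_length`.
    padding=True:         every row is padded to the longest row in the batch.
    """
    if padding == "max_length":
        if max_length is None:
            raise ValueError("max_length is required when padding='max_length'")
        target = max_length
    elif padding is True:
        target = max(len(row) for row in batch)
    else:
        raise ValueError(f"unsupported padding mode: {padding!r}")
    # Rows already at or above the target length are returned unchanged.
    return [row + [pad_id] * (target - len(row)) for row in batch]


# Two made-up prompts of different lengths (ids are arbitrary).
batch = [[11, 12, 13, 14], [21, 22, 23, 24, 25, 26, 27]]

fixed = pad_batch(batch, padding="max_length", max_length=MODEL_MAX_LENGTH)
dynamic = pad_batch(batch, padding=True)

print([len(row) for row in fixed])    # [77, 77]
print([len(row) for row in dynamic])  # [7, 7]
```

With `padding="max_length"` the sequence length is the same for every batch; with `padding=True` it follows the longest prompt in each batch, so it can change from step to step.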

Reproduction

Set up this repo in WSL using the instructions here: https://pastebin.com/uE1WcSxD
Tweak one step: pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
Execute training using the script: mytraining.txt

Logs

Traceback (most recent call last):
  File "/home/username/github/diffusers/examples/dreambooth/train_dreambooth.py", line 824, in <module>
    main(args)
  File "/home/username/github/diffusers/examples/dreambooth/train_dreambooth.py", line 788, in main
    accelerator.backward(loss)
  File "/home/username/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/accelerator.py", line 882, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/home/username/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/username/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/username/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/home/username/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/home/username/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

ConvolutionParams
    memory_format = ChannelsLast
    data_type = CUDNN_DATA_HALF
    padding = [0, 0, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x7fce222a4920
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 2, 640, 16, 16,
    strideA = 163840, 1, 10240, 640,
output: TensorDescriptor 0x7fce300c0c30
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 2, 1280, 16, 16,
    strideA = 327680, 1, 20480, 1280,
weight: FilterDescriptor 0x7fce222857c0
    type = CUDNN_DATA_HALF
    tensor_format = CUDNN_TENSOR_NHWC
    nbDims = 4
    dimA = 1280, 640, 1, 1,
Pointer addresses:
    input: 0x85a600000
    output: 0x7814c0000
    weight: 0x856c48000
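The dimA/strideA pairs in the dump above can be checked by hand against the reported memory_format = ChannelsLast. The sketch below does that arithmetic in plain Python; the helper names are hypothetical and the numbers are copied from the log. It only confirms that the descriptors are self-consistent, not what triggers the internal error.

```python
def contiguous_strides(dims):
    """Element strides for a densely packed NCHW tensor."""
    _, c, h, w = dims
    return (c * h * w, h * w, w, 1)


def channels_last_strides(dims):
    """Element strides for NCHW dims stored in NHWC (channels-last) order."""
    _, c, h, w = dims
    return (h * w * c, 1, w * c, c)


def classify_layout(dims, strides):
    """Name the layout that the given strides correspond to, if any."""
    strides = tuple(strides)
    if strides == channels_last_strides(dims):
        return "channels_last"
    if strides == contiguous_strides(dims):
        return "contiguous"
    return "other"


# dimA / strideA values copied from the input and output descriptors in the log.
input_layout = classify_layout((2, 640, 16, 16), (163840, 1, 10240, 640))
output_layout = classify_layout((2, 1280, 16, 16), (327680, 1, 20480, 1280))

print(input_layout, output_layout)  # channels_last channels_last
```

Both descriptors match the channels-last formula, which agrees with the CUDNN_TENSOR_NHWC format reported for the 1x1 filter.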

System Info

JantineD commented 1 year ago

I'm having this exact same problem:

import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([2, 320, 64, 64], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(320, 4, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
    memory_format = Contiguous
    data_type = CUDNN_DATA_HALF
    padding = [1, 1, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x5dce960
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 2, 320, 64, 64,
    strideA = 1310720, 4096, 64, 1,
output: TensorDescriptor 0xa7edd830
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 2, 4, 64, 64,
    strideA = 16384, 4096, 64, 1,
weight: FilterDescriptor 0x7f62bc02e570
    type = CUDNN_DATA_HALF
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 4, 320, 3, 3,
Pointer addresses:
    input: 0x7f5b1a000000
    output: 0x7f5cbbdea000
    weight: 0x7f5e64dfa000

I fixed it by updating input_ids:

  input_ids = tokenizer.pad(
      {"input_ids": input_ids},
-     padding="max_length",
-     max_length=tokenizer.model_max_length,
+     padding=True,
      return_tensors="pt",
  ).input_ids