astrobread opened this issue 1 year ago
I'm having this exact same problem:
```python
import torch

torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True

data = torch.randn([2, 320, 64, 64], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(320, 4, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
```
```
ConvolutionParams
    memory_format = Contiguous
    data_type = CUDNN_DATA_HALF
    padding = [1, 1, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x5dce960
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 2, 320, 64, 64,
    strideA = 1310720, 4096, 64, 1,
output: TensorDescriptor 0xa7edd830
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 2, 4, 64, 64,
    strideA = 16384, 4096, 64, 1,
weight: FilterDescriptor 0x7f62bc02e570
    type = CUDNN_DATA_HALF
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 4, 320, 3, 3,
Pointer addresses:
    input: 0x7f5b1a000000
    output: 0x7f5cbbdea000
    weight: 0x7f5e64dfa000
```
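If it helps triage, the same repro can be rerun with the cuDNN autotuner and TF32 turned off, since those are the non-default flags visible in the dump above. This is only a debugging suggestion, not a known fix:

```python
import torch

# Debugging variant of the repro above: disable the cuDNN autotuner and TF32,
# the non-default flags shown in the ConvolutionParams dump, to see whether a
# different algorithm choice avoids the failure.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.allow_tf32 = False

data = torch.randn([2, 320, 64, 64], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(320, 4, kernel_size=3, padding=1).cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
```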
I fixed it by updating `input_ids` so every batch is padded to the tokenizer's max length:
```python
input_ids = tokenizer.pad(
    {"input_ids": input_ids},
    padding="max_length",
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
).input_ids
```
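For context, here is a minimal sketch of how that padding fix might sit inside a DreamBooth-style collate function. The `collate_fn` name and the `examples` field names are assumptions for illustration, not the repo's exact code:

```python
import torch

def collate_fn(examples, tokenizer):
    # Hypothetical collate function; "instance_prompt_ids" and
    # "instance_images" are assumed field names, not the repo's exact ones.
    input_ids = [example["instance_prompt_ids"] for example in examples]
    pixel_values = torch.stack([example["instance_images"] for example in examples])

    # Padding every prompt to tokenizer.model_max_length gives each batch a
    # fixed sequence length, instead of padding only to the longest prompt
    # in the batch.
    input_ids = tokenizer.pad(
        {"input_ids": input_ids},
        padding="max_length",
        max_length=tokenizer.model_max_length,
        return_tensors="pt",
    ).input_ids

    return {"input_ids": input_ids, "pixel_values": pixel_values}
```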
**Describe the bug**
When using DreamBooth in WSL (Ubuntu 20.04), a recent change causes training to fail. Reverting that single change (f94be89) fixes the issue on the most recent check-in.
I can live with this workaround, and I understand my particular setup may not be supported, but I wanted to share it in case it is impacting others, or in case there is something I can do on my end to fix it.
**Reproduction**
1. Set up this repo in WSL using the instructions here: https://pastebin.com/uE1WcSxD
2. Tweak one step: `pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117`
3. Execute training using the script: mytraining.txt
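A quick way to sanity-check that step 2 actually picked up the cu117 wheel and that the GPU is visible inside WSL (this check is my suggestion, not part of the original steps):

```python
import torch

# Verify the CUDA-enabled build and WSL GPU passthrough before training.
print(torch.__version__)               # expected to end in +cu117 if the tweak took effect
print(torch.cuda.is_available())       # should be True with WSL GPU passthrough
print(torch.backends.cudnn.version())  # cuDNN build PyTorch was compiled against
```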
**Logs**
**System Info**
- diffusers version: 0.8.0.dev0