cloneofsimo / lora

Using Low-rank adaptation to quickly fine-tune diffusion models.
https://arxiv.org/abs/2106.09685
Apache License 2.0

Issue on dev branch v0.0.8: RuntimeError: CUDA error: invalid argument. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. #116

Closed cian0 closed 1 year ago

cian0 commented 1 year ago

My env:

- Ubuntu 22
- xformers 0.0.14.dev0
- torch 1.12.1
- diffusers 0.9.0
- Python 3.9.12
- CUDA 11.7
- RTX 3090

Note that this doesn't occur when I uninstall and reinstall 0.0.7 of the lora_diffusion dev branch.

```
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Traceback (most recent call last):
  File "/home/ian/miniconda3/bin/lora_pti", line 33, in <module>
    sys.exit(load_entry_point('lora-diffusion', 'console_scripts', 'lora_pti')())
  File "/home/ian/projs/lora/lora_diffusion/cli_lora_pti.py", line 738, in main
    fire.Fire(train)
  File "/home/ian/miniconda3/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ian/miniconda3/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ian/miniconda3/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ian/projs/lora/lora_diffusion/cli_lora_pti.py", line 642, in train
    train_inversion(
  File "/home/ian/projs/lora/lora_diffusion/cli_lora_pti.py", line 307, in train_inversion
    loss.backward()
  File "/home/ian/miniconda3/lib/python3.9/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/ian/miniconda3/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/ian/miniconda3/lib/python3.9/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/home/ian/miniconda3/lib/python3.9/site-packages/xformers/ops.py", line 369, in backward
    ) = torch.ops.xformers.efficient_attention_backward_cutlass(
  File "/home/ian/miniconda3/lib/python3.9/site-packages/torch/_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
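As the error message suggests, CUDA errors are reported asynchronously, so the stack trace may point at the wrong call. A minimal sketch of re-running with synchronous kernel launches (the `lora_pti` invocation itself is elided; use the same arguments as before):

```shell
# Force synchronous CUDA kernel launches so the stack trace points at the
# kernel that actually failed, not a later API call.
export CUDA_LAUNCH_BLOCKING=1
# then re-run the same command, e.g.:
# lora_pti <same arguments as before>
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```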
cloneofsimo commented 1 year ago

This is an xformers error. I'm not really familiar with xformers here, maybe @hafriedlander might be able to help? BTW, lora_pti is not tested with xformers right now.

hafriedlander commented 1 year ago

Hmm. So:

  1. Xformers doesn't work backwards on 3090 with certain unet sizes. This is a known issue, and xformers team aren't likely to fix.
  2. When you enable xformers in https://github.com/cloneofsimo/lora/blob/master/training_scripts/train_lora_dreambooth.py I fix this by only enabling it on unets with sizes that do work. The specific code is at https://github.com/cloneofsimo/lora/blob/master/lora_diffusion/xformers_utils.py#L42
  3. But that code isn't ever called with cli_lora_pti - so how is xformers getting enabled?

@cian0 What is the exact version of the Diffusers library you are using?
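The size-gating idea described in point 2 can be sketched as follows. This is a hypothetical stand-in, not the actual code in `lora_diffusion/xformers_utils.py`: the real check probes xformers attention on the module's actual shapes, while the probe below fakes the supported/unsupported decision to illustrate the pattern.

```python
# Sketch: probe whether the optimized attention op works for a given size,
# and only enable it on modules where the probe succeeds.
from functools import lru_cache

@lru_cache(maxsize=None)
def backward_works(attention_dim: int) -> bool:
    """Hypothetical probe. The real check would run a tiny forward+backward
    through xformers attention and catch the RuntimeError seen in this issue;
    here we fake it by treating multiples of 8 as supported."""
    try:
        if attention_dim % 8 != 0:
            raise RuntimeError("CUDA error: invalid argument")
        return True
    except RuntimeError:
        return False

def maybe_enable_xformers(modules: dict) -> list:
    # Enable only for modules whose attention dim passes the probe.
    return [name for name, dim in modules.items() if backward_works(dim)]

enabled = maybe_enable_xformers({"down_block": 320, "mid_block": 1280, "odd_block": 333})
print(enabled)  # → ['down_block', 'mid_block']
```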

cloneofsimo commented 1 year ago

> Hmm. So:
>
>   1. Xformers doesn't work backwards on 3090 with certain unet sizes. This is a known issue, and xformers team aren't likely to fix.
>   2. When you enable xformers in https://github.com/cloneofsimo/lora/blob/master/training_scripts/train_lora_dreambooth.py I fix this by only enabling it on unets with sizes that do work. The specific code is at https://github.com/cloneofsimo/lora/blob/master/lora_diffusion/xformers_utils.py#L42
>   3. But that code isn't ever called with cli_lora_pti - so how is xformers getting enabled?
>
> @cian0 What is the exact version of the Diffusers library you are using?

Indeed, lora_pti doesn't support xformers yet...

cian0 commented 1 year ago

> Hmm. So:
>
>   1. Xformers doesn't work backwards on 3090 with certain unet sizes. This is a known issue, and xformers team aren't likely to fix.
>   2. When you enable xformers in https://github.com/cloneofsimo/lora/blob/master/training_scripts/train_lora_dreambooth.py I fix this by only enabling it on unets with sizes that do work. The specific code is at https://github.com/cloneofsimo/lora/blob/master/lora_diffusion/xformers_utils.py#L42
>   3. But that code isn't ever called with cli_lora_pti - so how is xformers getting enabled?
>
> @cian0 What is the exact version of the Diffusers library you are using?

Weird, when I run pip list it shows 0.9.0, but when I run conda list it shows 0.7.0.dev0 for my diffusers lib. I'll try to update both and see if that resolves it.
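When pip and conda disagree, the authoritative answer is which copy Python actually imports. A minimal sketch (the function works for any importable name; `diffusers` is the package in question here, but the demo below uses a stdlib module since diffusers may not be installed):

```python
# Report the version and on-disk location of the module Python resolves,
# which settles any pip-vs-conda disagreement about what is installed.
import importlib

def report(pkg_name: str) -> str:
    mod = importlib.import_module(pkg_name)
    version = getattr(mod, "__version__", "unknown")
    location = getattr(mod, "__file__", "builtin")
    return f"{pkg_name} {version} from {location}"

# e.g. report("diffusers") in the environment from this issue;
# demonstrated here with a stdlib module:
print(report("json"))
```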

hafriedlander commented 1 year ago

Cool, that's what I wondered: some versions of Diffusers (0.9.0 and 0.10.0) tried to automatically enable xformers if it was installed. 0.11 doesn't for sure, so try updating to the latest and see how it goes.
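A small sketch of a version guard based on the range stated above (the 0.9–0.10 boundary is taken from the comment, not from diffusers release notes; stdlib only):

```python
# Warn when the installed diffusers version is one that, per the discussion
# above, auto-enabled xformers when it was installed.
def parse_major_minor(version: str) -> tuple:
    # "0.10.0" -> (0, 10); trailing segments like ".dev0" are ignored.
    return tuple(int(p) for p in version.split(".")[:2])

def auto_enables_xformers(version: str) -> bool:
    return (0, 9) <= parse_major_minor(version) < (0, 11)

for v in ("0.7.0.dev0", "0.9.0", "0.10.2", "0.11.0"):
    print(v, auto_enables_xformers(v))
```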

cian0 commented 1 year ago

Fixed now by updating diffusers, thanks!