derrian-distro / LoRA_Easy_Training_Scripts

A UI made in PySide6 to make training LoRA/LoCon and other LoRA-type models in sd-scripts easy
GNU General Public License v3.0

torch and cudnn problems #212

Closed: Cookie4Free closed this issue 1 month ago

Cookie4Free commented 1 month ago

I'm sorry if this is a dumb question. I'm new to this, and I'm trying to learn how to do LoRAs. Most of the settings I imported are from this guide: https://civitai.com/models/22530/guide-make-your-own-loras-easy-and-free

To be honest, I couldn't find any guide on how to use this training script correctly. I'm pretty sure I'm doing something wrong here.

This is the error I get. Triton is only supported for Linux, so I guess I can ignore that one. As for the Torch and cuDNN error, I have no idea what's causing it. Also, 200 hours seems way too long to me—is that normal?

OS Windows 11, rtx4090, amd ryzen 7 5800x3d

A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\xformers\__init__.py", line 55, in _is_triton_available
    from xformers.triton.softmax import softmax as triton_softmax  # noqa
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\xformers\triton\softmax.py", line 11, in <module>
    import triton
ModuleNotFoundError: No module named 'triton'
B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\diffusers\models\attention_processor.py:1039: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  hidden_states = F.scaled_dot_product_attention(
NaN found in latents, replacing with zeros
B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\autograd\graph.py:744: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ..\aten\src\ATen\native\cudnn\Conv_v8.cpp:919.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
steps:   0%|                                                    | 1/2460 [03:45<154:14:03, 225.80s/it, avr_loss=0.0121]NaN found in latents, replacing with zeros
steps:   0%|                                                       | 2/2460 [09:49<201:06:49, 294.55s/it, avr_loss=nan]
Jelosus2 commented 1 month ago

Hello, it seems you're using the main branch version. Am I right?

Cookie4Free commented 1 month ago

To be honest, I don't really know—I think so. I followed the instructions in the readme on how to install it.

Jelosus2 commented 1 month ago

Ok. Try installing the dev branch: git clone -b dev https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

Cookie4Free commented 1 month ago

The training time has dropped significantly, thanks. But what about the other errors, do I have to do anything about them? And in general, are there any good guides on the best settings, or presets I can download?

steps:   0%|                                                                                  | 0/2460 [00:00<?, ?it/s]
epoch 1/10
A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\xformers\__init__.py", line 55, in _is_triton_available
    from xformers.triton.softmax import softmax as triton_softmax  # noqa
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\xformers\triton\softmax.py", line 11, in <module>
    import triton
ModuleNotFoundError: No module named 'triton'
B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\diffusers\models\attention_processor.py:1039: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  hidden_states = F.scaled_dot_product_attention(
NaN found in latents, replacing with zeros
B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\autograd\graph.py:744: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ..\aten\src\ATen\native\cudnn\Conv_v8.cpp:919.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
steps:   0%|                                                      | 1/2460 [00:33<23:10:26, 33.93s/it, avr_loss=0.0107]NaN found in latents, replacing with zeros
steps:   0%|                                                         | 2/2460 [01:06<22:41:44, 33.24s/it, avr_loss=nan]

After some time I get these errors as well. Can they be ignored?

steps:   1%|▍                                                       | 19/2460 [10:09<21:44:37, 32.07s/it, avr_loss=nan]NaN found in latents, replacing with zeros
Traceback (most recent call last):
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\sdxl_train_network.py", line 189, in <module>
    trainer.train(args)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\train_network.py", line 781, in train
    noise_pred = self.call_unet(
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\sdxl_train_network.py", line 169, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\accelerate\utils\operations.py", line 636, in forward
    return model_forward(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\accelerate\utils\operations.py", line 624, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\library\sdxl_original_unet.py", line 1106, in forward
    h = call_module(module, h, emb, context)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\library\sdxl_original_unet.py", line 1090, in call_module
    x = layer(x, context)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\library\sdxl_original_unet.py", line 745, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\library\sdxl_original_unet.py", line 668, in forward
    output = self.forward_body(hidden_states, context, timestep)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\library\sdxl_original_unet.py", line 643, in forward_body
    hidden_states = self.attn1(norm_hidden_states) + hidden_states
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\library\sdxl_original_unet.py", line 453, in forward
    hidden_states = self._attention(query, key, value)
  File "B:\Stable_Diffusion\LoRA_Easy_Training_Scripts\sd_scripts\library\sdxl_original_unet.py", line 475, in _attention
    attention_probs = attention_probs.to(value.dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 640.00 MiB. GPU
steps:   1%|▍                                                       | 19/2460 [10:31<22:32:09, 33.24s/it, avr_loss=nan]
Failed to train because of error:
Command '['B:\\Stable_Diffusion\\LoRA_Easy_Training_Scripts\\sd_scripts\\venv\\Scripts\\python.exe', 'sd_scripts\\sdxl_train_network.py', '--config_file=runtime_store\\config.toml', '--dataset_config=runtime_store\\dataset.toml']' returned non-zero exit status 1.
Jelosus2 commented 1 month ago

Ok, you said your GPU is an RTX 4090, right?

  1. Which version of SD do you want to train on?
  2. About the NaN errors: if you see avr_loss=nan while training, it means the LoRA is ruined (see the sketch after this list).
  3. While training, does the shared memory usage of your GPU rise?
  4. Please show your toml config file.
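
(On point 2: "NaN found in latents" while training SDXL in full fp16 with the stock SDXL VAE is a commonly reported half-precision overflow. Below is a minimal sketch of the usual sd-scripts mitigations; the flag names are real sd-scripts options, but placing no_half_vae under [general_args.args] is an assumption about this UI's config format, and the thread never confirms which of these, if any, was applied.)

    [general_args.args]
    # Option A: keep the VAE in fp32 so latent encoding cannot overflow.
    no_half_vae = true        # real sd-scripts flag; section placement here is an assumption
    # Option B: switch the run from fp16 to bf16, which has fp32's dynamic range.
    mixed_precision = "bf16"
    full_fp16 = false
    full_bf16 = true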
Cookie4Free commented 1 month ago
  1. In SDXL, with Pony Diffusion.
  2. I see; GPU usage also dropped after that.
  3. GPU memory is at 48.4/56.0 GB, with shared memory at 24.8/32.0 GB.
  4.

[[subsets]]
num_repeats = 2
caption_extension = ".txt"
shuffle_caption = false
flip_aug = false
color_aug = false
random_crop = false
is_reg = false
image_dir = "B:/Stable_Diffusion/Lora training/nanoless/dataset"
keep_tokens = 0

[noise_args]

[sample_args]

[logging_args]

[general_args.args]
pretrained_model_name_or_path = "B:/Stable_Diffusion/Data/checkpoints/PonyDiffusionV6XL_SDXL.safetensors"
mixed_precision = "fp16"
seed = 23
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
max_token_length = 225
prior_loss_weight = 1.0
sdxl = true
max_train_epochs = 10
full_bf16 = false
full_fp16 = true
vae = "B:/Stable_Diffusion/Data/VAE/sdxl_vae.safetensors"

[general_args.dataset_args]
resolution = [ 1024, 1024,]
batch_size = 2

[network_args.args]
network_dim = 16
network_alpha = 8.0
min_timestep = 0
max_timestep = 1000

[optimizer_args.args]
optimizer_type = "AdamW"
lr_scheduler = "cosine"
learning_rate = 0.0001
max_grad_norm = 1.0
warmup_ratio = 0.05
min_snr_gamma = 5

[saving_args.args]
output_dir = "B:/Stable_Diffusion/Lora training/nanoless/output"
save_precision = "fp16"
save_model_as = "safetensors"

[bucket_args.dataset_args]
enable_bucket = true
min_bucket_reso = 256
max_bucket_reso = 1024
bucket_reso_steps = 64

[network_args.args.network_args]
conv_dim = 4
conv_alpha = 4.0
algo = "locon"

[optimizer_args.args.optimizer_args]
weight_decay = "0.1"
betas = "0.9,0.99"

Jelosus2 commented 1 month ago

What are you aiming to train? A character, a style, a concept?

Cookie4Free commented 1 month ago

I wanted to train a style. I also wanted to create a character later on after this project.

Jelosus2 commented 1 month ago

Do you have Discord?

Cookie4Free commented 1 month ago

Yes, I do. Same name as here: Cookie4Free

Jelosus2 commented 1 month ago

Did it get solved?

Cookie4Free commented 1 month ago

Yes. Installing the dev branch (git clone -b dev https://github.com/derrian-distro/LoRA_Easy_Training_Scripts) and setting the right options to prevent the memory overflow fixed it.
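
(The exact settings aren't recorded in the thread, since the details moved to Discord. The following is only a plausible sketch of the sd-scripts options that usually keep SDXL LoRA training inside a 24 GB card; the flag names are real sd-scripts options, but the values, and the idea that these were the ones changed here, are assumptions.)

    [general_args.args]
    gradient_checkpointing = true   # recompute activations in the backward pass; large VRAM saving
    cache_latents = true            # encode the dataset through the VAE once, up front

    [general_args.dataset_args]
    batch_size = 1                  # roughly halves activation memory versus the batch_size = 2 above

Keeping usage inside the 4090's 24 GB of dedicated VRAM avoids both the OutOfMemoryError and the severe slowdown that comes from the driver spilling into shared system memory, which is what the 48.4/56.0 GB reading earlier in the thread indicates.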