Closed Cruxial0 closed 1 week ago
Hello there. First, you shouldn't worry about the triton error: triton is a library that is only available on Linux, so Windows users will always see it. Second, many of the settings you are using are overkill, and that's probably why your LoRA outputs are grey. What are you trying to train? And third, don't generate samples every epoch with the trainer; they are so basic that they don't reflect the real output of the LoRA. Test it in your SD webui instead.
Hi, thanks for clearing up the triton issue.
I'm trying to create a style. I know my settings probably aren't very realistic, but most of them were copied directly from Civitai's on-site generator. Could you provide more details about which parameters are overkill?
Is the `NaN found in latents, replacing with zeros` message something I should worry about? When I look at logs from other issues in this repo, I can't really find that message anywhere.
I'm not - I think those are default values or something. I did not touch the sampling tab.
`NaN found in latents, replacing with zeros` means that the LoRA was killed and the weights were replaced with 0, so it's broken.
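For reference, here is a minimal sketch of what a NaN-to-zero safeguard like this does (an illustrative plain-Python stand-in, not the trainer's actual code):

```python
import math

# Toy stand-in for a batch of VAE latents containing NaNs.
latents = [0.5, float("nan"), -1.2, float("nan")]

# Roughly what the trainer's safeguard does: swap NaNs for zeros so the
# step does not crash -- but those latent values are gone for good,
# which is why the resulting LoRA comes out grey/broken.
cleaned = [0.0 if math.isnan(v) else v for v in latents]
print(cleaned)  # [0.5, 0.0, -1.2, 0.0]
```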
Also, following Civitai's on-site trainer parameters is something you should never do; their trainer is very poor. For example, 20 repeats is overkill, and even 6 is.
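To see why a high repeat count is overkill, the total step count can be worked out directly (a sketch; the dataset size, epochs, and batch size below are illustrative):

```python
# Total optimizer steps = (images * repeats * epochs) / batch_size.
# Illustrative numbers: a 69-image style dataset, 5 epochs, batch size 2.
images, epochs, batch_size = 69, 5, 2

for repeats in (20, 6, 2):
    steps = images * repeats * epochs // batch_size
    print(f"{repeats} repeats -> {steps} steps")
# 20 repeats -> 3450 steps
# 6 repeats -> 1035 steps
# 2 repeats -> 345 steps
```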
After playing around with pretty much all the settings, I am still getting the `NaN found in latents, replacing with zeros` error with every generation I try. Any ideas?
Can you send a toml with the last settings you tried?
```toml
[[subsets]]
caption_extension = ".txt"
flip_aug = true
image_dir = "F:/AI/Training/INKxXmo Dark/img/10_inkxxmo smooth"
keep_tokens = 2
name = "10"
num_repeats = 2
shuffle_caption = true

[[subsets]]
caption_extension = ".txt"
flip_aug = true
image_dir = "F:/AI/Training/INKxXmo Dark/img/30_inkxxmo sharp"
keep_tokens = 2
name = "30"
num_repeats = 6
shuffle_caption = true

[train_mode]
train_mode = "lora"

[general_args.args]
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
pretrained_model_name_or_path = "E:/Fooocus_win64_2-1-791/Fooocus/models/checkpoints/C_XL_P_ponyDiffusionV6.safetensors"
vae = "E:/Fooocus_win64_2-1-791/Fooocus/models/vae/V_XL_P_sdxl_vae.safetensors"
sdxl = true
seed = 23
prior_loss_weight = 1.0
xformers = true
max_train_epochs = 5
mixed_precision = "fp16"

[general_args.dataset_args]
resolution = [ 1024, 1024,]
batch_size = 2

[network_args.args]
network_dim = 32
network_alpha = 8.0
min_timestep = 0
max_timestep = 1000

[optimizer_args.args]
lr_scheduler = "cosine"
loss_type = "l2"
learning_rate = 0.0001
max_grad_norm = 1.0
optimizer_type = "AdamW8bit"

[saving_args.args]
output_dir = "F:/AI/Training/INKxXmo Dark/lora"
output_name = "INKxXmo Dark"
save_precision = "fp16"
save_model_as = "safetensors"
save_every_n_epochs = 1
save_toml = true
save_toml_location = "F:/AI/Training/_Runtimes"

[bucket_args.dataset_args]
enable_bucket = true
max_bucket_reso = 1024
min_bucket_reso = 256
bucket_reso_steps = 64

[network_args.args.network_args]

[optimizer_args.args.optimizer_args]
weight_decay = "0.1"
betas = "0.9,0.99"
```
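A likely culprit in this config is `mixed_precision = "fp16"` combined with the stock SDXL VAE, which is widely reported to overflow in half precision (note that the working config later in the thread switches to `bf16`). A small numpy demonstration of the failure mode:

```python
import numpy as np

# float16 tops out at 65504; SDXL VAE activations can exceed that,
# which is the usual source of NaN latents under fp16.
big = np.float16(70000.0)           # overflows to inf
print(big)                          # inf
print(big - big)                    # inf - inf = nan
print(np.finfo(np.float16).max)     # 65504.0
```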
Why do you use 2 subsets if you are training a style?
I'm not sure if it's very relevant to my issue, but the artist's style developed over time, so I tried balancing it out using 2 subsets.
Sorry for the delay in responding. If you want to train 2 different styles, you'd need to do a separate training for each one; you can't train 2 styles in the same LoRA. But regarding your issue, tell me how many images each style has and I'll provide you a config file for the dev branch to see if it solves things.
No worries, I appreciate the help.
Subset 1 has 21 images, subset 2 has 69
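As an aside, if someone did want to balance subsets of different sizes with repeats (rather than training them separately), the arithmetic is just matching effective image counts per epoch; the target below is illustrative:

```python
# Hypothetical balancing: choose repeats so each subset contributes a
# similar number of effective images (images * repeats) per epoch.
subsets = {"subset_1": 21, "subset_2": 69}
target = 210  # illustrative target effective images per subset

for name, n in subsets.items():
    print(f"{name}: {n} images -> {round(target / n)} repeats")
# subset_1: 21 images -> 10 repeats
# subset_2: 69 images -> 3 repeats
```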
```toml
[[subsets]]
caption_dropout_rate = 0.04
caption_extension = ".txt"
image_dir = "/path/to/dataset"
name = "style"
num_repeats = 4
random_crop = true
shuffle_caption = true

[train_mode]
train_mode = "lora"

[general_args.args]
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
pretrained_model_name_or_path = "/path/to/model"
vae = "/path/to/VAE"
sdxl = true
mixed_precision = "bf16"
gradient_accumulation_steps = 4
seed = 23
max_token_length = 225
prior_loss_weight = 1.0
xformers = true
max_train_epochs = 20

[general_args.dataset_args]
resolution = 1024
batch_size = 2

[network_args.args]
network_dim = 24
network_alpha = 12.0
min_timestep = 0
max_timestep = 1000
network_train_unet_only = true
ip_noise_gamma = 0.05

[optimizer_args.args]
lr_scheduler = "cosine"
optimizer_type = "Came"
lr_scheduler_type = "LoraEasyCustomOptimizer.CustomOptimizers.Rex"
loss_type = "huber"
huber_schedule = "snr"
huber_c = 0.1
learning_rate = 7e-5
warmup_ratio = 0.15
scale_weight_norms = 9.0
max_grad_norm = 1.0

[saving_args.args]
save_precision = "fp16"
save_model_as = "safetensors"
tag_occurrence = true
save_every_n_epochs = 1
save_toml = true
output_dir = "/path/to/output_dir"
output_name = "style"
tag_file_location = "/path/to/tag_file_location"
save_toml_location = "/path/to/save_toml_location"

[noise_args.args]
noise_offset = 0.0357
multires_noise_iterations = 6
multires_noise_discount = 0.3

[bucket_args.dataset_args]
enable_bucket = true
min_bucket_reso = 512
max_bucket_reso = 2048
bucket_reso_steps = 64

[network_args.args.network_args]
conv_dim = 24
conv_alpha = 12.0
algo = "locon"
dora_wd = true

[optimizer_args.args.lr_scheduler_args]
min_lr = 1e-6

[optimizer_args.args.optimizer_args]
weight_decay = "0.1"
```
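Back-of-the-envelope step counts for this config, assuming the 69-image subset (a sketch; the trainer may round differently):

```python
# Suggested config: 4 repeats, 20 epochs, batch 2, grad accumulation 4.
images, repeats, epochs = 69, 4, 20
batch_size, grad_accum = 2, 4

images_per_epoch = images * repeats              # 276 effective images
micro_steps = images_per_epoch // batch_size     # 138 forward/backward passes
optim_steps = micro_steps // grad_accum          # 34 optimizer updates
print(micro_steps * epochs, optim_steps * epochs)  # 2760 680
```

The effective batch size here is `batch_size * gradient_accumulation_steps = 8`, at the VRAM cost of only a batch of 2.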
Try something like this: adjust the parameters to fit the needs of your GPU and dataset, and train one LoRA per style, not both at the same time.
This does actually work; however, it's taking 4-5 minutes to complete a single step. Is this the performance I should expect from an RTX 4080 SUPER?
That is very likely a result of exceeding your VRAM; make sure you have gradient checkpointing enabled.
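For context, gradient checkpointing trades compute for VRAM: activations inside the checkpointed segment are recomputed during the backward pass instead of being stored. A minimal PyTorch sketch of the mechanism (not this trainer's code; sd-scripts exposes it as a `gradient_checkpointing` option):

```python
import torch
from torch.utils.checkpoint import checkpoint

# A small block standing in for a U-Net segment.
block = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
x = torch.randn(8, 64, requires_grad=True)

# Activations inside `block` are recomputed on backward, not kept.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # torch.Size([8, 64])
```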
That seems to have solved all my issues. Thanks for the help!
I recently got a new GPU and decided to get into LoRA training. However, I keep running into issues no matter what I try. The issue that persists the most is `ModuleNotFoundError: No module named 'triton'`. #160 mentioned this was an issue with the toml loader and that an escape character was being used somewhere, which is not the case for me. I have tried getting it to work on both the main branch and the dev branch. Here are the steps I took to reproduce the error:
My setup consists of:
- CPU: AMD Ryzen 7 7800X3D
- GPU: RTX 4080 SUPER
- RAM: 64GB DDR5 6400MHz
Log output:
TOML Config:
It's worth mentioning that I managed to run it on the main branch (the log output above is from the dev branch), but it kept giving me `NaN found in latents, replacing with zeros` after every training step, resulting in the LoRA being completely grey. EDIT: I also don't get any of these errors when I use standalone kohya_ss (I'm just too overwhelmed to use that software yet, haha).