Closed Cruxial0 closed 1 week ago
Hello there. First, you shouldn't worry about the triton error: triton is a library that is only available on Linux, so Windows users will always see it. Second, many of the settings you are using are overkill, and that's probably why your LoRA outputs are grey. What are you trying to train? And third, don't generate samples every epoch with the trainer; they are so basic that they don't reflect the real output of the LoRA. Test it in your SD webui instead.
Hi, thanks for clearing up the triton issue.
I'm trying to create a style. I know my settings probably aren't very realistic, but most of them were copied directly from Civitai's on-site generator. Could you provide more details about which parameters are overkill?
Is the `NaN found in latents, replacing with zeros` message something I should worry about? When I look at logs from other issues in this repo, I can't really find that message anywhere.
I'm not - I think those are default values or something. I did not touch the sampling tab.
`NaN found in latents, replacing with zeros` means that the LoRA was killed and the weights were replaced with 0, so it's broken.
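For reference, here is a minimal sketch of what a NaN-to-zero safeguard like this does (an illustrative plain-Python stand-in, not the trainer's actual code):

```python
import math

# Toy stand-in for a batch of VAE latents containing NaNs.
latents = [0.5, float("nan"), -1.2, float("nan")]

# Roughly what the trainer's safeguard does: swap NaNs for zeros so the
# step does not crash -- but those latent values are gone for good,
# which is why the resulting LoRA comes out grey/broken.
cleaned = [0.0 if math.isnan(v) else v for v in latents]
print(cleaned)  # [0.5, 0.0, -1.2, 0.0]
```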
Also, following Civitai's on-site trainer parameters is something you should never do; their trainer is very poor. For example, 20 repeats is overkill, and even 6 is.
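To see why a high repeat count is overkill, the total step count can be worked out directly (a sketch; the dataset size, epochs, and batch size below are illustrative):

```python
# Total optimizer steps = (images * repeats * epochs) / batch_size.
# Illustrative numbers: a 69-image style dataset, 5 epochs, batch size 2.
images, epochs, batch_size = 69, 5, 2

for repeats in (20, 6, 2):
    steps = images * repeats * epochs // batch_size
    print(f"{repeats} repeats -> {steps} steps")
# 20 repeats -> 3450 steps
# 6 repeats -> 1035 steps
# 2 repeats -> 345 steps
```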
After playing around with pretty much all the settings, I am still getting the `NaN found in latents, replacing with zeros` error with every generation I try. Any ideas?
Can you send a toml with the last settings you tried?
```toml
[[subsets]]
caption_extension = ".txt"
flip_aug = true
image_dir = "F:/AI/Training/INKxXmo Dark/img/10_inkxxmo smooth"
keep_tokens = 2
name = "10"
num_repeats = 2
shuffle_caption = true

[[subsets]]
caption_extension = ".txt"
flip_aug = true
image_dir = "F:/AI/Training/INKxXmo Dark/img/30_inkxxmo sharp"
keep_tokens = 2
name = "30"
num_repeats = 6
shuffle_caption = true

[train_mode]
train_mode = "lora"

[general_args.args]
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
pretrained_model_name_or_path = "E:/Fooocus_win64_2-1-791/Fooocus/models/checkpoints/C_XL_P_ponyDiffusionV6.safetensors"
vae = "E:/Fooocus_win64_2-1-791/Fooocus/models/vae/V_XL_P_sdxl_vae.safetensors"
sdxl = true
seed = 23
prior_loss_weight = 1.0
xformers = true
max_train_epochs = 5
mixed_precision = "fp16"

[general_args.dataset_args]
resolution = [ 1024, 1024,]
batch_size = 2

[network_args.args]
network_dim = 32
network_alpha = 8.0
min_timestep = 0
max_timestep = 1000

[optimizer_args.args]
lr_scheduler = "cosine"
loss_type = "l2"
learning_rate = 0.0001
max_grad_norm = 1.0
optimizer_type = "AdamW8bit"

[saving_args.args]
output_dir = "F:/AI/Training/INKxXmo Dark/lora"
output_name = "INKxXmo Dark"
save_precision = "fp16"
save_model_as = "safetensors"
save_every_n_epochs = 1
save_toml = true
save_toml_location = "F:/AI/Training/_Runtimes"

[bucket_args.dataset_args]
enable_bucket = true
max_bucket_reso = 1024
min_bucket_reso = 256
bucket_reso_steps = 64

[network_args.args.network_args]

[optimizer_args.args.optimizer_args]
weight_decay = "0.1"
betas = "0.9,0.99"
```
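A likely culprit in this config is `mixed_precision = "fp16"` combined with the stock SDXL VAE, which is widely reported to overflow in half precision (note that the working config later in the thread switches to `bf16`). A small numpy demonstration of the failure mode:

```python
import numpy as np

# float16 tops out at 65504; SDXL VAE activations can exceed that,
# which is the usual source of NaN latents under fp16.
big = np.float16(70000.0)           # overflows to inf
print(big)                          # inf
print(big - big)                    # inf - inf = nan
print(np.finfo(np.float16).max)     # 65504.0
```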
Why do you use 2 subsets if you are training a style?
I'm not sure if it's very relevant to my issue, but the artist's style developed over time, so I tried balancing it out using 2 subsets.
Sorry for the delay in responding. If you want to train 2 different styles, you'd need to do a separate training for each one; you can't train 2 styles in the same LoRA. But regarding your issue, tell me how many images each style has and I'll provide you a config file for the dev branch to see if it solves things.
No worries, I appreciate the help.
Subset 1 has 21 images, subset 2 has 69
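As an aside, if someone did want to balance subsets of different sizes with repeats (rather than training them separately), the arithmetic is just matching effective image counts per epoch; the target below is illustrative:

```python
# Hypothetical balancing: choose repeats so each subset contributes a
# similar number of effective images (images * repeats) per epoch.
subsets = {"subset_1": 21, "subset_2": 69}
target = 210  # illustrative target effective images per subset

for name, n in subsets.items():
    print(f"{name}: {n} images -> {round(target / n)} repeats")
# subset_1: 21 images -> 10 repeats
# subset_2: 69 images -> 3 repeats
```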
```toml
[[subsets]]
caption_dropout_rate = 0.04
caption_extension = ".txt"
image_dir = "/path/to/dataset"
name = "style"
num_repeats = 4
random_crop = true
shuffle_caption = true

[train_mode]
train_mode = "lora"

[general_args.args]
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
pretrained_model_name_or_path = "/path/to/model"
vae = "/path/to/VAE"
sdxl = true
mixed_precision = "bf16"
gradient_accumulation_steps = 4
seed = 23
max_token_length = 225
prior_loss_weight = 1.0
xformers = true
max_train_epochs = 20

[general_args.dataset_args]
resolution = 1024
batch_size = 2

[network_args.args]
network_dim = 24
network_alpha = 12.0
min_timestep = 0
max_timestep = 1000
network_train_unet_only = true
ip_noise_gamma = 0.05

[optimizer_args.args]
lr_scheduler = "cosine"
optimizer_type = "Came"
lr_scheduler_type = "LoraEasyCustomOptimizer.CustomOptimizers.Rex"
loss_type = "huber"
huber_schedule = "snr"
huber_c = 0.1
learning_rate = 7e-5
warmup_ratio = 0.15
scale_weight_norms = 9.0
max_grad_norm = 1.0

[saving_args.args]
save_precision = "fp16"
save_model_as = "safetensors"
tag_occurrence = true
save_every_n_epochs = 1
save_toml = true
output_dir = "/path/to/output_dir"
output_name = "style"
tag_file_location = "/path/to/tag_file_location"
save_toml_location = "/path/to/save_toml_location"

[noise_args.args]
noise_offset = 0.0357
multires_noise_iterations = 6
multires_noise_discount = 0.3

[bucket_args.dataset_args]
enable_bucket = true
min_bucket_reso = 512
max_bucket_reso = 2048
bucket_reso_steps = 64

[network_args.args.network_args]
conv_dim = 24
conv_alpha = 12.0
algo = "locon"
dora_wd = true

[optimizer_args.args.lr_scheduler_args]
min_lr = 1e-6

[optimizer_args.args.optimizer_args]
weight_decay = "0.1"
```
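Back-of-the-envelope step counts for this config, assuming the 69-image subset (a sketch; the trainer may round differently):

```python
# Suggested config: 4 repeats, 20 epochs, batch 2, grad accumulation 4.
images, repeats, epochs = 69, 4, 20
batch_size, grad_accum = 2, 4

images_per_epoch = images * repeats              # 276 effective images
micro_steps = images_per_epoch // batch_size     # 138 forward/backward passes
optim_steps = micro_steps // grad_accum          # 34 optimizer updates
print(micro_steps * epochs, optim_steps * epochs)  # 2760 680
```

The effective batch size here is `batch_size * gradient_accumulation_steps = 8`, at the VRAM cost of only a batch of 2.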
Try something like this: adjust the parameters to fit the needs of your GPU and dataset, and train one LoRA per style, not both at the same time.
This does actually work; however, it's taking 4-5 minutes to complete a single step. Is this the performance I should expect from an RTX 4080 SUPER?
That is very likely a result of exceeding your VRAM; make sure you have gradient checkpointing enabled.
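For context, gradient checkpointing trades compute for VRAM: activations inside the checkpointed segment are recomputed during the backward pass instead of being stored. A minimal PyTorch sketch of the mechanism (not this trainer's code; sd-scripts exposes it as a `gradient_checkpointing` option):

```python
import torch
from torch.utils.checkpoint import checkpoint

# A small block standing in for a U-Net segment.
block = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
x = torch.randn(8, 64, requires_grad=True)

# Activations inside `block` are recomputed on backward, not kept.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # torch.Size([8, 64])
```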
That seems to have solved all my issues. Thanks for the help!
I recently got a new GPU and decided to get into LoRA training. However, I keep running into issues no matter what I try. The issue that persists the most is `ModuleNotFoundError: No module named 'triton'`. #160 mentioned this was an issue with the toml loader and that an escape character was being used somewhere, which is not the case for me. I have tried getting it to work on both the main branch and the dev branch. Here are the steps I took to reproduce the error:
My setup consists of:
- CPU: AMD Ryzen 7 7800X3D
- GPU: RTX 4080 SUPER
- RAM: 64GB DDR5 6400MHz
Log output:
TOML Config:
It's worth mentioning that I managed to run it on the main branch (the log output above is from the dev branch), but it kept giving me `NaN found in latents, replacing with zeros` after every training step, resulting in the LoRA being completely grey. EDIT: I also don't get any of these errors when I use standalone kohya_ss (I'm just too overwhelmed to use that software yet, haha).