Do you train on SD 1.5?
No, SDXL (the config sets sdxl = true).
I've got AdamW8bit with full fp16 working and AdamW with full bf16 working, but I can't figure out the adaptive optimizers. There must be something I'm doing wrong. This is the whole config, so if I'm at fault it's in there lol
Prodigy doesn't perform well with fp16. For SDXL I recommend CAME as the optimizer and REX (rawr) as the scheduler. The LR depends; I train characters on PonyDiffusion, which is still SDXL, and I use 1e-4 for the UNet LR and 1e-6 for both the TE LR and the minimum LR. Also, just FYI: in SDXL you want to train at 1024 resolution, and if your GPU supports bf16, use full bf16.
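For reference, those recommendations map onto this trainer's TOML roughly like the slice below; the key names are taken from the full config shared further down, and the numbers are the ones above.

[optimizer_args.args]
optimizer_type = "Came"
lr_scheduler_type = "LoraEasyCustomOptimizer.RexAnnealingWarmRestarts.RexAnnealingWarmRestarts"
learning_rate = 1e-4       # UNet LR
unet_lr = 1e-4
text_encoder_lr = 1e-6     # TE LR

[optimizer_args.args.lr_scheduler_args]
min_lr = 1e-6              # floor the REX schedule anneals down to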
I guess I'm behind in the optimizer/scheduler game; to me Prodigy is still the fresh new thing off the grill that's poppin. Guess not.
Never heard of either of your suggestions, but I'll definitely try them... I'd love it if you could share your whole config; I'll try something out myself in the meantime. Thanks!
Depends on what you want to train, but sure, I'll share it in a few hours.
Realistic likeness mostly, let me know! :D
This is what I use to train characters:
[[subsets]]
caption_extension = ".txt"
image_dir = "E:/Training_Loras/Leora/dataseter"
keep_tokens = 1
name = "cherry"
num_repeats = 1
shuffle_caption = true
[train_mode]
train_mode = "lora"
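# Base model and precision: Pony Diffusion V6 XL (an SDXL checkpoint) with the standalone SDXL VAE, trained in full bf16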
[general_args.args]
max_data_loader_n_workers = 1
persistent_data_loader_workers = true
pretrained_model_name_or_path = "C:/Users/User/Documents/SwarmUI/Models/Stable-Diffusion/ponyDiffusionV6XL_v6StartWithThisOne.safetensors"
vae = "C:/Users/User/Documents/SwarmUI/Models/VAE/sdxl_vae.safetensors"
sdxl = true
no_half_vae = true
full_bf16 = true
mixed_precision = "bf16"
gradient_checkpointing = true
seed = 69
max_token_length = 225
prior_loss_weight = 1.0
sdpa = true
max_train_epochs = 10
cache_latents = true
[general_args.dataset_args]
resolution = 1024
batch_size = 4
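# LoRA network: rank (dim) 8 with alpha 4, i.e. alpha = dim / 2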
[network_args.args]
network_dim = 8
network_alpha = 4.0
min_timestep = 0
max_timestep = 1000
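# Optimizer and schedule: CAME with the REX annealing-with-warm-restarts scheduler bundled with the trainer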
[optimizer_args.args]
lr_scheduler = "cosine"
optimizer_type = "Came"
lr_scheduler_type = "LoraEasyCustomOptimizer.RexAnnealingWarmRestarts.RexAnnealingWarmRestarts"
lr_scheduler_num_cycles = 1
loss_type = "l2"
learning_rate = 0.0001
warmup_ratio = 0.05
unet_lr = 0.0001
text_encoder_lr = 1e-6
max_grad_norm = 1.0
min_snr_gamma = 5
[saving_args.args]
output_dir = "E:/Training_Loras/Leora/model"
output_name = "Leora-JeloXL"
save_precision = "fp16"
save_model_as = "safetensors"
save_every_n_epochs = 1
save_toml = true
save_toml_location = "E:/Training_Loras/Leora/model"
[logging_args.args]
logging_dir = "E:/Training_Loras/Leora/tensorboard_logging"
log_prefix = "leora-"
log_with = "tensorboard"
[bucket_args.dataset_args]
enable_bucket = true
min_bucket_reso = 256
max_bucket_reso = 4096
bucket_reso_steps = 64
[network_args.args.network_args]
[optimizer_args.args.lr_scheduler_args]
min_lr = 1e-6
gamma = 0.9
[optimizer_args.args.optimizer_args]
weight_decay = "0.04"
Amazing, thanks! It's working, but it's really rough for the model I'm using anyway; it seems to overfit quickly before training properly.
I'ma tune it, but it does work pretty well. PS: if I could ask you more questions about it on Discord or something, that'd be amazing, LMK
As a note though, I'd really like to be using Prodigy, or any good automated "set the LR to 1 and forget it" optimizer, with this LoRA trainer, if anyone has a solution...
I can't get any of them to work.
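For anyone else stuck here, a minimal sketch of the Prodigy settings commonly used in kohya-style TOML, assuming this trainer passes [optimizer_args.args.optimizer_args] through to the prodigyopt package the same way it passes CAME's weight_decay above (not verified in this trainer). With an adaptive optimizer all the learning rates are set to 1.0 and Prodigy estimates the step size itself.

[optimizer_args.args]
optimizer_type = "Prodigy"
lr_scheduler = "constant"
learning_rate = 1.0        # Prodigy adapts the step size, so LRs are set to 1
unet_lr = 1.0
text_encoder_lr = 1.0

[optimizer_args.args.optimizer_args]
decouple = "True"              # AdamW-style decoupled weight decay
weight_decay = "0.01"
use_bias_correction = "True"
safeguard_warmup = "True"      # protects against oversized early steps
d_coef = "1.0"                 # multiplier on the estimated LR; 1.0 is a hypothetical starting point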
Sure, you can add me on Discord. Also, Prodigy is not good with big datasets. I will close the issue now.
Here are the settings. I get no error; it's just not training. Whether 1 epoch or 20 epochs, the generation isn't affected one bit by the LoRA. If I feed the same config to regular kohya it does train, so is it my error or is there something going on?