bmaltais / kohya_ss

Apache License 2.0

Train SDXL Lora with overtraining artifacts #2152

Open KorDum opened 5 months ago

KorDum commented 5 months ago

Hello! I have a strange issue with SDXL LoRA training.

I've tried both the newest version of Kohya_ss, 23.0.15 (clean install without pip cache), and an older version, 22.6.2 (also a clean install). The result is always the same. I have also tried training on different video cards: an RTX 3070 with 8 GB (dim 4, 512x512 resolution) and an RTX 4090 with 24 GB (dim 4, 8, 32, 64, 128, 1024x1024 resolution). There's no difference. I always get overtraining artifacts in the first epoch (and the following ones), no matter what learning rate I set.

The dataset consists of images I generated, each 1024x1024 pixels in size. 37 images * 15 repeats per image = 555 steps per epoch.
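The step count above can be reproduced with a small sketch. The function name `steps_per_epoch` is hypothetical, and the batch size of 1 is taken from the `train_batch_size` value in the config below. With bucketing enabled the trainer may split batches per bucket, so treat this as an approximation rather than the trainer's own calculation.

```python
import math


def steps_per_epoch(num_images: int, repeats: int, batch_size: int = 1) -> int:
    """Approximate optimizer steps per epoch when each image is seen `repeats` times."""
    return math.ceil(num_images * repeats / batch_size)


# 37 images, 15 repeats each, batch size 1 (values quoted in this post)
print(steps_per_epoch(37, 15))  # 555
```

Doubling the batch size roughly halves the number of steps per epoch, which matters when comparing learning rates between configs.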

I don't know what to do anymore. I've tried so many things. Can you help me please? I do not observe such problems with SD1.5.

Example image generated with 10 steps, CFG 3: [image: ComfyUI_temp_rqunk_00001_]

{
  "LoRA_type": "Standard",
  "LyCORIS_preset": "full",
  "adaptive_noise_scale": 0,
  "additional_parameters": "",
  "block_alphas": "",
  "block_dims": "",
  "block_lr_zero_threshold": "",
  "bucket_no_upscale": true,
  "bucket_reso_steps": 64,
  "cache_latents": true,
  "cache_latents_to_disk": true,
  "caption_dropout_every_n_epochs": 0.0,
  "caption_dropout_rate": 0,
  "caption_extension": ".txt",
  "clip_skip": 1,
  "color_aug": false,
  "constrain": 0.0,
  "conv_alpha": 1,
  "conv_block_alphas": "",
  "conv_block_dims": "",
  "conv_dim": 1,
  "debiased_estimation_loss": false,
  "decompose_both": false,
  "dim_from_weights": false,
  "down_lr_weight": "",
  "enable_bucket": true,
  "epoch": 1,
  "factor": -1,
  "flip_aug": false,
  "fp8_base": false,
  "full_bf16": true,
  "full_fp16": false,
  "gpu_ids": "",
  "gradient_accumulation_steps": 1,
  "gradient_checkpointing": true,
  "keep_tokens": "0",
  "learning_rate": 0.0012,
  "logging_dir": "F:/train/data/log",
  "lora_network_weights": "",
  "lr_scheduler": "constant",
  "lr_scheduler_args": "",
  "lr_scheduler_num_cycles": "",
  "lr_scheduler_power": "",
  "lr_warmup": 0,
  "max_bucket_reso": 2048,
  "max_data_loader_n_workers": "0",
  "max_grad_norm": 1,
  "max_resolution": "512,512",
  "max_timestep": 1000,
  "max_token_length": "75",
  "max_train_epochs": "",
  "max_train_steps": "",
  "mem_eff_attn": false,
  "mid_lr_weight": "",
  "min_bucket_reso": 256,
  "min_snr_gamma": 0,
  "min_timestep": 0,
  "mixed_precision": "bf16",
  "model_list": "custom",
  "module_dropout": 0,
  "multi_gpu": false,
  "multires_noise_discount": 0,
  "multires_noise_iterations": 0,
  "network_alpha": 1,
  "network_dim": 4,
  "network_dropout": 0,
  "noise_offset": 0,
  "noise_offset_type": "Original",
  "num_cpu_threads_per_process": 2,
  "num_machines": 1,
  "num_processes": 1,
  "optimizer": "Adafactor",
  "optimizer_args": "scale_parameter=False relative_step=False warmup_init=False",
  "output_dir": "D:/train/data/model",
  "output_name": "test-xl-v2",
  "persistent_data_loader_workers": false,
  "pretrained_model_name_or_path": "F:\\StableDiffusion\\ComfyUI\\ComfyUI\\models\\checkpoints\\earthAnimixXLSemiflat_v15.safetensors",
  "prior_loss_weight": 1.0,
  "random_crop": false,
  "rank_dropout": 0,
  "rank_dropout_scale": false,
  "reg_data_dir": "",
  "rescaled": false,
  "resume": "",
  "sample_every_n_epochs": 0,
  "sample_every_n_steps": 0,
  "sample_prompts": "",
  "sample_sampler": "euler_a",
  "save_every_n_epochs": 1,
  "save_every_n_steps": 0,
  "save_last_n_steps": 0,
  "save_last_n_steps_state": 0,
  "save_model_as": "safetensors",
  "save_precision": "bf16",
  "save_state": false,
  "scale_v_pred_loss_like_noise_pred": false,
  "scale_weight_norms": 0,
  "sdxl": true,
  "sdxl_cache_text_encoder_outputs": false,
  "sdxl_no_half_vae": true,
  "seed": "",
  "shuffle_caption": false,
  "stop_text_encoder_training_pct": 0,
  "text_encoder_lr": 0.0012,
  "train_batch_size": 1,
  "train_data_dir": "F:/train/data/img",
  "train_norm": false,
  "train_on_input": true,
  "training_comment": "",
  "unet_lr": 0.0012,
  "unit": 1,
  "up_lr_weight": "",
  "use_cp": false,
  "use_scalar": false,
  "use_tucker": false,
  "use_wandb": false,
  "v2": false,
  "v_parameterization": false,
  "v_pred_like_loss": 0,
  "vae": "",
  "vae_batch_size": 0,
  "wandb_api_key": "",
  "weighted_captions": false,
  "xformers": "xformers"
}
Mosfett1975 commented 5 months ago

If you get such a result in the first epoch, then of course you should not continue. Here's the JSON I'm training with right now: adafactor.json. And yes, I don't remember anything like this happening on previous versions, but on the current one it has happened a couple of times with both SD and SDXL.

KorDum commented 5 months ago

@Mosfett1975 thanks for the answer!

I thought the problem was the bitsandbytes package. However, neither 0.41.1 nor 0.43.0+ makes any difference. Probably some dependency has been updated, but I don't know how to determine which one.
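One way to narrow down a changed dependency is to snapshot the installed package versions in each environment and compare the snapshots. The sketch below uses only the standard library's `importlib.metadata`; the helper names `installed_versions` and `changed_packages` are hypothetical, and the two example snapshots are illustrative, reusing the bitsandbytes versions mentioned above.

```python
from importlib.metadata import distributions


def installed_versions() -> dict:
    """Map each installed distribution name (lowercased) to its version string."""
    versions = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            versions[name.lower()] = dist.version
    return versions


def changed_packages(old: dict, new: dict) -> dict:
    """Return {name: (old_version, new_version)}; None means the package is absent."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Illustrative snapshots; real ones would come from installed_versions()
# run inside each virtual environment and saved to a file.
old_env = {"bitsandbytes": "0.41.1", "example-package": "1.0"}
new_env = {"bitsandbytes": "0.43.0", "example-package": "1.0"}
print(changed_packages(old_env, new_env))
```

Running `installed_versions()` inside the working and the failing environment, saving each result, and passing both to `changed_packages` gives a short list of candidates to pin or roll back.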

KorDum commented 5 months ago

@Mosfett1975 Thanks for the config file, it works great! I see a lot of discrepancies with mine - it remains to be seen which configuration is causing the problem. I'll report back here.
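Finding which setting differs between two config files is easier with a key-by-key diff than by eye. The sketch below is one possible approach; `diff_configs` is a hypothetical helper, and the two fragments are taken from the configs posted in this thread, not from adafactor.json, whose contents are not shown here.

```python
import json


def diff_configs(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    Keys present on only one side are reported with the placeholder '<missing>'.
    """
    missing = "<missing>"
    out = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, missing), b.get(key, missing)
        if va != vb:
            out[key] = (va, vb)
    return out


# Small fragments for illustration; full configs would be read with json.load().
first = json.loads('{"learning_rate": 0.0012, "lr_scheduler": "constant", "min_snr_gamma": 0}')
second = json.loads(
    '{"learning_rate": 0.0004, "lr_scheduler": "constant_with_warmup",'
    ' "min_snr_gamma": 5, "masked_loss": false}'
)
for key, (va, vb) in diff_configs(first, second).items():
    print(f"{key}: {va!r} -> {vb!r}")
```

Changing one differing key at a time and retraining for a single epoch is slow, but it isolates the setting responsible.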

KorDum commented 5 months ago

However, I just checked more closely, and it seems the overtraining is still present on both Kohya_ss 23.0.15 and 22.6.2.

Here is an example from the first epoch. Subsequent epochs already show obvious artifacts, as if a learning rate of 0.0004 were too high. [image]

Mosfett1975 commented 5 months ago

I've noticed it now too: the LoRA is clearly not trained yet, but the background is starting to deteriorate. I set up samples of a person in a city. The face doesn't resemble the subject yet, but the paving stones are starting to disappear, and the houses and the whole environment are blurred. I kept only the photos with a simple white background in my set, but it didn't help much.

Mosfett1975 commented 5 months ago

I figured out the background issue. It all comes down to the input images: I filled the background with black and stated in the captions that the background is black :)

KorDum commented 5 months ago

@Mosfett1975 On a fully updated kohya, including all pip dependencies? Is Triton enabled?

bmaltais commented 5 months ago

@Mosfett1975 On a fully updated kohya, including all pip dependencies? Is Triton enabled?

Not on Windows

Mosfett1975 commented 5 months ago

All the latest updates are installed. As for Triton, I'm not sure whether it works or not.

KorDum commented 4 months ago

I get exactly the same overtraining results on Linux as I do on Windows, only now 2.5 times faster :) I still don't understand the reason for it. Maybe it's because I'm not feeding in real photos, but generated ones? I'm going to try to get someone's dataset with actual photos.

KorDum commented 4 months ago

Tried it on a third-party dataset, got exactly the same problems. I'm at a loss to guess what's going on. I have tried on two different computers with two different video cards and three different operating systems.

v0xie commented 4 months ago

Your config has max resolution specified as "max_resolution": "512,512", but you also have "sdxl": true.

You'd probably want to have the resolution set to 1024,1024.
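A quick sanity check for this kind of mismatch could look like the sketch below. `check_sdxl_resolution` is a hypothetical helper, not part of kohya_ss, and the 1024 default simply reflects the suggestion above; it only inspects the `sdxl` and `max_resolution` keys as they appear in the posted config.

```python
def check_sdxl_resolution(config: dict, expected: int = 1024) -> list:
    """Return a list of warnings if an SDXL config trains below `expected` px per side."""
    raw = config.get("max_resolution")
    if not config.get("sdxl") or not raw:
        return []
    # The GUI config stores the resolution as a "width,height" string.
    width, height = (int(part) for part in raw.split(","))
    if width < expected or height < expected:
        return [f"sdxl is enabled but max_resolution is {width}x{height} (expected {expected}x{expected})"]
    return []


# Values taken from the config posted at the top of the thread.
for warning in check_sdxl_resolution({"sdxl": True, "max_resolution": "512,512"}):
    print(warning)
```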

KorDum commented 4 months ago

@v0xie Unfortunately, my RTX 3070 doesn't allow me to set this because it only has 8 GB of VRAM. However, I also tried it on an RTX 4090 at 1024x1024. The outcome is the same.

KorDum commented 4 months ago

I tried the latest dev version of Kohya_ss along with all dependency updates. The result is the same :( I don't understand why this is happening.

Config for RTX 3070 8GB for 1024x1024 image size and 22 repeats for each image.

{
  "LoRA_type": "Standard",
  "LyCORIS_preset": "full",
  "adaptive_noise_scale": 0.006,
  "additional_parameters": "--network_train_unet_only",
  "block_alphas": "",
  "block_dims": "",
  "block_lr_zero_threshold": "",
  "bucket_no_upscale": true,
  "bucket_reso_steps": 32,
  "bypass_mode": false,
  "cache_latents": true,
  "cache_latents_to_disk": true,
  "caption_dropout_every_n_epochs": 0.0,
  "caption_dropout_rate": 0,
  "caption_extension": ".txt",
  "clip_skip": 2,
  "color_aug": false,
  "constrain": 0.0,
  "conv_alpha": 1,
  "conv_block_alphas": "",
  "conv_block_dims": "",
  "conv_dim": 1,
  "dataset_config": "",
  "debiased_estimation_loss": false,
  "decompose_both": false,
  "dim_from_weights": false,
  "dora_wd": false,
  "down_lr_weight": "",
  "enable_bucket": true,
  "epoch": 5,
  "extra_accelerate_launch_args": "",
  "factor": -1,
  "flip_aug": false,
  "fp8_base": false,
  "full_bf16": true,
  "full_fp16": false,
  "gpu_ids": "0",
  "gradient_accumulation_steps": 1,
  "gradient_checkpointing": true,
  "ip_noise_gamma": 0,
  "ip_noise_gamma_random_strength": false,
  "keep_tokens": "0",
  "learning_rate": 0.0004,
  "log_tracker_config": "",
  "log_tracker_name": "",
  "logging_dir": "D:/train/data/log",
  "lora_network_weights": "",
  "lr_scheduler": "constant_with_warmup",
  "lr_scheduler_args": "",
  "lr_scheduler_num_cycles": "",
  "lr_scheduler_power": "",
  "lr_warmup": 0,
  "main_process_port": 0,
  "masked_loss": false,
  "max_bucket_reso": 2048,
  "max_data_loader_n_workers": "1",
  "max_grad_norm": 1,
  "max_resolution": "512,512",
  "max_timestep": 1000,
  "max_token_length": "150",
  "max_train_epochs": "",
  "max_train_steps": "",
  "mem_eff_attn": false,
  "mid_lr_weight": "",
  "min_bucket_reso": 64,
  "min_snr_gamma": 5,
  "min_timestep": 0,
  "mixed_precision": "bf16",
  "model_list": "custom",
  "module_dropout": 0,
  "multi_gpu": false,
  "multires_noise_discount": 0.35,
  "multires_noise_iterations": 8,
  "network_alpha": 1,
  "network_dim": 4,
  "network_dropout": 0.2,
  "noise_offset": 0,
  "noise_offset_random_strength": false,
  "noise_offset_type": "Multires",
  "num_cpu_threads_per_process": 6,
  "num_machines": 1,
  "num_processes": 1,
  "optimizer": "Adafactor",
  "optimizer_args": "\"scale_parameter=False\", \"relative_step=False\", \"warmup_init=False\"",
  "output_dir": "D:/train/data/model",
  "output_name": "test-v4",
  "persistent_data_loader_workers": false,
  "pretrained_model_name_or_path": "F:/StableDiffusion/ComfyUI/ComfyUI/models/checkpoints/earthAnimixXLSemiflat_v15.safetensors",
  "prior_loss_weight": 1.0,
  "random_crop": false,
  "rank_dropout": 0,
  "rank_dropout_scale": false,
  "reg_data_dir": "",
  "rescaled": false,
  "resume": "",
  "sample_every_n_epochs": 1,
  "sample_every_n_steps": 0,
  "sample_prompts": "",
  "sample_sampler": "euler_a",
  "save_every_n_epochs": 1,
  "save_every_n_steps": 0,
  "save_last_n_steps": 0,
  "save_last_n_steps_state": 0,
  "save_model_as": "safetensors",
  "save_precision": "bf16",
  "save_state": false,
  "save_state_on_train_end": false,
  "scale_v_pred_loss_like_noise_pred": false,
  "scale_weight_norms": 1,
  "sdxl": true,
  "sdxl_cache_text_encoder_outputs": false,
  "sdxl_no_half_vae": true,
  "seed": "",
  "shuffle_caption": true,
  "stop_text_encoder_training_pct": 0,
  "text_encoder_lr": 0.0004,
  "train_batch_size": 1,
  "train_data_dir": "D:/train/data/img",
  "train_norm": false,
  "train_on_input": false,
  "training_comment": "",
  "unet_lr": 0.0004,
  "unit": 1,
  "up_lr_weight": "",
  "use_cp": false,
  "use_scalar": false,
  "use_tucker": false,
  "use_wandb": "",
  "v2": false,
  "v_parameterization": false,
  "v_pred_like_loss": 0,
  "vae": "",
  "vae_batch_size": 0,
  "wandb_api_key": "False",
  "wandb_run_name": "",
  "weighted_captions": false,
  "xformers": "xformers"
}
bmaltais commented 4 months ago

Hard to tell. If this set of parameters worked with a previous release, then keep training with that release. Otherwise, create a new config that trains well under the latest release. It is probably related to some of the sd-scripts code updates that now require changes to the training parameters.