Open KorDum opened 5 months ago
If you get such a result in the first epoch, then of course you should not continue. Here's the JSON I'm training on right now: adafactor.json. And yes, I don't remember anything like this happening on previous versions, but on the current one it has happened a couple of times with both SD and SDXL.
@Mosfett1975 thanks for the answer!
I thought the problem was the bitsandbytes package. However, 0.41.1 and 0.43.0+ don't affect anything. Probably some dependency has been updated, but I don't know how to determine it.
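One way to find out which dependency changed is to save the output of `pip freeze` from each install to a text file and compare the two. Below is a minimal sketch of that comparison. The package `examplepkg` and its version are made up for illustration; only the two bitsandbytes versions come from this thread.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into {package_name: version}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries without an "==" pin (e.g. editable installs).
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_freezes(old_text, new_text):
    """Return {name: (old_version, new_version)} for every package that differs."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


# Illustrative input only: in practice, read these strings from two saved freeze files.
old_env = "bitsandbytes==0.41.1\nexamplepkg==1.0.0\n"
new_env = "bitsandbytes==0.43.0\nexamplepkg==1.0.0\n"

for name, (before, after) in diff_freezes(old_env, new_env).items():
    print(f"{name}: {before} -> {after}")
```

A package that exists in only one environment shows up with `None` on the other side, so newly pulled-in dependencies are visible too.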
@Mosfett1975 Thanks for the config file, it works great! I see a lot of discrepancies with mine - it remains to be seen which configuration is causing the problem. I'll report back here.
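For narrowing down which setting matters, a key-by-key comparison of the two JSON configs can help. This is only a sketch: the keys are taken from the config posted in this thread, but the values in `theirs` are invented for illustration and are not the contents of adafactor.json.

```python
import json

MISSING = "<missing>"  # marker for a key that exists in only one config


def diff_configs(mine, theirs):
    """Return {key: (my_value, their_value)} for every key whose value differs."""
    diffs = {}
    for key in sorted(set(mine) | set(theirs)):
        a, b = mine.get(key, MISSING), theirs.get(key, MISSING)
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Illustrative input only: in practice, use json.load() on the two config files.
mine = json.loads('{"learning_rate": 0.0004, "network_dim": 4, "optimizer": "Adafactor"}')
theirs = json.loads('{"learning_rate": 0.0001, "network_dim": 4, "optimizer": "Adafactor"}')

for key, (a, b) in diff_configs(mine, theirs).items():
    print(f"{key}: mine={a!r} theirs={b!r}")
```

Changing one differing key at a time between runs would then show which setting actually affects the result.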
Although, I just checked more closely and it seems the overtraining is still present. Kohya_ss 23.0.15 and 22.6.2
Example first epoch. Subsequent epochs already show obvious artifacts. Like 0.0004 for learning rate is a lot.
I've noticed it now too: the LoRA is clearly not trained yet, but the background is starting to deteriorate. I've set up samples with a person in the city; the face doesn't look like the subject yet, but the paving stones are starting to disappear, and the houses and the whole environment are blurred. I left in my set of photos only those with a simple white background, but it didn't help much.
I figured out the background: the whole point is which pictures go in as input. I filled the background with black and specified in the captions that the background is black :)
@Mosfett1975 On a fully updated kohya, including all pip dependencies? Is Triton enabled?
Not on Windows
All the latest updates are installed. As for Triton, I'm not sure whether it works or not.
I get exactly the same results with overtraining on Linux as I do on Windows. Only now 2.5 times faster :) Still don't understand the reasoning behind it. Maybe the reason is that I'm not inputting real photos, but generated ones? I'm going to try to get someone's dataset with actual photos.
Tried it on a third-party dataset, got exactly the same problems. I'm at a loss to guess what's going on. I have tried on two different computers with two different video cards and three different operating systems.
Your config has max resolution specified as "max_resolution": "512,512", but you also have "sdxl": true.
You'd probably want to have the resolution set as 1024, 1024.
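A quick sanity check for this kind of mismatch could look like the sketch below. It reads the `sdxl` and `max_resolution` keys from the posted config. The helper name and the 1024 threshold are assumptions based on the suggestion above, not something Kohya_ss enforces.

```python
def check_sdxl_resolution(config, expected=1024):
    """Return a warning string if SDXL is enabled with a smaller max_resolution, else None."""
    # max_resolution is stored as a "width,height" string in the GUI config.
    width, height = (int(v) for v in config["max_resolution"].split(","))
    if config.get("sdxl") and (width < expected or height < expected):
        return (f"sdxl is enabled but max_resolution is {width}x{height}; "
                f"consider {expected},{expected}")
    return None


print(check_sdxl_resolution({"sdxl": True, "max_resolution": "512,512"}))
```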
@v0xie Unfortunately, my RTX 3070 doesn't allow me to specify this because it has only 8GB of VRAM. However, I also tried it on an RTX 4090 at 1024x1024. The outcome is the same.
I tried the latest dev version of Kohya_ss along with all dependency updates. The result is the same :( I don't understand why this is happening.
Config for RTX 3070 8GB for 1024x1024 image size and 22 repeats for each image.
{
"LoRA_type": "Standard",
"LyCORIS_preset": "full",
"adaptive_noise_scale": 0.006,
"additional_parameters": "--network_train_unet_only",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": true,
"bucket_reso_steps": 32,
"bypass_mode": false,
"cache_latents": true,
"cache_latents_to_disk": true,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0,
"caption_extension": ".txt",
"clip_skip": 2,
"color_aug": false,
"constrain": 0.0,
"conv_alpha": 1,
"conv_block_alphas": "",
"conv_block_dims": "",
"conv_dim": 1,
"dataset_config": "",
"debiased_estimation_loss": false,
"decompose_both": false,
"dim_from_weights": false,
"dora_wd": false,
"down_lr_weight": "",
"enable_bucket": true,
"epoch": 5,
"extra_accelerate_launch_args": "",
"factor": -1,
"flip_aug": false,
"fp8_base": false,
"full_bf16": true,
"full_fp16": false,
"gpu_ids": "0",
"gradient_accumulation_steps": 1,
"gradient_checkpointing": true,
"ip_noise_gamma": 0,
"ip_noise_gamma_random_strength": false,
"keep_tokens": "0",
"learning_rate": 0.0004,
"log_tracker_config": "",
"log_tracker_name": "",
"logging_dir": "D:/train/data/log",
"lora_network_weights": "",
"lr_scheduler": "constant_with_warmup",
"lr_scheduler_args": "",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": 0,
"main_process_port": 0,
"masked_loss": false,
"max_bucket_reso": 2048,
"max_data_loader_n_workers": "1",
"max_grad_norm": 1,
"max_resolution": "512,512",
"max_timestep": 1000,
"max_token_length": "150",
"max_train_epochs": "",
"max_train_steps": "",
"mem_eff_attn": false,
"mid_lr_weight": "",
"min_bucket_reso": 64,
"min_snr_gamma": 5,
"min_timestep": 0,
"mixed_precision": "bf16",
"model_list": "custom",
"module_dropout": 0,
"multi_gpu": false,
"multires_noise_discount": 0.35,
"multires_noise_iterations": 8,
"network_alpha": 1,
"network_dim": 4,
"network_dropout": 0.2,
"noise_offset": 0,
"noise_offset_random_strength": false,
"noise_offset_type": "Multires",
"num_cpu_threads_per_process": 6,
"num_machines": 1,
"num_processes": 1,
"optimizer": "Adafactor",
"optimizer_args": "\"scale_parameter=False\", \"relative_step=False\", \"warmup_init=False\"",
"output_dir": "D:/train/data/model",
"output_name": "test-v4",
"persistent_data_loader_workers": false,
"pretrained_model_name_or_path": "F:/StableDiffusion/ComfyUI/ComfyUI/models/checkpoints/earthAnimixXLSemiflat_v15.safetensors",
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0,
"rank_dropout_scale": false,
"reg_data_dir": "",
"rescaled": false,
"resume": "",
"sample_every_n_epochs": 1,
"sample_every_n_steps": 0,
"sample_prompts": "",
"sample_sampler": "euler_a",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "bf16",
"save_state": false,
"save_state_on_train_end": false,
"scale_v_pred_loss_like_noise_pred": false,
"scale_weight_norms": 1,
"sdxl": true,
"sdxl_cache_text_encoder_outputs": false,
"sdxl_no_half_vae": true,
"seed": "",
"shuffle_caption": true,
"stop_text_encoder_training_pct": 0,
"text_encoder_lr": 0.0004,
"train_batch_size": 1,
"train_data_dir": "D:/train/data/img",
"train_norm": false,
"train_on_input": false,
"training_comment": "",
"unet_lr": 0.0004,
"unit": 1,
"up_lr_weight": "",
"use_cp": false,
"use_scalar": false,
"use_tucker": false,
"use_wandb": "",
"v2": false,
"v_parameterization": false,
"v_pred_like_loss": 0,
"vae": "",
"vae_batch_size": 0,
"wandb_api_key": "False",
"wandb_run_name": "",
"weighted_captions": false,
"xformers": "xformers"
}
Hard to tell. But if this set of parameters worked with a previous release then keep training with it. Create a new config that trains well under the latest release. It is probably related to some sd-scripts code update that now requires modifications to the training parameters.
Hello! I have a strange issue with SDXL LoRA training.
I've tried both the newest version of Kohya_ss, 23.0.15 (clean install without pip cache), and the older version 22.6.2 (clean install, too). The result is always the same. I have also tried training on different video cards: an RTX 3070 with 8 GB (4 dim, 512x512 resolution) and an RTX 4090 with 24 GB (4, 8, 32, 64, 128 dim, 1024x1024 resolution). There's no difference. I always get overtraining artifacts in the first epoch (and in the following ones), no matter what learning rate I set.
The data set consists of images I generated that are 1024x1024 pixels in size. 37 images * 15 steps per image = 555 steps per epoch
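The step count above can be reproduced with a small calculation. This is a simplified sketch that assumes steps per epoch equal images times repeats divided by batch size, rounded up. It ignores aspect-ratio bucketing, which can change the exact number of batches. The batch size of 1 and the 5 epochs are taken from the posted config.

```python
import math


def steps_per_epoch(num_images, repeats, batch_size=1):
    """Approximate optimizer steps in one epoch, ignoring bucketing effects."""
    return math.ceil(num_images * repeats / batch_size)


per_epoch = steps_per_epoch(num_images=37, repeats=15, batch_size=1)
total = per_epoch * 5  # "epoch": 5 in the posted config

print(per_epoch)  # 555
print(total)      # 2775
```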
I don't know what to do anymore. I've tried so many things. Can you help me please? I do not observe such problems with SD1.5.
Example generated image with 10 steps, CFG 3