Open timmbobb opened 4 months ago
I feel like I have to be doing something mindblowingly wrong or stupid, because people are saying that 2s/it is slow, and mine is going 25 times slower than that. All help would be appreciated!
It could be related to high VRAM usage.
Anecdotally, my RTX 4090 gets around 1.34s/it when training with settings similar to yours.
Your batch size is too high. As the previous comment says, you are using too much VRAM. You have to keep training under 24 GB or it slows way down. For full model fine-tunes you need a batch size of 1 and to cache everything so it fits inside a 24 GB card.
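As a rough illustration of the advice above, here is a minimal sketch that scans a config for settings that tend to raise VRAM use. The function name `vram_suggestions` is made up for this example. `train_batch_size` comes from the posted config; `gradient_checkpointing` and `cache_latents` are sd-scripts options that do not appear in the posted config, and suggesting them here is an assumption, not something confirmed in this thread to fix the slowdown.

```python
def vram_suggestions(config):
    """Return suggestions for lowering VRAM use (hypothetical helper, a sketch only)."""
    suggestions = []
    # A batch size above 1 multiplies activation memory per step.
    if config.get("train_batch_size", 1) > 1:
        suggestions.append("set train_batch_size = 1")
    # Assumption: gradient checkpointing trades some speed for lower memory.
    if not config.get("gradient_checkpointing", False):
        suggestions.append("enable gradient_checkpointing")
    # Assumption: caching latents keeps the VAE out of the training loop.
    if not config.get("cache_latents", False):
        suggestions.append("enable cache_latents")
    return suggestions


# A few values copied from the posted config; the rest are omitted for brevity.
posted = {"train_batch_size": 3, "resolution": "1024,1024", "xformers": True}

for suggestion in vram_suggestions(posted):
    print(suggestion)
```

This only flags candidates to try. Whether they bring usage under 24 GB depends on the dataset and the rest of the settings, so watch actual VRAM use while training.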
Getting EXTREMELY slow training speed.
49 sec/it on 4090 seems completely unreasonable.
Here's my config:
14:16:06-593487 WARNING Here is the trainer command as a reference. It will not be executed:
14:16:06-594489 INFO C:\StableDiffusion\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --mixed_precision fp16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 C:/StableDiffusion/kohya_ss/sd-scripts/sdxl_train_network.py --config_file C:/StableDiffusion/kohya_ss/dataset/formatted_training_images\model/config_lora-20240521-141606 .toml
14:16:06-595487 INFO Showing toml config file: C:/StableDiffusion/kohya_ss/dataset/formatted_training_images\model/config_lora-20240521-141606 .toml
14:16:06-596487 INFO
bucket_no_upscale = true
bucket_reso_steps = 64
caption_extension = ".txt"
clip_skip = 1
dynamo_backend = "no"
enable_bucket = true
epoch = 10
gradient_accumulation_steps = 1
huber_c = 0.1
huber_schedule = "snr"
learning_rate = 0.0001
logging_dir = "C:/StableDiffusion/kohya_ss/dataset/formatted_training_images\log"
loss_type = "l2"
lr_scheduler = "cosine_with_restarts"
lr_scheduler_args = []
lr_scheduler_num_cycles = 3
lr_scheduler_power = 1
lr_warmup_steps = 388
max_bucket_reso = 2048
max_data_loader_n_workers = 0
max_grad_norm = 1
max_timestep = 1000
max_token_length = 75
max_train_steps = 7770
min_bucket_reso = 256
mixed_precision = "fp16"
multires_noise_discount = 0.3
network_alpha = 16
network_args = []
network_dim = 32
network_module = "networks.lora"
no_half_vae = true
noise_offset_type = "Original"
optimizer_args = [ "weight_decay=0.1", "betas=[0.9,0.99]",]
optimizer_type = "AdamW8bit"
output_dir = "C:/StableDiffusion/kohya_ss/dataset/formatted_training_images\model"
output_name = "last"
pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
prior_loss_weight = 1
resolution = "1024,1024"
sample_every_n_epochs = 1
sample_prompts = "C:/StableDiffusion/kohya_ss/dataset/formatted_training_images\model\prompt.txt"
sample_sampler = "dpmsolver++"
save_every_n_epochs = 1
save_model_as = "safetensors"
save_precision = "fp16"
shuffle_caption = true
text_encoder_lr = 2e-5
train_batch_size = 3
train_data_dir = "C:/StableDiffusion/kohya_ss/dataset/formatted_training_images\img"
unet_lr = 0.0001
xformers = true
14:16:06-599487 INFO end of toml config file:
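To put the numbers in this thread side by side, here is a small back-of-the-envelope sketch. It multiplies the seconds per iteration by `max_train_steps = 7770` from the config above, once for the reported 49 s/it and once for the 1.34 s/it mentioned in the comments. The helper name `training_hours` is made up for this example, and the estimate ignores sampling, saving, and any other per-epoch overhead.

```python
def training_hours(seconds_per_it, steps):
    """Estimated wall-clock hours for a run (hypothetical helper, ignores overhead)."""
    return seconds_per_it * steps / 3600


steps = 7770  # max_train_steps from the posted config

slow = training_hours(49, steps)    # speed reported in this issue
fast = training_hours(1.34, steps)  # speed a commenter reports on an RTX 4090

print(f"at 49 s/it:   {slow:.1f} hours")
print(f"at 1.34 s/it: {fast:.1f} hours")
```

Under these assumptions the run takes roughly 106 hours at 49 s/it versus about 3 hours at 1.34 s/it.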