You need to change mixed_precision to "no". But this will increase VRAM usage, so most likely you need to set the following too:
"train_batch_size": 1,
"mixed_precision": "no",
"save_precision": "fp16",
"enable_bucket": false,
"gradient_checkpointing": true,
"use_8bit_adam": true,
"xformers": true,
"mem_eff_attn": true,
Those settings worked for me (last week). I didn't try it on the latest commits with D-Adaptation.
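If you want to check whether fp16 itself is the culprit on your card, here is a minimal PyTorch sketch (not part of kohya_ss; the layer size is arbitrary) that runs the same forward pass in fp32 and under fp16 autocast and reports NaNs:

```python
# Minimal fp16 sanity check (assumes a CUDA build of PyTorch).
# On an affected GTX 16xx card the fp16 output may contain NaNs
# even though the fp32 output is finite.
import torch

layer = torch.nn.Linear(512, 512).cuda()
x = torch.randn(8, 512, device="cuda")

y32 = layer(x)  # plain fp32 forward pass
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y16 = layer(x)  # same forward pass under fp16 autocast

print("fp32 NaN:", torch.isnan(y32).any().item())
print("fp16 NaN:", torch.isnan(y16).any().item())
```

If the fp16 line prints True while the fp32 line prints False, then "mixed_precision": "no" is the right workaround for that GPU.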
Many thanks for your help, ThisIsCyreX. I changed the settings as you said, but the loss is still nan. Maybe it's a problem with my video card (1660 Ti). My settings:

```json
{
  "pretrained_model_name_or_path": "D:/sdXK/models/Stable-diffusion/v1-5-pruned.ckpt",
  "v2": false,
  "v_parameterization": false,
  "logging_dir": "G:\loraTrain\log",
  "train_data_dir": "G:\loraTrain\xqQinlan",
  "reg_data_dir": "",
  "output_dir": "D:\sdXK\extensions\sd-webui-additional-networks\models\lora",
  "max_resolution": "512,512",
  "learning_rate": "0.0001",
  "lr_scheduler": "cosine",
  "lr_warmup": "0",
  "train_batch_size": 1,
  "epoch": "1",
  "save_every_n_epochs": "1",
  "mixed_precision": "no",
  "save_precision": "fp16",
  "seed": "1234",
  "num_cpu_threads_per_process": 8,
  "cache_latents": true,
  "caption_extension": ".txt",
  "enable_bucket": false,
  "gradient_checkpointing": true,
  "full_fp16": false,
  "no_token_padding": false,
  "stop_text_encoder_training": 0,
  "use_8bit_adam": true,
  "xformers": true,
  "save_model_as": "safetensors",
  "shuffle_caption": false,
  "save_state": false,
  "resume": "",
  "prior_loss_weight": 1.0,
  "text_encoder_lr": "5e-5",
  "unet_lr": "0.0001",
  "network_dim": 32,
  "lora_network_weights": "",
  "color_aug": false,
  "flip_aug": false,
  "clip_skip": 2,
  "gradient_accumulation_steps": 1.0,
  "mem_eff_attn": true,
  "output_name": "xqQinlan",
  "model_list": "runwayml/stable-diffusion-v1-5",
  "max_token_length": "75",
  "max_train_epochs": "",
  "max_data_loader_n_workers": "1",
  "network_alpha": 64,
  "training_comment": "",
  "keep_tokens": "0",
  "lr_scheduler_num_cycles": "",
  "lr_scheduler_power": "",
  "persistent_data_loader_workers": false,
  "bucket_no_upscale": true,
  "random_crop": false,
  "bucket_reso_steps": 64.0,
  "caption_dropout_every_n_epochs": 0.0,
  "caption_dropout_rate": 0,
  "optimizer": "AdamW"
}
```
I just tried again with the latest version, and you're correct: I now get loss=nan with my 1660 Ti, too 😞 So something changed between last week and now that broke it.
This version still works with my 1660 Ti: https://github.com/bmaltais/kohya_ss/tree/261b6790ee1e92d6d84220ef2d990542acc3b8aa
Replace the files and try again.
Yeah, it works now. ThisIsCyreX, thank you for your great help. I can finally train LoRA.
I use a 1660S, and changing versions didn't fix the problem.
My 1660 Ti also trains successfully, but nothing is learned! Has anyone had success training LoRA and generating images with a 1660 Ti?
Is there a problem with xformers 0.0.14? See https://github.com/kohya-ss/sd-scripts/issues/85. But it was the same with xformers=false.
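To rule xformers in or out, a quick check (assuming xformers is installed in the same environment the trainer runs in) is to print the version that is actually imported and run its attention op once:

```python
# Print the xformers version the training environment actually imports,
# then run memory-efficient attention once to see if it returns NaNs.
# (Assumes xformers is installed; shapes are arbitrary fp16 test data.)
import torch
import xformers
import xformers.ops

print("xformers version:", xformers.__version__)

q = torch.randn(1, 64, 8, 40, device="cuda", dtype=torch.float16)
out = xformers.ops.memory_efficient_attention(q, q, q)
print("attention NaN:", torch.isnan(out).any().item())
```

xformers also ships `python -m xformers.info`, which lists the ops available on your GPU.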
> You need to change mixed precision to no.

I tried it, and it worked! It seems it was important to set this to "no", even though xformers is available.
Thanks for the help, guys. I'm getting this error: `None of the inputs have requires_grad=True. Gradients will be None`
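That message is a warning raised by torch.utils.checkpoint when none of the tensors passed into a checkpointed block require gradients. A minimal reproduction (just an illustration, not the trainer's code):

```python
# Reproduces the warning: torch.utils.checkpoint warns at forward time
# when every tensor passed to the checkpointed function has
# requires_grad=False (only the explicit inputs are checked, not the
# module's parameters).
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)  # requires_grad is False by default

out = checkpoint(layer, x, use_reentrant=True)
# UserWarning: None of the inputs have requires_grad=True. Gradients will be None
```

In a LoRA run this can show up when gradient_checkpointing is enabled and the inputs to the checkpointed modules are frozen or cached; whether it actually blocks learning depends on the setup.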
Hi everyone, my training loss=nan. My video card is a 1660 Ti. I hope someone can help me, many thanks! My training settings:

```json
{
  "pretrained_model_name_or_path": "D:/sdXK/models/Stable-diffusion/v1-5-pruned.ckpt",
  "v2": false,
  "v_parameterization": false,
  "logging_dir": "G:\loraTrain\xqQinlan",
  "train_data_dir": "G:\loraTrain\xqQinlan",
  "reg_data_dir": "",
  "output_dir": "D:\sdXK\extensions\sd-webui-additional-networks\models\lora",
  "max_resolution": "512,512",
  "learning_rate": "0.0001",
  "lr_scheduler": "constant",
  "lr_warmup": "0",
  "train_batch_size": 2,
  "epoch": "1",
  "save_every_n_epochs": "1",
  "mixed_precision": "bf16",
  "save_precision": "bf16",
  "seed": "1234",
  "num_cpu_threads_per_process": 8,
  "cache_latents": true,
  "caption_extension": ".txt",
  "enable_bucket": true,
  "gradient_checkpointing": false,
  "full_fp16": false,
  "no_token_padding": false,
  "stop_text_encoder_training": 0,
  "use_8bit_adam": true,
  "xformers": true,
  "save_model_as": "safetensors",
  "shuffle_caption": false,
  "save_state": false,
  "resume": "",
  "prior_loss_weight": 1.0,
  "text_encoder_lr": "5e-5",
  "unet_lr": "0.0001",
  "network_dim": 32,
  "lora_network_weights": "",
  "color_aug": false,
  "flip_aug": false,
  "clip_skip": 2,
  "gradient_accumulation_steps": 1.0,
  "mem_eff_attn": false,
  "output_name": "xqQinlan",
  "model_list": "runwayml/stable-diffusion-v1-5",
  "max_token_length": "75",
  "max_train_epochs": "",
  "max_data_loader_n_workers": "1",
  "network_alpha": 32,
  "training_comment": "",
  "keep_tokens": "0",
  "lr_scheduler_num_cycles": "",
  "lr_scheduler_power": "",
  "persistent_data_loader_workers": false,
  "bucket_no_upscale": true,
  "random_crop": false,
  "bucket_reso_steps": 64.0,
  "caption_dropout_every_n_epochs": 0.0,
  "caption_dropout_rate": 0,
  "optimizer": "AdamW"
}
```