bmaltais / kohya_ss


train loss=nan, my video card is 1660ti #215

Closed xueqing0622 closed 1 year ago

xueqing0622 commented 1 year ago

Hi everyone, my train loss=nan and my video card is a 1660 Ti. I hope someone can help me, many thanks! My training settings are:

{
  "pretrained_model_name_or_path": "D:/sdXK/models/Stable-diffusion/v1-5-pruned.ckpt",
  "v2": false,
  "v_parameterization": false,
  "logging_dir": "G:\loraTrain\xqQinlan",
  "train_data_dir": "G:\loraTrain\xqQinlan",
  "reg_data_dir": "",
  "output_dir": "D:\sdXK\extensions\sd-webui-additional-networks\models\lora",
  "max_resolution": "512,512",
  "learning_rate": "0.0001",
  "lr_scheduler": "constant",
  "lr_warmup": "0",
  "train_batch_size": 2,
  "epoch": "1",
  "save_every_n_epochs": "1",
  "mixed_precision": "bf16",
  "save_precision": "bf16",
  "seed": "1234",
  "num_cpu_threads_per_process": 8,
  "cache_latents": true,
  "caption_extension": ".txt",
  "enable_bucket": true,
  "gradient_checkpointing": false,
  "full_fp16": false,
  "no_token_padding": false,
  "stop_text_encoder_training": 0,
  "use_8bit_adam": true,
  "xformers": true,
  "save_model_as": "safetensors",
  "shuffle_caption": false,
  "save_state": false,
  "resume": "",
  "prior_loss_weight": 1.0,
  "text_encoder_lr": "5e-5",
  "unet_lr": "0.0001",
  "network_dim": 32,
  "lora_network_weights": "",
  "color_aug": false,
  "flip_aug": false,
  "clip_skip": 2,
  "gradient_accumulation_steps": 1.0,
  "mem_eff_attn": false,
  "output_name": "xqQinlan",
  "model_list": "runwayml/stable-diffusion-v1-5",
  "max_token_length": "75",
  "max_train_epochs": "",
  "max_data_loader_n_workers": "1",
  "network_alpha": 32,
  "training_comment": "",
  "keep_tokens": "0",
  "lr_scheduler_num_cycles": "",
  "lr_scheduler_power": "",
  "persistent_data_loader_workers": false,
  "bucket_no_upscale": true,
  "random_crop": false,
  "bucket_reso_steps": 64.0,
  "caption_dropout_every_n_epochs": 0.0,
  "caption_dropout_rate": 0,
  "optimizer": "AdamW"
}
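One detail that stands out in this config is "mixed_precision": "bf16": bf16 is only hardware-accelerated on Ampere (compute capability 8.0) and newer, while the GTX 1660 Ti is a Turing card (7.5), so the bf16 setting is a plausible source of the nan loss. A minimal, kohya_ss-independent PyTorch check of what the card actually reports:

```python
import torch

# Quick sanity check of GPU capabilities before choosing a mixed-precision mode.
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))  # 1660 Ti -> (7, 5)
    print("bf16 supported:", torch.cuda.is_bf16_supported())
```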

ThisIsCyreX commented 1 year ago

You need to change mixed precision to no. But this will increase VRAM usage. So most likely you need to set the following too:

"train_batch_size": 1,
"mixed_precision": "no",
"save_precision": "fp16",
"enable_bucket": false,
"gradient_checkpointing": true,
"use_8bit_adam": true,
"xformers": true,
"mem_eff_attn": true,

Those settings worked for me (last week). I didn't try it on the latest commits with D-Adaptation.
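For context, here is a tiny PyTorch illustration (not kohya_ss code) of the trade-off behind this advice: full precision roughly doubles per-element memory compared to fp16, but fp16 overflows to inf far earlier, which is one common way a mixed-precision training loss ends up as nan.

```python
import torch

# fp32 uses twice the memory of fp16 per element...
print(torch.zeros(1, dtype=torch.float16).element_size())  # 2 bytes
print(torch.zeros(1, dtype=torch.float32).element_size())  # 4 bytes

# ...but fp16 overflows above ~65504, and inf - inf is nan.
x = torch.tensor([70000.0])
print(x.half())              # tensor([inf], dtype=torch.float16)
print(x.half() - x.half())   # tensor([nan], dtype=torch.float16)
```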

xueqing0622 commented 1 year ago

Many thanks for the help, ThisIsCyreX. The loss is still nan; I changed the settings as you said, but it didn't work. Maybe it's my video card's (1660 Ti) problem:

{
  "pretrained_model_name_or_path": "D:/sdXK/models/Stable-diffusion/v1-5-pruned.ckpt",
  "v2": false,
  "v_parameterization": false,
  "logging_dir": "G:\loraTrain\log",
  "train_data_dir": "G:\loraTrain\xqQinlan",
  "reg_data_dir": "",
  "output_dir": "D:\sdXK\extensions\sd-webui-additional-networks\models\lora",
  "max_resolution": "512,512",
  "learning_rate": "0.0001",
  "lr_scheduler": "cosine",
  "lr_warmup": "0",
  "train_batch_size": 1,
  "epoch": "1",
  "save_every_n_epochs": "1",
  "mixed_precision": "no",
  "save_precision": "fp16",
  "seed": "1234",
  "num_cpu_threads_per_process": 8,
  "cache_latents": true,
  "caption_extension": ".txt",
  "enable_bucket": false,
  "gradient_checkpointing": true,
  "full_fp16": false,
  "no_token_padding": false,
  "stop_text_encoder_training": 0,
  "use_8bit_adam": true,
  "xformers": true,
  "save_model_as": "safetensors",
  "shuffle_caption": false,
  "save_state": false,
  "resume": "",
  "prior_loss_weight": 1.0,
  "text_encoder_lr": "5e-5",
  "unet_lr": "0.0001",
  "network_dim": 32,
  "lora_network_weights": "",
  "color_aug": false,
  "flip_aug": false,
  "clip_skip": 2,
  "gradient_accumulation_steps": 1.0,
  "mem_eff_attn": true,
  "output_name": "xqQinlan",
  "model_list": "runwayml/stable-diffusion-v1-5",
  "max_token_length": "75",
  "max_train_epochs": "",
  "max_data_loader_n_workers": "1",
  "network_alpha": 64,
  "training_comment": "",
  "keep_tokens": "0",
  "lr_scheduler_num_cycles": "",
  "lr_scheduler_power": "",
  "persistent_data_loader_workers": false,
  "bucket_no_upscale": true,
  "random_crop": false,
  "bucket_reso_steps": 64.0,
  "caption_dropout_every_n_epochs": 0.0,
  "caption_dropout_rate": 0,
  "optimizer": "AdamW"
}
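If the loss still turns into nan even at full precision, it can help to catch the first non-finite loss and let autograd point at the operation that produced it. A generic PyTorch sketch (the check_loss helper is hypothetical, not part of kohya_ss):

```python
import torch

# Slower, but makes the backward pass report the op that produced nan/inf.
torch.autograd.set_detect_anomaly(True)

def check_loss(loss: torch.Tensor, step: int) -> None:
    # Stop as soon as the loss stops being a finite number.
    if not torch.isfinite(loss).all():
        raise RuntimeError(f"non-finite loss {loss.item()} at step {step}")
```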

ThisIsCyreX commented 1 year ago

I just tried again with the latest version, and you're correct: I now get loss=nan with my 1660 Ti, too 😞 So something changed between last week and now that broke it.

This version still works with my 1660 Ti: https://github.com/bmaltais/kohya_ss/tree/261b6790ee1e92d6d84220ef2d990542acc3b8aa. Replace the files and try again.

xueqing0622 commented 1 year ago

Yeah, it works now. ThisIsCyreX, thank you for your great help. I can finally train a LoRA.

5yes commented 1 year ago

I use a 1660S, and changing versions didn't fix the problem.

peka2 commented 1 year ago

My 1660 Ti also trains without errors, but nothing is actually learned! Has anyone had success training a LoRA and generating images with a 1660 Ti?

Could it be a problem with xformers 0.0.14? https://github.com/kohya-ss/sd-scripts/issues/85 But the result was the same with xformers=false.
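One way to tell whether a finished LoRA actually learned anything is to inspect the saved weights for nan or all-zero tensors. A minimal sketch (the file name is just a placeholder, and it assumes the safetensors package is installed):

```python
import torch
from safetensors.torch import load_file

# Load the trained LoRA and flag tensors that are nan or entirely zero.
state = load_file("xqQinlan.safetensors")  # placeholder path
for name, tensor in state.items():
    t = tensor.float()
    if torch.isnan(t).any() or t.abs().max() == 0:
        print(f"{name}: suspicious (nan or all zeros)")
```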

peka2 commented 1 year ago

You need to change mixed precision to no.

I tried it, and it worked! It seems the important thing was to set this option to "no"; xformers can still be used.
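For what it's worth, mixed precision and xformers are independent switches: the training scripts are launched through Hugging Face accelerate, where mixed_precision="no" simply keeps the forward/backward pass in fp32, while xformers' memory-efficient attention can stay enabled on its own. A small sketch of that accelerate setting:

```python
from accelerate import Accelerator

# "no" keeps everything in fp32; "fp16" and "bf16" are the other accepted values.
accelerator = Accelerator(mixed_precision="no")
print(accelerator.mixed_precision)
```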

reynol1 commented 1 year ago

Thanks for the help, guys. I'm getting this error: "None of the inputs have requires_grad=True. Gradients will be None"
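That message is a warning from PyTorch's gradient checkpointing: it fires when none of the tensors passed into a checkpointed block require gradients, which easily happens when most of the model is frozen, as in LoRA training. A minimal reproduction, unrelated to kohya_ss itself:

```python
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)      # requires_grad is False by default
y = checkpoint(layer, x)   # warns: "None of the inputs have requires_grad=True..."
```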