LoRA Training keeps finishing far too soon

VGF64 commented 9 months ago

Hi, I'm a complete newbie trying to make a LoRA with the custom nodes. The training keeps finishing way too soon, and I have no idea why. I've already run the caption nodes and the Tagger, but the training never really seems to start.

Here's the command prompt output:

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 3274.53it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）
bucket 0: resolution (320, 768), count: 5
bucket 1: resolution (384, 640), count: 20
bucket 2: resolution (448, 576), count: 30
bucket 3: resolution (512, 512), count: 5
bucket 4: resolution (640, 384), count: 25
bucket 5: resolution (704, 320), count: 5
mean ar error (without repeats): 0.08370723268377162
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: G:/models/Stable-diffusion\realisticVisionV60B1_v20Novae.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
Enable xformers for U-Net
Traceback (most recent call last):
  File "C:\Users\jrnsa\OneDrive\Documents\ComfyUI\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "C:\Users\jrnsa\OneDrive\Documents\ComfyUI\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 236, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 261, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 257, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 257, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 257, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 254, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\attention_processor.py", line 260, in set_use_memory_efficient_attention_xformers
    raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\python.exe', 'C:/Users/jrnsa/OneDrive/Documents/ComfyUI/ComfyUI/custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=G:/models/Stable-diffusion\realisticVisionV60B1_v20Novae.safetensors', '--train_data_dir=C:/Users/jrnsa/OneDrive/Documents/Datasets/Jill Wagner LoRA', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=JillWagner0001', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=50', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=JillWagner0001', '--train_batch_size=1', '--save_every_n_epochs=50', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=4', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=1', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 10.00 seconds
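For a sense of scale: the bucket counts in the log add up to 90 images per epoch including repeats (from 18 source images, which suggests 5 repeats each). The rough sketch below estimates how many steps this run was configured for. It assumes that each bucket is batched separately and that gradient accumulation is left at 1 (the command does not set it). The helper names are hypothetical, not sd-scripts functions. Under these assumptions, the run was set up for thousands of steps, so "Prompt executed in 10.00 seconds" means the script crashed before the first step rather than finishing.

```python
# Hypothetical helpers: estimate the step count for the run above from the
# bucket counts printed in the log. Not sd-scripts code.
import math

# Bucket counts from the log (these already include repeats).
bucket_counts = [5, 20, 30, 5, 25, 5]

train_batch_size = 1   # --train_batch_size=1
max_train_epochs = 50  # --max_train_epochs=50
gradient_accumulation_steps = 1  # assumption: default, not set in the command


def steps_per_epoch(counts, batch_size):
    # Assumption: batches are drawn within a single bucket, so a partially
    # filled batch in any bucket still costs one full step.
    return sum(math.ceil(c / batch_size) for c in counts)


def total_steps(counts, batch_size, epochs, accum=1):
    # Optimizer steps: per-epoch batches divided by accumulation, times epochs.
    return epochs * math.ceil(steps_per_epoch(counts, batch_size) / accum)


if __name__ == "__main__":
    print(steps_per_epoch(bucket_counts, train_batch_size))  # 90
    print(total_steps(bucket_counts, train_batch_size,
                      max_train_epochs, gradient_accumulation_steps))  # 4500
```

With batch size 1, every image is its own step, so the bucketing assumption only matters for larger batch sizes.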

I'm not sure whether this issue is Python-related. My ComfyUI install doesn't use a venv.
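Without a venv, packages are installed into whichever Python runs ComfyUI. The traceback shows `C:\Users\jrnsa\AppData\Local\Programs\Python\Python310\python.exe`. The minimal sketch below confirms which interpreter a given `python` command actually resolves to and whether it is a virtual environment. It relies only on the standard `sys` module. The function names are illustrative. Run it with the same command that starts ComfyUI.

```python
# Sketch: report which Python interpreter is running and whether it is a venv.
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix points at the base installation; outside a venv
    # the two are equal.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def interpreter_report() -> dict:
    return {
        "executable": sys.executable,          # full path to python.exe
        "version": sys.version.split()[0],     # e.g. "3.10.x"
        "in_venv": in_virtualenv(),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

If `executable` differs from the path in the traceback, any packages you install with that command will not be seen by the training script.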

LarryJane491 commented 9 months ago

"ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU"

This makes me think Torch isn't installed properly. You don't have a venv, but it doesn't seem to be the portable version either, am I right? Then try the "PyTorch fix": go to pytorch.org and find the command that installs Torch with CUDA support. Run it for the Python installation that ComfyUI actually uses, and it should then use the GPU.
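The `ValueError` comes from diffusers refusing to enable xformers while `torch.cuda.is_available()` returns `False`. The sketch below checks that condition directly and shows whether the installed Torch wheel was built with CUDA at all. It uses only `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`. The wrapper function is hypothetical. If `built_with_cuda` is `None`, the installed wheel is CPU-only, which is consistent with the reinstall advice above.

```python
# Sketch: check whether the installed torch build can see a CUDA GPU.
# Run it with the same python.exe that ComfyUI uses.


def cuda_report() -> dict:
    try:
        import torch
    except ImportError:
        # torch is not installed for this interpreter at all.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # What diffusers checks before enabling xformers.
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

`cuda_available` must be `True` before `--xformers` can work. A CUDA-enabled build that still reports `False` points more toward a driver problem than toward the Torch install itself.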

VGF64 commented 9 months ago

Hey, apologies: I closed this issue because I solved it on my own. However, I've run into another problem. I'm getting this error:

[Screenshot (41)]

I think this has something to do with torchvision. Do I have to reinstall an older version? I can't figure this one out.
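The screenshot's error isn't reproduced in the thread, so nothing here can diagnose it. Still, before reinstalling anything, it helps to see exactly which versions the running interpreter has. Each torchvision release is built against one specific torch release, and a CUDA wheel usually carries a local suffix such as `+cu...`, while a CPU-only wheel carries `+cpu`. The sketch below uses the standard `importlib.metadata` module. The package list and function name are illustrative choices based on the packages that appear in the traceback.

```python
# Sketch: list installed versions of the packages involved in the traceback,
# as seen by the current interpreter, before deciding to reinstall anything.
from importlib.metadata import PackageNotFoundError, version

# Illustrative selection: torch/torchvision plus packages from the traceback.
PACKAGES = ["torch", "torchvision", "xformers", "diffusers", "accelerate"]


def installed_versions(names):
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed for this interpreter
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing the `torch` and `torchvision` lines against the pairing listed on pytorch.org is usually enough to tell whether they were installed together.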

EDIT: I got it all FIXED. :) Sorry for throwing all of these issues at you.

LarryJane491 commented 8 months ago

Oops, my bad, I didn't notice it had been closed x). Glad you worked it out!

LarryJane491 / Image-Captioning-in-ComfyUI, issue #2