bmaltais / kohya_ss


I can't get training to start. #396

Closed: Deejay85 closed this issue 1 year ago

Deejay85 commented 1 year ago

I'm trying to train a new fetish using LoRA, and while I've been watching videos on how to set the basic training parameters, despite doing everything I'm supposed to, it just isn't working. I also watched another video in case that made things easier, but I didn't gain any new information on what I'm doing wrong. While I'm new to all of this, I do know that the image files are supposed to go into the "100_name here" folder, and that the model and log folders should all sit in the same directory.

If it helps any, I've tried training with the latest version of Waifu Diffusion, and when that didn't work, I tried Lewd Diffusion and R34...both gave the same results. Not sure if it matters, but there are multiple checkpoints in the stable-diffusion folder. And yes, I made sure to choose the checkpoint file manually rather than just setting the root directory.
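For reference, this is the layout the trainer expects, reconstructed from the paths in the logs below (the numeric prefix on the image subfolder is the per-image repeat count):

Lora Folder/
    Image/
        100_Huge Balls/   (22 images plus matching .txt captions; "100" = repeats)
    Model/
    Log/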

Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Folder 100_Huge Balls: 2200 steps
max_train_steps = 1100
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --v2 --v_parameterization --pretrained_model_name_or_path="S:/WaifuDiffusion/models/Stable-diffusion/Waifu Diffusion.ckpt" --train_data_dir="S:/kohya_ss/Sample Images/Lora Folder/Image" --resolution=512,512 --output_dir="S:/kohya_ss/Sample Images/Lora Folder/Model" --logging_dir="S:/kohya_ss/Sample Images/Lora Folder/Log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="huge balls" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="1100" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
v2 with clip_skip will be unexpected / v2でclip_skipを使用することは想定されていません
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory S:\kohya_ss\Sample Images\Lora Folder\Image\100_Huge Balls contains 22 image files
2200 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: False

  [Subset 0 of Dataset 0]
    image_dir: "S:\kohya_ss\Sample Images\Lora Folder\Image\100_Huge Balls"
    image_count: 22
    num_repeats: 100
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    is_reg: False
    class_tokens: Huge Balls
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 4400.53it/s]
prepare dataset
prepare accelerator
Traceback (most recent call last):
  File "S:\kohya_ss\train_network.py", line 652, in <module>
    train(args)
  File "S:\kohya_ss\train_network.py", line 108, in train
    accelerator, unwrap_model = train_util.prepare_accelerator(args)
  File "S:\kohya_ss\library\train_util.py", line 1973, in prepare_accelerator
    accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, mixed_precision=args.mixed_precision,
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 370, in __init__
    raise ValueError(err.format(mode="bf16", requirement="PyTorch >= 1.10 and a supported device."))
ValueError: bf16 mixed precision requires PyTorch >= 1.10 and a supported device.
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "S:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['S:\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--v2', '--v_parameterization', '--pretrained_model_name_or_path=S:/WaifuDiffusion/models/Stable-diffusion/Waifu Diffusion.ckpt', '--train_data_dir=S:/kohya_ss/Sample Images/Lora Folder/Image', '--resolution=512,512', '--output_dir=S:/kohya_ss/Sample Images/Lora Folder/Model', '--logging_dir=S:/kohya_ss/Sample Images/Lora Folder/Log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=huge balls', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1100', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
bmaltais commented 1 year ago

Did you complete the accelerate config step? Try running that again.
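If you want to double-check what accelerate ended up writing, you can print the config file it reads by default (a minimal sketch; the path below is just the default location accelerate reports on Windows):

import sys
from pathlib import Path

# Default location of the accelerate config on Windows; adjust if your
# HF_HOME is customized. This simply shows what `accelerate launch` will read.
config_path = Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"
if config_path.exists():
    print(config_path.read_text())
else:
    sys.exit("No config found - run `accelerate config` first.")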

Deejay85 commented 1 year ago

Having some trouble here. I copied and pasted the following command into PowerShell:

PS S:\kohya_ss> S:\kohya_ss\venv\Scripts\accelerate-config.exe update
Sucessfully updated the configuration file at C:\Users\<REDACTED>/.cache\huggingface\accelerate\default_config.yaml.

The program, however, is located at S:\kohya_ss, so why the config file is appearing under C:\Users, I've no idea. Also, when I click on Train, I get this message:

Validating that requirements are satisfied.
All requirements satisfied.
Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading config...
Folder 100_Huge Balls: 2200 steps
max_train_steps = 1100
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --v2 --pretrained_model_name_or_path="S:/WaifuDiffusion/models/Stable-diffusion/Waifu Diffusion.ckpt" --train_data_dir="S:/kohya_ss/Sample Images/Lora Folder/Image" --resolution=512,512 --output_dir="S:/kohya_ss/Sample Images/Lora Folder/Model" --logging_dir="S:/kohya_ss/Sample Images/Lora Folder/Log" --save_model_as=safetensors --output_name="huge balls" --max_data_loader_n_workers="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="1100" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
v2 with clip_skip will be unexpected / v2でclip_skipを使用することは想定されていません
prepare tokenizer
prepare images.
found directory S:\kohya_ss\Sample Images\Lora Folder\Image\100_Huge Balls contains 22 image files
2200 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: False

  [Subset 0 of Dataset 0]
    image_dir: "S:\kohya_ss\Sample Images\Lora Folder\Image\100_Huge Balls"
    image_count: 22
    num_repeats: 100
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    is_reg: False
    class_tokens: Huge Balls
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 3666.93it/s]
prepare dataset
prepare accelerator
Traceback (most recent call last):
  File "S:\kohya_ss\train_db.py", line 364, in <module>
    train(args)
  File "S:\kohya_ss\train_db.py", line 75, in train
    accelerator, unwrap_model = train_util.prepare_accelerator(args)
  File "S:\kohya_ss\library\train_util.py", line 1984, in prepare_accelerator
    accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, mixed_precision=args.mixed_precision,
  File "S:\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 370, in __init__
    raise ValueError(err.format(mode="bf16", requirement="PyTorch >= 1.10 and a supported device."))
ValueError: bf16 mixed precision requires PyTorch >= 1.10 and a supported device.
Traceback (most recent call last):
  File "C:\Users\<REDACTED>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\<REDACTED>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "S:\kohya_ss\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "S:\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "S:\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "S:\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['S:\\kohya_ss\\kohya_ss\\venv\\Scripts\\python.exe', 'train_db.py', '--v2', '--pretrained_model_name_or_path=S:/WaifuDiffusion/models/Stable-diffusion/Waifu Diffusion.ckpt', '--train_data_dir=S:/kohya_ss/Sample Images/Lora Folder/Image', '--resolution=512,512', '--output_dir=S:/kohya_ss/Sample Images/Lora Folder/Model', '--logging_dir=S:/kohya_ss/Sample Images/Lora Folder/Log', '--save_model_as=safetensors', '--output_name=huge balls', '--max_data_loader_n_workers=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1100', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
bmaltais commented 1 year ago

Change the bf16 to fp16 in the training parameters.
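For context: that error means the card cannot do bf16 math at all; bf16 on CUDA generally needs an Ampere-class GPU (compute capability 8.0 or newer). A quick way to check your own hardware before picking a precision (a minimal sketch using PyTorch's public API):

import torch

# On pre-Ampere cards this prints False, which is why fp16 works where
# bf16 raises "bf16 mixed precision requires ... a supported device".
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))
    print("bf16 supported:", torch.cuda.is_bf16_supported())
else:
    print("No CUDA device visible to PyTorch.")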

Deejay85 commented 1 year ago

I don't mind learning new things, but when IT'S HARDER THAN CHINESE ARITHMETIC it just gets tiring after a bit. I reinstalled the program to get the latest version, jumped through all the necessary hoops to get it back into working order, deleted and re-created the default_config file by running accelerate-config, and it still didn't work. The solution to my problem? Throwing my computer out of an open window. Just kidding, of course. 😜

I ran the program, went to Settings, Dreambooth LoRA, Training parameters, and made the change there...the program finally started running. All this time, just that one simple value kept it from working. My only question is: why? Anyway, for now I'm going to let it run overnight and see what happens.

EDIT

Celebrated too soon. It started at least, so that's progress. /lol Here's the error message:

Folder 100_Huge Balls: 2200 steps
max_train_steps = 2200
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --v2 --pretrained_model_name_or_path="S:/WaifuDiffusion/models/Stable-diffusion/Waifu Diffusion.ckpt" --train_data_dir="C:\Users\<REDACTED>\kohya_ss\Sample Images\Lora Folder\Image" --resolution=512,512 --output_dir="C:/Users/<REDACTED>/kohya_ss/Sample Images/Lora Folder/Model" --logging_dir="C:\Users\<REDACTED>\kohya_ss\Sample Images\Lora Folder\Log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="huge balls" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="2200" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --max_data_loader_n_workers="1" --clip_skip=2 --xformers --use_8bit_adam
v2 with clip_skip will be unexpected / v2でclip_skipを使用することは想定されていません
prepare tokenizer
Use DreamBooth method.
prepare train images.
found directory 100_Huge Balls contains 22 image files
2200 train images with repeating.
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 3384.98it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Replace CrossAttention.forward to use xformers
caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████| 22/22 [00:09<00:00,  2.20it/s]
import network module: networks.lora
create LoRA for Text Encoder: 138 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: Loading binary C:\Users\<REDACTED>\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 2200
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 2200
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 1
  total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 2200
steps:   0%|                                                                                  | 0/2200 [00:00<?, ?it/s]epoch 1/1
Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
  File "C:\Users\<REDACTED>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\<REDACTED>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\<REDACTED>\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\<REDACTED>\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\<REDACTED>\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\<REDACTED>\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\<REDACTED>\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--v2', '--pretrained_model_name_or_path=S:/WaifuDiffusion/models/Stable-diffusion/Waifu Diffusion.ckpt', '--train_data_dir=C:\\Users\\<REDACTED>\\kohya_ss\\Sample Images\\Lora Folder\\Image', '--resolution=512,512', '--output_dir=C:/Users/<REDACTED>/kohya_ss/Sample Images/Lora Folder/Model', '--logging_dir=C:\\Users\\<REDACTED>\\kohya_ss\\Sample Images\\Lora Folder\\Log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=huge balls', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=2200', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--max_data_loader_n_workers=1', '--clip_skip=2', '--xformers', '--use_8bit_adam']' returned non-zero exit status 1.

The most important line seems to be this one:

Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu

No idea if there is anything I can do about it or not. :/

bmaltais commented 1 year ago

@Deejay85 Don't despair. It looks like your card does not support bitsandbytes for AdamW8bit. Change the optimizer to AdamW instead; that should get you past this. If NVIDIA supported the same features on all its cards, it would make things way simpler.
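For reference, "no kernel image is available for execution on the device" means the precompiled kernels in libbitsandbytes_cuda116.dll were not built for your GPU's architecture. You can print the architecture of each card to see what you are dealing with (a minimal sketch):

import torch

# bitsandbytes ships prebuilt kernels for a fixed set of SM architectures;
# cards outside that set hit "no kernel image is available".
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} (sm_{major}{minor})")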

Deejay85 commented 1 year ago

How do you know what my graphics card can support? Also, I discovered something interesting: I had updated my drivers but didn't reboot my computer, and I couldn't even pull up the Nvidia Control Panel. Did a restart...and it didn't fix anything. Well...I tried at least. @_@

Anyway, I did try AdamW, and when that didn't work, I tried the rest of the optimizers...none of them worked. To make it easier, I'll just include the run where I used AdamW. Also, GitHub really needs a spoiler tag to collapse long sections of log output.

To create a public link, set `share=True` in `launch()`.
Loading config...
Folder 100_Huge Balls: 22 images found
Folder 100_Huge Balls: 2200 steps
max_train_steps = 1100
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --v2 --pretrained_model_name_or_path="S:/WaifuDiffusion/models/Stable-diffusion/Waifu Diffusion.ckpt" --train_data_dir="S:/kohya_ss/Sample Images/Lora Folder/Image" --resolution=512,512 --output_dir="S:/kohya_ss/Sample Images/Lora Folder/Model" --logging_dir="S:/kohya_ss/Sample Images/Lora Folder/Log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="huge balls" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="1100" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
NOTE: Redirects are currently not supported in Windows or MacOs.
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [www.007guard.com]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [www.007guard.com]:29500 (system error: 10049 - The requested address is not valid in its context.).
v2 with clip_skip will be unexpected / v2でclip_skipを使用することは想定されていません
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory S:\kohya_ss\Sample Images\Lora Folder\Image\100_Huge Balls contains 44 image files
4400 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: False

  [Subset 0 of Dataset 0]
    image_dir: "S:\kohya_ss\Sample Images\Lora Folder\Image\100_Huge Balls"
    image_count: 44
    num_repeats: 100
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    is_reg: False
    class_tokens: Huge Balls
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 4400.53it/s]
prepare dataset
prepare accelerator
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [www.007guard.com]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [www.007guard.com]:29500 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "S:\kohya_ss\train_network.py", line 699, in <module>
    train(args)
  File "S:\kohya_ss\train_network.py", line 119, in train
    accelerator, unwrap_model = train_util.prepare_accelerator(args)
  File "S:\kohya_ss\library\train_util.py", line 2498, in prepare_accelerator
    accelerator = Accelerator(
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 308, in __init__
    self.state = AcceleratorState(
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\state.py", line 150, in __init__
    torch.distributed.init_process_group(backend="nccl", **kwargs)
  File "S:\kohya_ss\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 602, in init_process_group
    default_pg = _new_process_group_helper(
  File "S:\kohya_ss\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 727, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 37816) of binary: S:\kohya_ss\venv\Scripts\python.exe

Not sure if it's important, but here are the contents of my default_config.yaml:

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: 'NO'
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: '0'
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
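One note on the NCCL error above: Windows builds of PyTorch ship without the NCCL backend, so anything that triggers distributed initialization will fail there. With distributed_type: 'NO' and num_processes: 1 as shown, accelerate should stay single-process and never touch NCCL; you can confirm what backends your build actually has (a minimal sketch):

import torch.distributed as dist

# On Windows, NCCL is not compiled in, so this prints False; only a
# single-process (non-distributed) launch will work with this build.
print("NCCL available:", dist.is_nccl_available())
print("Gloo available:", dist.is_gloo_available())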
bmaltais commented 1 year ago

I wish I could help more, but it is hard to troubleshoot when you can't reproduce the issue yourself ;-(

Have you tried deleting the kohya_ss folder and then re-installing everything from scratch to start clean?

Deejay85 commented 1 year ago

Yes...several times...ditto for deleting the default_config.yaml file. Willing to try again if you have any advice.

bmaltais commented 1 year ago

It is complaining about socket access. Might some sort of third-party antivirus be blocking it? Do you run a non-Microsoft AV on your PC?

Deejay85 commented 1 year ago

I tried a few things: using a modded .dll file (https://rentry.org/2chAI_LoRA_Dreambooth_guide_english), disabling Avast for 10 minutes, and even downloading Stable Diffusion 2-1 from the quick-pick menu as a test case (a 10 GB download...there's an hour and a half I'll never get back), all of which were total duds.

I'm running out of ideas here, and copying and pasting CUDA_LAUNCH_BLOCKING=1 into the extra arguments section of the training parameters isn't helping either...though it does generate an error message that says "got an unexpected keyword argument." 😩
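As an aside, CUDA_LAUNCH_BLOCKING is an environment variable, not a script argument, which is why the extra-arguments box raises "got an unexpected keyword argument". A minimal sketch of setting it from Python before CUDA initializes (in PowerShell the equivalent would be setting $env:CUDA_LAUNCH_BLOCKING = "1" before launching the GUI):

import os

# Must be set in the environment before the first CUDA call; passing it
# as a training argument is what produces "unexpected keyword argument".
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the variable so it takes effect
print(torch.cuda.is_available())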

bmaltais commented 1 year ago

There is something weird on your system. The socket error is for a domain that does not even exist: www.007guard.com

It is hard to tell what the root cause is...but I am inclined to say it is external to kohya_ss and related to other software installed on your computer.

Deejay85 commented 1 year ago

Thanks for pointing that out...I would never have noticed it. Thankfully it's benign. Turns out Spybot S&D caught something and "fixed" it, and while that's the good news, the bad news is that its fix maps blocked domains to 127.0.0.1 in the hosts file, so a reverse lookup of 127.0.0.1 (i.e. localhost) returns the first entry it finds: www.007guard.com. Thankfully it's super easy to fix: just edit the hosts file and you're done. Here's the link below to the forum post for anyone else having this issue. I'll try to make the edit and see if that resolves anything.

https://superuser.com/questions/706729/007guard-what-is-it-is-it-dangerous-and-can-it-be-removed

Ran it again...at least the faulty localhost issue has been resolved. :p See anything else I could try?

Validating that requirements are satisfied.
All requirements satisfied.
Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading config...
Folder 100_Huge Balls: 22 images found
Folder 100_Huge Balls: 2200 steps
max_train_steps = 1100
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --v2 --v_parameterization --pretrained_model_name_or_path="S:/WaifuDiffusion/models/Stable-diffusion/Waifu Diffusion.ckpt" --train_data_dir="S:/kohya_ss/Sample Images/Lora Folder/Image" --resolution=512,512 --output_dir="S:/kohya_ss/Sample Images/Lora Folder/Model" --logging_dir="S:/kohya_ss/Sample Images/Lora Folder/Log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="huge balls" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="1100" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
v2 with clip_skip will be unexpected / v2でclip_skipを使用することは想定されていません
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory S:\kohya_ss\Sample Images\Lora Folder\Image\100_Huge Balls contains 44 image files
4400 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: False

  [Subset 0 of Dataset 0]
    image_dir: "S:\kohya_ss\Sample Images\Lora Folder\Image\100_Huge Balls"
    image_count: 44
    num_repeats: 100
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    is_reg: False
    class_tokens: Huge Balls
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 4400.53it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████| 22/22 [00:07<00:00,  2.83it/s]
import network module: networks.lora
create LoRA network. base dim (rank): 128, alpha: 128.0
create LoRA for Text Encoder: 138 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use AdamW optimizer | {}
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 4400
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 1100
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 2
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 1100
steps:   0%|                                                                                  | 0/1100 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
  File "S:\kohya_ss\train_network.py", line 699, in <module>
    train(args)
  File "S:\kohya_ss\train_network.py", line 538, in train
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "S:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "S:\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 12, in decorate_autocast
    return func(*args, **kwargs)
  File "S:\kohya_ss\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 381, in forward
    sample, res_samples = downsample_block(
  File "S:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "S:\kohya_ss\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 612, in forward
    hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
  File "S:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "S:\kohya_ss\venv\lib\site-packages\diffusers\models\attention.py", line 216, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "S:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "S:\kohya_ss\venv\lib\site-packages\diffusers\models\attention.py", line 484, in forward
    hidden_states = self.attn1(norm_hidden_states) + hidden_states
  File "S:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "S:\kohya_ss\library\train_util.py", line 1700, in forward_xformers
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)  # 最適なのを選んでくれる
  File "S:\kohya_ss\venv\lib\site-packages\xformers\ops.py", line 865, in memory_efficient_attention
    return op.apply(query, key, value, attn_bias, p).reshape(output_shape)
  File "S:\kohya_ss\venv\lib\site-packages\xformers\ops.py", line 319, in forward
    out, lse = cls.FORWARD_OPERATOR(
  File "S:\kohya_ss\venv\lib\site-packages\torch\_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
steps:   0%|                                                                                  | 0/1100 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "S:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "S:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['S:\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--v2', '--v_parameterization', '--pretrained_model_name_or_path=S:/WaifuDiffusion/models/Stable-diffusion/Waifu Diffusion.ckpt', '--train_data_dir=S:/kohya_ss/Sample Images/Lora Folder/Image', '--resolution=512,512', '--output_dir=S:/kohya_ss/Sample Images/Lora Folder/Model', '--logging_dir=S:/kohya_ss/Sample Images/Lora Folder/Log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=huge balls', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1100', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
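The failure is inside xformers.ops.memory_efficient_attention, so a tiny standalone test can show whether the installed xformers wheel has kernels for the GPU at all, independent of kohya_ss (a minimal sketch; the [batch, seq_len, heads, head_dim] input layout is assumed and may vary across xformers versions):

import torch
import xformers.ops

# If the wheel lacks kernels for this GPU architecture, this raises the
# same "no kernel image is available" CUDA error as the training run.
q = torch.randn(1, 64, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 64, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 64, 8, 64, device="cuda", dtype=torch.float16)
out = xformers.ops.memory_efficient_attention(q, k, v)
print("xformers attention OK, output shape:", tuple(out.shape))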
Deejay85 commented 1 year ago

Okay...after a LOT of trial and error that surprisingly didn't involve learning Chinese arithmetic, I finally figured out what was wrong by brute-forcing the situation (trying everything I could think of to see what would stick). Turns out all I had to do was disable xformers. What? Clearly something isn't right here.

I did install the recent CUDA package for Windows 10, got an error, and had to uninstall the drivers for both the GTX 1060 and the Tesla M40, reinstall them, and then install the CUDA package...I've got no idea what I'm doing wrong. Also, somehow, I was able to get LoRA to run on Automatic1111 for a bit. I'm going to try troubleshooting xformers now, because I really don't know what else to do at this point.
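Since there are two very different cards in this machine (the GTX 1060 is sm_61, the Tesla M40 is sm_52), it may also be worth pinning training to a single GPU before CUDA initializes, in case the prebuilt xformers/bitsandbytes binaries only cover one of them. A hedged sketch, assuming the GTX 1060 enumerates as device 0 (verify with the capability listing earlier in the thread):

import os

# Hide all but one GPU from CUDA; "0" refers to CUDA's enumeration order,
# so confirm which index is the GTX 1060 on this machine first.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))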