To create a public link, set share=True in launch().
02:09:20-964813 INFO SD v2 v_parameterization detected. Setting --v2 parameter and --v_parameterization
02:09:57-177145 INFO Loading config...
02:09:59-898536 INFO SD v2 v_parameterization detected. Setting --v2 parameter and --v_parameterization
02:10:45-025677 INFO Start training LoRA Standard ...
02:10:45-027644 INFO Folder 100_aleng: 141 images found
02:10:45-029639 INFO Folder 100_aleng: 14100 steps
02:10:45-030674 INFO Total steps: 14100
02:10:45-031661 INFO Train batch size: 2
02:10:45-032660 INFO Gradient accumulation steps: 1.0
02:10:45-033635 INFO Epoch: 1
02:10:45-034653 INFO Regulatization factor: 1
02:10:45-035650 INFO max_train_steps (14100 / 2 / 1.0 * 1 * 1) = 7050
02:10:45-036620 INFO stop_text_encoder_training = 0
02:10:45-038644 INFO lr_warmup_steps = 0
02:10:45-039641 INFO accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --v2 --v_parameterization
--pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" --train_data_dir="F:/Stable Diffusion/aleng/image"
--resolution=768,768 --output_dir="F:/Stable Diffusion/aleng/model"
--logging_dir="F:/Stable Diffusion/aleng/log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=128
--output_name="aleng" --lr_scheduler_num_cycles="1" --learning_rate="0.0001"
--lr_scheduler="constant" --train_batch_size="2" --max_train_steps="7050"
--save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234"
--caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit"
--max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --mem_eff_attn
--gradient_checkpointing --xformers --bucket_no_upscale
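The step counts in the INFO lines above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the kohya_ss GUI's own code: it assumes the repeat count is the numeric prefix of the folder name (`100_aleng` → 100 repeats) and that the result is rounded up; the function names `folder_steps` and `max_train_steps` are hypothetical.

```python
import math


def folder_steps(folder_name: str, image_count: int) -> int:
    """Steps contributed by one image folder (hypothetical helper).

    Assumes the folder is named "<repeats>_<name>", e.g. "100_aleng" -> 100 repeats.
    """
    repeats, _, _ = folder_name.partition("_")
    return int(repeats) * image_count


def max_train_steps(total_steps: int, batch_size: int, grad_accum_steps: float = 1.0,
                    epochs: int = 1, reg_factor: int = 1) -> int:
    """Same order of operations as the logged formula
    (total / batch / grad_accum * epochs * reg_factor); rounding up is an assumption."""
    return math.ceil(total_steps / batch_size / grad_accum_steps * epochs * reg_factor)


total = folder_steps("100_aleng", 141)         # 100 repeats x 141 images
steps = max_train_steps(total, batch_size=2)  # the value passed as --max_train_steps
print(total, steps)
```

With the values from this run it prints `14100 7050`, matching `Total steps: 14100` and `--max_train_steps="7050"`.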
[02:10:53] WARNING NOTE: Redirects are currently not supported in Windows or MacOs. redirects.py:27
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [DESKTOP-MU52LFJ]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [DESKTOP-MU52LFJ]:29500 (system error: 10049 - The requested address is not valid in its context.).
v2 with clip_skip will be unexpected
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory F:\Stable Diffusion\aleng\image\100_aleng contains 141 image files
No caption file found for 42 images. Training will continue without captions for these images. If class token exists, it will be used.
F:\Stable Diffusion\aleng\image\100_aleng\Snipaste_2023-06-03_17-39-46.png
F:\Stable Diffusion\aleng\image\100_aleng\Snipaste_2023-06-03_17-40-11.png
F:\Stable Diffusion\aleng\image\100_aleng\Snipaste_2023-06-03_17-40-26.png
F:\Stable Diffusion\aleng\image\100_aleng\Snipaste_2023-06-03_17-40-35.png
F:\Stable Diffusion\aleng\image\100_aleng\Snipaste_2023-06-03_17-41-23.png
F:\Stable Diffusion\aleng\image\100_aleng\Snipaste_2023-06-03_17-41-47.png... and 37 more
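The caption warning above means 42 of the 141 images in `100_aleng` have no matching `.txt` file, so they are trained without a caption and the class token (`aleng`) is used instead. A minimal sketch for listing those images before training, assuming the caption file is the image path with its extension swapped for the `--caption_extension` value (here `.txt`); the function name `images_missing_captions` and the set of image extensions are assumptions, not kohya_ss code.

```python
from pathlib import Path

# Assumed set of image extensions to check; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def images_missing_captions(image_dir, caption_extension=".txt"):
    """Return image files in image_dir that have no sibling caption file.

    A caption for "foo.png" is assumed to be "foo" + caption_extension.
    """
    missing = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() in IMAGE_EXTENSIONS:
            if not path.with_suffix(caption_extension).exists():
                missing.append(path)
    return missing
```

For example, `images_missing_captions(r"F:\Stable Diffusion\aleng\image\100_aleng")` would be expected to list the same files as the warning, provided the extension set covers them.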
14100 train images with repeating.
0 reg images.
no regularization images
[Dataset 0]
batch_size: 2
resolution: (768, 768)
enable_bucket: False
[Subset 0 of Dataset 0]
image_dir: "F:\Stable Diffusion\aleng\image\100_aleng"
image_count: 141
num_repeats: 100
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: aleng
caption_extension: .txt
[Dataset 0] loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████| 141/141 [00:00<00:00, 9024.69it/s]
prepare dataset
preparing accelerator
F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py:249: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [DESKTOP-MU52LFJ]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [DESKTOP-MU52LFJ]:29500 (system error: 10049 - The requested address is not valid in its context.).
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ F:\Stable Diffusion\kohya_ss\train_network.py:814 in <module> │
│ │
│ 811 │ args = parser.parse_args() │
│ 812 │ args = train_util.read_config_from_file(args, parser) │
│ 813 │ │
│ ❱ 814 │ train(args) │
│ 815 │
│ │
│ F:\Stable Diffusion\kohya_ss\train_network.py:139 in train │
│ │
│ 136 │ │
│ 137 │ # acceleratorを準備する │
│ 138 │ print("preparing accelerator") │
│ ❱ 139 │ accelerator, unwrap_model = train_util.prepare_accelerator(args) │
│ 140 │ is_main_process = accelerator.is_main_process │
│ 141 │ │
│ 142 │ # mixed precisionに対応した型を用意しておき適宜castする │
│ │
│ F:\Stable Diffusion\kohya_ss\library\train_util.py:2975 in prepare_accelerator │
│ │
│ 2972 │ │ │ if args.wandb_api_key is not None: │
│ 2973 │ │ │ │ wandb.login(key=args.wandb_api_key) │
│ 2974 │ │
│ ❱ 2975 │ accelerator = Accelerator( │
│ 2976 │ │ gradient_accumulation_steps=args.gradient_accumulation_steps, │
│ 2977 │ │ mixed_precision=args.mixed_precision, │
│ 2978 │ │ log_with=log_with, │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py:346 in __init__ │
│ │
│ 343 │ │ │ │ │ │ self.fp8_recipe_handler = handler │
│ 344 │ │ │
│ 345 │ │ kwargs = self.init_handler.to_kwargs() if self.init_handler is not None else {} │
│ ❱ 346 │ │ self.state = AcceleratorState( │
│ 347 │ │ │ mixed_precision=mixed_precision, │
│ 348 │ │ │ cpu=cpu, │
│ 349 │ │ │ dynamo_plugin=dynamo_plugin, │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\accelerate\state.py:540 in __init__ │
│ │
│ 537 │ │ if parse_flag_from_env("ACCELERATE_USE_CPU"): │
│ 538 │ │ │ cpu = True │
│ 539 │ │ if PartialState._shared_state == {}: │
│ ❱ 540 │ │ │ PartialState(cpu, **kwargs) │
│ 541 │ │ self.__dict__.update(PartialState._shared_state) │
│ 542 │ │ self._check_initialized(mixed_precision, cpu) │
│ 543 │ │ if not self.initialized: │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\accelerate\state.py:129 in __init__ │
│ │
│ 126 │ │ │ elif int(os.environ.get("LOCAL_RANK", -1)) != -1 and not cpu: │
│ 127 │ │ │ │ self.distributed_type = DistributedType.MULTI_GPU │
│ 128 │ │ │ │ if not torch.distributed.is_initialized(): │
│ ❱ 129 │ │ │ │ │ torch.distributed.init_process_group(backend="nccl", **kwargs) │
│ 130 │ │ │ │ │ self.backend = "nccl" │
│ 131 │ │ │ │ self.num_processes = torch.distributed.get_world_size() │
│ 132 │ │ │ │ self.process_index = torch.distributed.get_rank() │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\torch\distributed\distributed_c10d.py:602 in │
│ init_process_group │
│ │
│ 599 │ │ │ # different systems (e.g. RPC) in case the store is multi-tenant. │
│ 600 │ │ │ store = PrefixStore("default_pg", store) │
│ 601 │ │ │
│ ❱ 602 │ │ default_pg = _new_process_group_helper( │
│ 603 │ │ │ world_size, │
│ 604 │ │ │ rank, │
│ 605 │ │ │ [], │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\torch\distributed\distributed_c10d.py:727 in │
│ _new_process_group_helper │
│ │
│ 724 │ │ │ _pg_names[pg] = group_name │
│ 725 │ │ elif backend == Backend.NCCL: │
│ 726 │ │ │ if not is_nccl_available(): │
│ ❱ 727 │ │ │ │ raise RuntimeError("Distributed package doesn't have NCCL " "built in") │
│ 728 │ │ │ if pg_options is not None: │
│ 729 │ │ │ │ assert isinstance( │
│ 730 │ │ │ │ │ pg_options, ProcessGroupNCCL.Options │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Distributed package doesn't have NCCL built in
[02:11:04] ERROR failed (exitcode: 1) local_rank: 0 (pid: 5072) of binary: F:\Stable Diffusion\kohya_ss\venv\Scripts\python.exe api.py:671
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\kira\AppData\Local\Programs\Python\Python310\lib\runpy.py:196 in _run_module_as_main │
│ │
│ 193 │ main_globals = sys.modules["__main__"].__dict__ │
│ 194 │ if alter_argv: │
│ 195 │ │ sys.argv[0] = mod_spec.origin │
│ ❱ 196 │ return _run_code(code, main_globals, None, │
│ 197 │ │ │ │ │ "__main__", mod_spec) │
│ 198 │
│ 199 def run_module(mod_name, init_globals=None, │
│ │
│ C:\Users\kira\AppData\Local\Programs\Python\Python310\lib\runpy.py:86 in _run_code │
│ │
│ 83 │ │ │ │ │ loader = loader, │
│ 84 │ │ │ │ │ package = pkg_name, │
│ 85 │ │ │ │ │ spec = mod_spec) │
│ ❱ 86 │ exec(code, run_globals) │
│ 87 │ return run_globals │
│ 88 │
│ 89 def _run_module_code(code, init_globals=None, │
│ │
│ in <module>:7 │
│ │
│ 4 from accelerate.commands.accelerate_cli import main │
│ 5 if __name__ == '__main__': │
│ 6 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 7 │ sys.exit(main()) │
│ 8 │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py:45 in │
│ main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py:914 in │
│ launch_command │
│ │
│ 911 │ elif args.use_megatron_lm and not args.cpu: │
│ 912 │ │ multi_gpu_launcher(args) │
│ 913 │ elif args.multi_gpu and not args.cpu: │
│ ❱ 914 │ │ multi_gpu_launcher(args) │
│ 915 │ elif args.tpu and not args.cpu: │
│ 916 │ │ if args.tpu_use_cluster: │
│ 917 │ │ │ tpu_pod_launcher(args) │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py:603 in │
│ multi_gpu_launcher │
│ │
│ 600 │ ) │
│ 601 │ with patch_environment(**current_env): │
│ 602 │ │ try: │
│ ❱ 603 │ │ │ distrib_run.run(args) │
│ 604 │ │ except Exception: │
│ 605 │ │ │ if is_rich_available() and debug: │
│ 606 │ │ │ │ console = get_console() │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\torch\distributed\run.py:752 in run │
│ │
│ 749 │ │ ) │
│ 750 │ │
│ 751 │ config, cmd, cmd_args = config_from_args(args) │
│ ❱ 752 │ elastic_launch( │
│ 753 │ │ config=config, │
│ 754 │ │ entrypoint=cmd, │
│ 755 │ )(*cmd_args) │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\torch\distributed\launcher\api.py:131 in │
│ __call__ │
│ │
│ 128 │ │ self._entrypoint = entrypoint │
│ 129 │ │
│ 130 │ def __call__(self, *args): │
│ ❱ 131 │ │ return launch_agent(self._config, self._entrypoint, list(args)) │
│ 132 │
│ 133 │
│ 134 def _get_entrypoint_name( │
│ │
│ F:\Stable Diffusion\kohya_ss\venv\lib\site-packages\torch\distributed\launcher\api.py:245 in │
│ launch_agent │
│ │
│ 242 │ │ │ # if the error files for the failed children exist │
│ 243 │ │ │ # @record will copy the first error (root cause) │
│ 244 │ │ │ # to the error file of the launcher process. │
│ ❱ 245 │ │ │ raise ChildFailedError( │
│ 246 │ │ │ │ name=entrypoint_name, │
│ 247 │ │ │ │ failures=result.failures, │
│ 248 │ │ │ ) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ChildFailedError:
train_network.py FAILED
Failures:
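The two tracebacks show the chain of events: `accelerate launch` took the `args.multi_gpu` branch (launch.py:913-914) and started the script through `torch.distributed.run`; inside the script `LOCAL_RANK` was set, so accelerate's state setup chose `DistributedType.MULTI_GPU` and called `init_process_group(backend="nccl")` (state.py:126-129), which raised because `is_nccl_available()` returned False for this PyTorch build (distributed_c10d.py:726-727). Below is a simplified, hypothetical model of that decision written from the traceback excerpts only; it is not accelerate's or PyTorch's actual code, and `simulate_backend_choice` and its parameters are invented for illustration.

```python
def simulate_backend_choice(env, cpu=False, nccl_available=False):
    """Simplified model of the branch in the traceback (hypothetical helper).

    env            -- mapping standing in for os.environ
    cpu            -- models accelerate's cpu flag
    nccl_available -- models torch.distributed.is_nccl_available()
    """
    if int(env.get("LOCAL_RANK", -1)) != -1 and not cpu:
        # A launcher set LOCAL_RANK, so the run is treated as MULTI_GPU
        # and the NCCL backend is requested unconditionally.
        if not nccl_available:
            raise RuntimeError("Distributed package doesn't have NCCL built in")
        return "nccl"
    # LOCAL_RANK unset or cpu=True: the NCCL branch shown in the traceback is not taken.
    return None
```

In this model the failing run corresponds to `simulate_backend_choice({"LOCAL_RANK": "0"})` with NCCL unavailable, which raises the same `RuntimeError` text as the log; without `LOCAL_RANK` (or with `cpu=True`) the NCCL request is never made.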