Is there an existing issue for this?
[X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui
Is EasyPhoto the latest version?
[X] I have updated EasyPhoto to the latest version and the bug still exists.
What happened?
Virtual fitting error
Steps to reproduce the problem
1
What should have happened?
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 2
More than one GPU was found, enabling multi-GPU training.
If this was unintended please pass in --num_processes=1.
--num_machines was set to a value of 1
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
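The warning above says that `accelerate launch` fell back to `--num_processes=2` because two GPUs were detected, and suggests passing `--num_processes=1` if multi-GPU training was not intended. A minimal sketch of a launch command that sets those values explicitly follows. `build_launch_command` is a hypothetical helper, not part of EasyPhoto; the flag names and the `fp16` / `3456` values are taken from the log in this report.

```python
import sys


def build_launch_command(script, script_args, num_processes=1):
    """Return an argv list for `accelerate launch` with explicit settings.

    Hypothetical helper for illustration only; EasyPhoto builds its own
    command internally.
    """
    return [
        sys.executable, "-m", "accelerate.commands.launch",
        f"--num_processes={num_processes}",  # 1 keeps training on a single GPU
        "--num_machines=1",
        "--dynamo_backend=no",
        "--mixed_precision=fp16",
        "--main_process_port=3456",
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(build_launch_command("train_lora.py", ["--resolution=512"]))
```

With all three values passed explicitly, the launcher should no longer need to guess a process count from the number of visible GPUs.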
[2024-03-14 15:23:37,083] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-L2OV9PU]:3456 (system error: 10049 - The requested address is not valid in its context.).
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
2024-03-14 15:23:54,211 - modelscope - INFO - PyTorch version 2.1.2+cu121 Found.
2024-03-14 15:23:54,219 - modelscope - INFO - TensorFlow version 2.16.1 Found.
2024-03-14 15:23:54,219 - modelscope - INFO - Loading ast index from C:\Users\lhcx\.cache\modelscope\ast_indexer
2024-03-14 15:23:54,376 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 b9af109f4bc3e4b3d70e7f2b23c9b24e and a total number of 943 components indexed
2024-03-14 15:23:54,998 - modelscope - INFO - PyTorch version 2.1.2+cu121 Found.
2024-03-14 15:23:55,004 - modelscope - INFO - TensorFlow version 2.16.1 Found.
2024-03-14 15:23:55,010 - modelscope - INFO - Loading ast index from C:\Users\lhcx\.cache\modelscope\ast_indexer
2024-03-14 15:23:55,210 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 b9af109f4bc3e4b3d70e7f2b23c9b24e and a total number of 943 components indexed
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-L2OV9PU]:3456 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1390, in <module>
    main()
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 195, in wrapper
    result = func(*args, **kwargs)
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 711, in main
    accelerator = Accelerator(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\accelerator.py", line 358, in __init__
    self.state = AcceleratorState(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\state.py", line 720, in __init__
    PartialState(cpu, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\state.py", line 192, in __init__
    torch.distributed.init_process_group(backend=self.backend, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
    func_return = func(*args, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-L2OV9PU]:3456 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1390, in <module>
    main()
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 195, in wrapper
    result = func(*args, **kwargs)
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 711, in main
    accelerator = Accelerator(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\accelerator.py", line 358, in __init__
    self.state = AcceleratorState(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\state.py", line 720, in __init__
    PartialState(cpu, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\state.py", line 192, in __init__
    torch.distributed.init_process_group(backend=self.backend, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
    func_return = func(*args, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
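Both worker processes fail in `torch.distributed.init_process_group` because the installed PyTorch build has no NCCL support, which is the normal situation for PyTorch on Windows. A small diagnostic sketch of how one might decide which backend is usable is shown below. `pick_backend` is a hypothetical helper, not something accelerate or EasyPhoto exposes; in a real environment its arguments would come from `torch.distributed.is_nccl_available()` and `torch.distributed.is_gloo_available()`.

```python
def pick_backend(nccl_available, gloo_available):
    """Return the name of a usable torch.distributed backend, or None.

    Hypothetical helper for illustration. The two flags are passed in so
    the function can be exercised without importing torch.
    """
    if nccl_available:
        return "nccl"  # preferred for multi-GPU CUDA training when present
    if gloo_available:
        return "gloo"  # the backend available in Windows builds of PyTorch
    return None  # no distributed backend; fall back to a single process


if __name__ == "__main__":
    # With torch installed, the real values could be obtained like this:
    #   import torch.distributed as dist
    #   print(pick_backend(dist.is_nccl_available(), dist.is_gloo_available()))
    print(pick_backend(False, True))
```

A result other than `"nccl"` on this machine would be consistent with the `RuntimeError` above, and suggests running with a single process rather than multi-GPU.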
[2024-03-14 15:23:59,246] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 10016) of binary: E:\stable-diffusion-webui-1.8\venv\Scripts\python.exe
Traceback (most recent call last):
  File "C:\Users\lhcx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\lhcx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\commands\launch.py", line 989, in <module>
    main()
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\commands\launch.py", line 985, in main
    launch_command(args)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\commands\launch.py", line 970, in launch_command
    multi_gpu_launcher(args)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\commands\launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\run.py", line 797, in run
    elastic_launch(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\launcher\api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py FAILED
Failures:
[1]:
  time      : 2024-03-14_15:23:59
  host      : DESKTOP-L2OV9PU
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 13156)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
  time      : 2024-03-14_15:23:59
  host      : DESKTOP-L2OV9PU
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 10016)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
2024-03-14 15:24:00,374 - EasyPhoto - Error executing the command: Command '['E:\stable-diffusion-webui-1.8\venv\Scripts\python.exe', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', 'E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\sd-webui-EasyPhoto\models\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\Stable-diffusion\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\easyphoto-cloth-id-infos\test_black_200_200\processed_images', '--caption_column=text', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=200', '--checkpointing_steps=200', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=42', '--rank=128', '--network_alpha=64', '--output_dir=E:\stable-diffusion-webui-1.8\outputs/easyphoto-cloth-id-infos\test_black_200_200\user_weights', '--logging_dir=E:\stable-diffusion-webui-1.8\outputs/easyphoto-cloth-id-infos\test_black_200_200\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--cach
Commit where the problem happens
webui: EasyPhoto: 1
What browsers do you use to access the UI ?
Microsoft Edge
Command Line Arguments
List of enabled extensions
Console logs
Additional information
1