aigc-apps / sd-webui-EasyPhoto

📷 EasyPhoto | Your Smart AI Photo Generator.
Apache License 2.0

[Bug]: Virtual fitting error #409

Closed hjj-lmx closed 8 months ago

hjj-lmx commented 8 months ago

Is there an existing issue for this?

Is EasyPhoto the latest version?

What happened?

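Virtual fitting fails during LoRA training with "RuntimeError: Distributed package doesn't have NCCL built in" (screenshot: 1710401667(1)).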

Steps to reproduce the problem

1

What should have happened?

The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 2
        More than one GPU was found, enabling multi-GPU training.
        If this was unintended please pass in --num_processes=1.
    --num_machines was set to a value of 1
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
[2024-03-14 15:23:37,083] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-L2OV9PU]:3456 (system error: 10049 - The requested address is not valid in its context.).
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
2024-03-14 15:23:54,211 - modelscope - INFO - PyTorch version 2.1.2+cu121 Found.
2024-03-14 15:23:54,219 - modelscope - INFO - TensorFlow version 2.16.1 Found.
2024-03-14 15:23:54,219 - modelscope - INFO - Loading ast index from C:\Users\lhcx\.cache\modelscope\ast_indexer
2024-03-14 15:23:54,376 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 b9af109f4bc3e4b3d70e7f2b23c9b24e and a total number of 943 components indexed
2024-03-14 15:23:54,998 - modelscope - INFO - PyTorch version 2.1.2+cu121 Found.
2024-03-14 15:23:55,004 - modelscope - INFO - TensorFlow version 2.16.1 Found.
2024-03-14 15:23:55,010 - modelscope - INFO - Loading ast index from C:\Users\lhcx\.cache\modelscope\ast_indexer
2024-03-14 15:23:55,210 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 b9af109f4bc3e4b3d70e7f2b23c9b24e and a total number of 943 components indexed
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-L2OV9PU]:3456 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1390, in <module>
    main()
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 195, in wrapper
    result = func(*args, **kwargs)
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 711, in main
    accelerator = Accelerator(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\accelerator.py", line 358, in __init__
    self.state = AcceleratorState(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\state.py", line 720, in __init__
    PartialState(cpu, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\state.py", line 192, in __init__
    torch.distributed.init_process_group(backend=self.backend, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
    func_return = func(*args, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-L2OV9PU]:3456 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 1390, in <module>
    main()
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\utils\gpu_info.py", line 195, in wrapper
    result = func(*args, **kwargs)
  File "E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya\train_lora.py", line 711, in main
    accelerator = Accelerator(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\accelerator.py", line 358, in __init__
    self.state = AcceleratorState(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\state.py", line 720, in __init__
    PartialState(cpu, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\state.py", line 192, in __init__
    torch.distributed.init_process_group(backend=self.backend, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
    func_return = func(*args, **kwargs)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
[2024-03-14 15:23:59,246] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 10016) of binary: E:\stable-diffusion-webui-1.8\venv\Scripts\python.exe
Traceback (most recent call last):
  File "C:\Users\lhcx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\lhcx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\commands\launch.py", line 989, in <module>
    main()
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\commands\launch.py", line 985, in main
    launch_command(args)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\commands\launch.py", line 970, in launch_command
    multi_gpu_launcher(args)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\accelerate\commands\launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\run.py", line 797, in run
    elastic_launch(
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "E:\stable-diffusion-webui-1.8\venv\lib\site-packages\torch\distributed\launcher\api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py FAILED

Failures:
[1]:
  time       : 2024-03-14_15:23:59
  host       : DESKTOP-L2OV9PU
  rank       : 1 (local_rank: 1)
  exitcode   : 1 (pid: 13156)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
  time       : 2024-03-14_15:23:59
  host       : DESKTOP-L2OV9PU
  rank       : 0 (local_rank: 0)
  exitcode   : 1 (pid: 10016)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
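
Both ranks fail inside torch.distributed.init_process_group because the NCCL backend was requested, and the Windows builds of PyTorch do not include NCCL. The sketch below is one way to check which distributed backends the installed PyTorch build reports. The helper names choose_backend and probe_backends are hypothetical (they are not part of EasyPhoto, accelerate, or PyTorch), and whether EasyPhoto or accelerate can be told to use a different backend is not confirmed by this report.

```python
from typing import Optional


def choose_backend(nccl_available: bool, gloo_available: bool) -> Optional[str]:
    """Hypothetical helper: pick a torch.distributed backend name, or None."""
    if nccl_available:
        return "nccl"
    if gloo_available:
        return "gloo"
    return None


def probe_backends() -> Optional[str]:
    """Ask the installed PyTorch build which distributed backends it has."""
    try:
        import torch.distributed as dist
    except ImportError:
        return None  # PyTorch is not installed in this interpreter
    if not dist.is_available():
        return None  # this build has no distributed support at all
    return choose_backend(dist.is_nccl_available(), dist.is_gloo_available())


if __name__ == "__main__":
    print("usable distributed backend:", probe_backends())
```

If this is run with the webui's venv interpreter and reports anything other than "nccl", multi-GPU training through NCCL is likely to fail in the same way as in the log above.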

2024-03-14 15:24:00,374 - EasyPhoto - Error executing the command: Command '['E:\stable-diffusion-webui-1.8\venv\Scripts\python.exe', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', 'E:\stable-diffusion-webui-1.8\extensions\sd-webui-EasyPhoto\scripts\train_kohya/train_lora.py', '--pretrained_model_name_or_path=extensions\sd-webui-EasyPhoto\models\stable-diffusion-v1-5', '--pretrained_model_ckpt=models\Stable-diffusion\Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=outputs\easyphoto-cloth-id-infos\test_black_200_200\processed_images', '--caption_column=text', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--dataloader_num_workers=0', '--max_train_steps=200', '--checkpointing_steps=200', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=42', '--rank=128', '--network_alpha=64', '--output_dir=E:\stable-diffusion-webui-1.8\outputs/easyphoto-cloth-id-infos\test_black_200_200\user_weights', '--logging_dir=E:\stable-diffusion-webui-1.8\outputs/easyphoto-cloth-id-infos\test_black_200_200\user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--cach
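
The accelerate warning at the top of the log says that --num_processes defaulted to 2 because two GPUs were found, and suggests passing --num_processes=1 if multi-GPU training was not intended. A minimal sketch of that idea follows, assuming the launch command is available as a list of strings like the one printed above. The function force_single_process is hypothetical and not part of EasyPhoto, and the short command in the example is a stand-in for the real (truncated) one.

```python
from typing import List

# Flags named in the accelerate warning, pinned to single-process values.
SINGLE_PROCESS_FLAGS = ["--num_processes=1", "--num_machines=1", "--dynamo_backend=no"]


def force_single_process(cmd: List[str]) -> List[str]:
    """Return a copy of an accelerate launch command with explicit
    single-process flags inserted right after the launcher module name.

    Flags already present anywhere in the command are not added again.
    """
    try:
        launcher_idx = cmd.index("accelerate.commands.launch")
    except ValueError:
        return list(cmd)  # not an accelerate launch command; leave unchanged
    present = {arg.split("=", 1)[0] for arg in cmd}
    extra = [f for f in SINGLE_PROCESS_FLAGS if f.split("=", 1)[0] not in present]
    return cmd[: launcher_idx + 1] + extra + cmd[launcher_idx + 1 :]


if __name__ == "__main__":
    # Shortened stand-in for the command in the log above.
    example = [
        "python", "-m", "accelerate.commands.launch",
        "--mixed_precision=fp16", "--main_process_port=3456",
        "train_lora.py", "--resolution=512",
    ]
    print(force_single_process(example))
```

With a single process, accelerate should not need to set up a multi-GPU process group, so the NCCL requirement would not apply. Restricting the visible GPUs with the CUDA_VISIBLE_DEVICES environment variable before starting the webui is another common way to get the same effect.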

Commit where the problem happens

webui: EasyPhoto: 1

What browsers do you use to access the UI ?

Microsoft Edge

Command Line Arguments

no

List of enabled extensions

1710401667(1)

Console logs

1

Additional information

1