[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████████| 47/47 [00:00<00:00, 2766.96it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 235
mean ar error (without repeats): 0.0
preparing accelerator
[W socket.cpp:697] [c10d] The client socket has failed to connect to [lmlicenses.wip4.adobe.com]:29500 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
File "F:\ComfyUi_Python\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 1012, in <module>
trainer.train(args)
File "F:\ComfyUi_Python\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 220, in train
accelerator = train_util.prepare_accelerator(args)
File "F:\ComfyUi_Python\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\train_util.py", line 3826, in prepare_accelerator
accelerator = Accelerator(
File "F:\ComfyUi_Python.venv\lib\site-packages\accelerate\accelerator.py", line 369, in __init__
self.state = AcceleratorState(
File "F:\ComfyUi_Python.venv\lib\site-packages\accelerate\state.py", line 732, in __init__
PartialState(cpu, **kwargs)
File "F:\ComfyUi_Python.venv\lib\site-packages\accelerate\state.py", line 202, in __init__
torch.distributed.init_process_group(backend=self.backend, **kwargs)
File "F:\ComfyUi_Python.venv\lib\site-packages\torch\distributed\c10d_logger.py", line 86, in wrapper
func_return = func(*args, **kwargs)
File "F:\ComfyUi_Python.venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1184, in init_process_group
default_pg, _ = _new_process_group_helper(
File "F:\ComfyUi_Python.venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1302, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
[2024-04-27 19:03:32,582] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 4472) of binary: F:\ComfyUi_Python.venv\Scripts\python.exe
Traceback (most recent call last):
File "C:\Users\gzone\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\gzone\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\ComfyUi_Python.venv\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
main()
File "F:\ComfyUi_Python.venv\lib\site-packages\accelerate\commands\launch.py", line 992, in main
launch_command(args)
File "F:\ComfyUi_Python.venv\lib\site-packages\accelerate\commands\launch.py", line 977, in launch_command
multi_gpu_launcher(args)
File "F:\ComfyUi_Python.venv\lib\site-packages\accelerate\commands\launch.py", line 646, in multi_gpu_launcher
distrib_run.run(args)
File "F:\ComfyUi_Python.venv\lib\site-packages\torch\distributed\run.py", line 803, in run
elastic_launch(
File "F:\ComfyUi_Python.venv\lib\site-packages\torch\distributed\launcher\api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "F:\ComfyUi_Python.venv\lib\site-packages\torch\distributed\launcher\api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
I've tried many things, but nothing has worked so far.
Before, I used the one-click install method for ComfyUI and installed all nodes.
Then I tried using git clone instead.
I am on Windows 10 with an NVIDIA GPU,
and the console shows the following:
Torch version: 2.2.2+cu121 Is CUDA enabled? True
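In case it helps anyone reproduce this, here is a minimal sketch of a check that could be run with the same Python interpreter as the training script. It only reports which distributed backends the installed PyTorch build offers and how many GPUs it sees. The function name `distributed_report` is made up for illustration, and the idea that these values are relevant is an assumption based on the traceback, which goes through `multi_gpu_launcher` and asks for the NCCL backend.

```python
import importlib.util
import platform


def distributed_report():
    """Collect facts that may relate to the 'NCCL not built in' error.

    Hypothetical helper for illustration; it only reads information
    and does not change any configuration.
    """
    report = {
        "os": platform.system(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if not report["torch_installed"]:
        return report

    import torch
    import torch.distributed as dist

    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    report["distributed_available"] = dist.is_available()
    # The backend checks only exist when the distributed package is available.
    if report["distributed_available"]:
        report["nccl_available"] = dist.is_nccl_available()
        report["gloo_available"] = dist.is_gloo_available()
    return report


if __name__ == "__main__":
    for key, value in distributed_report().items():
        print(f"{key}: {value}")
```

If `nccl_available` comes back as `False`, that would match the `RuntimeError` in the log, and `gpu_count` shows how many devices the launcher could try to spread the run across.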
Hey does anyone else run into this error?
F:/ComfyUi_Python/ComfyUI/custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py FAILED
Failures:
[1]:
time : 2024-04-27_19:03:32
host : ***
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 10424)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2024-04-27_19:03:32
host : ****
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 4472)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Train finished
Prompt executed in 16.07 seconds
Any ideas what the issue might be?