bmaltais / kohya_ss

Apache License 2.0

Error: RuntimeError: use_libuv was requested but PyTorch was build without libuv support #2934

Open toxsickcity opened 4 weeks ago

toxsickcity commented 4 weeks ago

Hello,

Could I kindly get some help with the following error? It appears when I try to train a LoRA for Flux. I am using a config file that is supposed to allow training on 12 GB VRAM cards.

When I google the error, it appears to be related to multi-GPU setups. I have tried adding extra command-line arguments, but I still cannot get training to start.
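For retrying from a script, here is one way to relaunch with a single GPU visible and PyTorch's libuv TCPStore backend switched off. It uses environment variables instead of extra command-line arguments. This is a hedged sketch, not a confirmed fix. `CUDA_VISIBLE_DEVICES` is the standard CUDA variable for hiding devices. `USE_LIBUV=0` is the variable PyTorch describes for falling back to the legacy TCPStore backend after 2.4 made libuv the default. Whether it reaches the rendezvous code in this particular traceback is an assumption. `single_gpu_env` and `relaunch` are hypothetical helper names.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def single_gpu_env(base: Optional[Mapping[str, str]] = None, gpu: str = "0") -> Dict[str, str]:
    """Hypothetical helper: copy an environment and restrict it to one GPU.

    Assumptions: CUDA_VISIBLE_DEVICES hides the other cards from PyTorch, and
    USE_LIBUV=0 requests the legacy (non-libuv) TCPStore backend wherever
    PyTorch honours that variable.
    """
    env = dict(os.environ if base is None else base)  # copy; never mutate the caller's mapping
    env["CUDA_VISIBLE_DEVICES"] = gpu
    env["USE_LIBUV"] = "0"
    return env


def relaunch(command: Sequence[str]) -> int:
    """Run the training command (an argv list) with the single-GPU environment."""
    return subprocess.run(list(command), env=single_gpu_env(), check=False).returncode
```

Pass the argument list from the "Executing command" log line to `relaunch`. If the same RuntimeError still appears, the variable is not taking effect on this code path.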

I looked here for answers, but it didn't help: https://github.com/kohya-ss/sd-scripts/pull/1686

Any advice would be most welcome.

System specs: i5-12600K, 64 GB DDR4, 3x RTX 3060 12 GB (used primarily for Blender renders, but they handle SD fine), Windows 11 Pro.

Many thanks, Shaun.

=============================================================
Modules installed outside the virtual environment were found.
This can cause issues. Please review the installed modules.

You can uninstall all local modules with:

deactivate
pip freeze > uninstall.txt
pip uninstall -y -r uninstall.txt
=============================================================
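This banner warns about modules installed outside the virtual environment. Every path in the traceback further down also points at the system interpreter (`C:\Users\User\AppData\Local\Programs\Python\Python310\...`) rather than a venv. Below is a minimal stdlib check, assuming standard `venv` semantics (PEP 405: inside a venv, `sys.prefix` differs from `sys.base_prefix`). `in_virtualenv` is a hypothetical helper name.

```python
import shutil
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    # PEP 405: a venv interpreter reports its own sys.prefix, while
    # sys.base_prefix still points at the installation it was created from.
    return prefix != base_prefix


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    print("interpreter:", sys.executable)
    # Which accelerate executable a bare `accelerate` command would resolve to.
    print("accelerate on PATH:", shutil.which("accelerate"))
```

Run this with the same interpreter that starts the GUI. If it reports `False` and an `accelerate` under `Python310\Scripts`, training is running from the global install.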

19:59:52-630129 INFO     Kohya_ss GUI version: v24.2.0

19:59:53-048537 INFO     Submodule initialized and updated.
19:59:53-050538 INFO     nVidia toolkit detected
19:59:54-453908 INFO     Torch 2.5.0+cu124
19:59:54-486118 INFO     Torch backend: nVidia CUDA 12.4 cuDNN 90100
19:59:54-489117 INFO     Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12288MB Arch 8.6 Cores 28
19:59:54-489617 INFO     Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12288MB Arch 8.6 Cores 28
19:59:54-490618 INFO     Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12288MB Arch 8.6 Cores 28
19:59:54-491619 INFO     Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit
                         (AMD64)]
19:59:54-493232 INFO     Installing/Validating requirements from requirements_pytorch_windows.txt...
19:59:54-950213 INFO     Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu124
19:59:54-951213 INFO     Obtaining file:///U:/kohya_ss-sd3-flux1/kohya_ss/sd-scripts (from -r
                         U:\kohya_ss-sd3-flux1\kohya_ss\requirements.txt (line 37))
19:59:54-952713 INFO     Preparing metadata (setup.py): started
19:59:55-421370 INFO     Preparing metadata (setup.py): finished with status 'done'
19:59:56-203139 INFO     Installing collected packages: library
19:59:56-204139 INFO     Attempting uninstall: library
19:59:56-205139 INFO     Found existing installation: library 0.0.0
19:59:56-207305 INFO     Uninstalling library-0.0.0:
19:59:57-272211 INFO     Successfully uninstalled library-0.0.0
19:59:57-272712 INFO     Running setup.py develop for library
19:59:58-132057 INFO     Successfully installed library
20:00:05-917912 INFO     Kohya_ss GUI version: v24.2.0

20:00:06-335685 INFO     Submodule initialized and updated.
20:00:06-338192 INFO     nVidia toolkit detected
20:00:07-627312 INFO     Torch 2.5.0+cu124
20:00:07-671837 INFO     Torch backend: nVidia CUDA 12.4 cuDNN 90100
20:00:07-674854 INFO     Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12288MB Arch 8.6 Cores 28
20:00:07-675851 INFO     Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12288MB Arch 8.6 Cores 28
20:00:07-676351 INFO     Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12288MB Arch 8.6 Cores 28
20:00:07-677351 INFO     Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit
                         (AMD64)]
20:00:07-678351 INFO     Installing/Validating requirements from requirements_pytorch_windows.txt...
20:00:08-101451 INFO     Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu124
20:00:08-102453 INFO     Obtaining file:///U:/kohya_ss-sd3-flux1/kohya_ss/sd-scripts (from -r
                         U:\kohya_ss-sd3-flux1\kohya_ss\requirements.txt (line 37))
20:00:08-103452 INFO     Preparing metadata (setup.py): started
20:00:08-543065 INFO     Preparing metadata (setup.py): finished with status 'done'
20:00:09-275188 INFO     Installing collected packages: library
20:00:09-276690 INFO     Attempting uninstall: library
20:00:09-277688 INFO     Found existing installation: library 0.0.0
20:00:09-280201 INFO     Uninstalling library-0.0.0:
20:00:10-337369 INFO     Successfully uninstalled library-0.0.0
20:00:10-338370 INFO     Running setup.py develop for library
20:00:11-185580 INFO     Successfully installed library
20:00:11-505648 INFO     headless: False
20:00:11-509147 INFO     Using shell=True when running external commands...
INFO: Could not find files for the given pattern(s).
* Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
20:00:41-272084 INFO     Loading config...
20:01:52-951486 INFO     Loading config...
20:04:00-843130 INFO     Start training LoRA Flux1 ...
20:04:00-844131 INFO     Validating lr scheduler arguments...
20:04:00-845131 INFO     Validating optimizer arguments...
20:04:00-846130 INFO     Validating lora type is Flux1 if flux1 checkbox is checked...
20:04:00-847631 INFO     Validating U:/tmp existence and writability... SUCCESS
20:04:00-848631 INFO     Validating U:\kohya_ss-sd3-flux1\kohya_ss\models\flux-dev-fp8.safetensors existence... SUCCESS
20:04:00-849130 INFO     Validating U:\kohya_ss-sd3-flux1\kohya_ss\Input\img\8_TedModel3Flux existence... SUCCESS
20:04:00-853771 INFO     Regularization factor: 1
20:04:00-854771 INFO     Train batch size: 3
20:04:00-855271 INFO     Gradient accumulation steps: 1
20:04:00-856272 INFO     Epoch: 1
20:04:00-856771 INFO     Max train steps: 100
20:04:00-857270 INFO     stop_text_encoder_training = 0
20:04:00-857770 INFO     lr_warmup_steps = 0
20:04:00-862609 INFO     Saving training config to U:/tmp\TedModel3_20241028-200400.json...
20:04:00-864110 INFO     Executing command:
                         C:\Users\User\AppData\Local\Programs\Python\Python310\Scripts\accelerate.EXE launch
                         --dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1
                         --num_machines 1 --num_cpu_threads_per_process 2
                         U:/kohya_ss-sd3-flux1/kohya_ss/sd-scripts/flux_train_network.py --config_file
                         U:/tmp/config_lora-20241028-200400.toml
W1028 20:04:04.279000 6256 site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 1097, in launch_command
    multi_gpu_launcher(args)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 734, in multi_gpu_launcher
    distrib_run.run(args)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\run.py", line 910, in run
    elastic_launch(
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\launcher\api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\launcher\api.py", line 260, in launch_agent
    result = agent.run()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 137, in wrapper
    result = f(*args, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 696, in run
    result = self._invoke_run(role)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 849, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 137, in wrapper
    result = f(*args, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 668, in _initialize_workers
    self._rendezvous(worker_group)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 137, in wrapper
    result = f(*args, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 500, in _rendezvous
    rdzv_info = spec.rdzv_handler.next_rendezvous()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\distributed\elastic\rendezvous\static_tcp_rendezvous.py", line 67, in next_rendezvous
    self._store = TCPStore(  # type: ignore[call-arg]
RuntimeError: use_libuv was requested but PyTorch was build without libuv support
20:04:07-102723 INFO     Training has ended.
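Reading the command and the traceback together narrows things down. The GUI passed `--num_processes 1`, yet accelerate's `launch.py` dispatched to `multi_gpu_launcher` rather than its plain single-process path, `simple_launcher`. That multi-GPU path goes through `torch.distributed.run` and a static TCP rendezvous, where `TCPStore` raises the libuv error. The `--config_file` in that command is an argument to `flux_train_network.py`, not to accelerate. One possible cause, not confirmed by this log, is that accelerate's saved default configuration (written by `accelerate config`) selects multi-GPU training. The stdlib-only sketch below extracts both facts from the log text. The helper names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Excerpts copied from the log above; traceback paths shortened with "...".
LOGGED_COMMAND = (
    r"C:\Users\User\AppData\Local\Programs\Python\Python310\Scripts\accelerate.EXE launch "
    "--dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1 "
    "--num_machines 1 --num_cpu_threads_per_process 2 "
    "U:/kohya_ss-sd3-flux1/kohya_ss/sd-scripts/flux_train_network.py --config_file "
    "U:/tmp/config_lora-20241028-200400.toml"
)

TRACEBACK_EXCERPT = r'''
  File "...\accelerate\commands\launch.py", line 1097, in launch_command
    multi_gpu_launcher(args)
  File "...\accelerate\commands\launch.py", line 734, in multi_gpu_launcher
    distrib_run.run(args)
  File "...\torch\distributed\elastic\rendezvous\static_tcp_rendezvous.py", line 67, in next_rendezvous
    self._store = TCPStore(  # type: ignore[call-arg]
RuntimeError: use_libuv was requested but PyTorch was build without libuv support
'''

FRAME = re.compile(r'File "(?P<path>[^"]+)", line \d+, in (?P<func>\w+)')


def accelerate_options(command: str) -> Dict[str, str]:
    """Collect the `--flag value` pairs given to `accelerate launch` before the script."""
    tokens = command.split()  # acceptable here: no path in this command contains spaces
    options: Dict[str, str] = {}
    i = tokens.index("launch") + 1
    while i < len(tokens) and not tokens[i].endswith(".py"):
        # Every accelerate flag in this log takes a value; bare boolean flags would need extra handling.
        if tokens[i].startswith("--") and i + 1 < len(tokens):
            options[tokens[i][2:]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return options


def frames(traceback_text: str) -> List[Tuple[str, str]]:
    """(file path, function name) for each `File ...` line of a traceback."""
    return [(m["path"], m["func"]) for m in FRAME.finditer(traceback_text)]


def accelerate_launcher(traceback_text: str) -> Optional[str]:
    """Name of the launcher function accelerate's launch.py dispatched to, if present."""
    for path, func in frames(traceback_text):
        if path.endswith("launch.py") and func.endswith("_launcher"):
            return func
    return None


if __name__ == "__main__":
    opts = accelerate_options(LOGGED_COMMAND)
    print("num_processes requested:", opts["num_processes"])  # 1
    print("launcher used:", accelerate_launcher(TRACEBACK_EXCERPT))  # multi_gpu_launcher
    print("reached TCP rendezvous:", any("rendezvous" in p for p, _ in frames(TRACEBACK_EXCERPT)))  # True
```

Because a single process still ended up on the multi-GPU path, the saved accelerate configuration is worth checking before changing any training arguments.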