LarryJane491 / Lora-Training-in-Comfy

This custom node lets you train LoRA directly in ComfyUI!

No such file or directory error, but the path is correct...? #49

Open BusyGettingBusy opened 1 month ago

BusyGettingBusy commented 1 month ago

Hello, I have seen similar errors in other issues, but each was slightly different, and none of the solutions posted there solved my problem.

I ran the advanced LoRA trainer and keep getting the error below, "[Errno 2] No such file or directory". However, I checked, and as far as I can tell that is the correct path to the file. I also checked that the path to the images folder is correct (it points to the folder that contains the folder holding the images).

got prompt
[rgthree] Using rgthree's optimized recursive execution.
C:\Users\balco\AI_local_stuff\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
C:\Users\balco\AppData\Local\Programs\Python\Python310\python.exe: can't open file 'C:\\Users\\balco\\AI_local_stuff\\ComfyUI_windows_portable\\custom_nodes\\Lora-Training-in-Comfy\\sd-scripts\\train_network.py': [Errno 2] No such file or directory
Traceback (most recent call last):
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 989, in <module>
    main()
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 985, in main
    launch_command(args)
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 979, in launch_command
    simple_launcher(args)
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\balco\\AppData\\Local\\Programs\\Python\\Python310\\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=C:\\Users\\balco\\AI_local_stuff\\ComfyUI_windows_portable\\ComfyUI\\models\\checkpoints\\v1-5-pruned.safetensors', '--train_data_dir=C:/Users/balco/AI_local_stuff/SD LORA training/Maimy/images', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=maimy-v1.0', '--resolution=768,768', '--network_module=networks.lora', '--max_train_epochs=50', '--learning_rate=1e-4', '--unet_lr=1.e-4', '--text_encoder_lr=1.e-4', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=maimy-v1.0', '--train_batch_size=1', '--save_every_n_epochs=7', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=11', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 2.
Train finished
Prompt executed in 2.28 seconds

By the way, I also ran the non-advanced version, and it failed in a similar way, but with the traceback below. However, I am focused on fixing the advanced one.

Traceback (most recent call last):
  File "C:\Users\balco\AI_local_stuff\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 11, in <module>
    import toml
ModuleNotFoundError: No module named 'toml'

Scorpio-mfg commented 1 month ago

same error

ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py': [Errno 2] No such file or directory
Traceback (most recent call last):

xtrash commented 3 weeks ago

Hi. This happens because the "ComfyUI" folder is missing from the path to the file. You should have "ComfyUI" between "\ComfyUI_windows_portable" and "\custom_nodes\", so: \ComfyUI_windows_portable\ComfyUI\custom_nodes... The easiest fix is to start ComfyUI from the "ComfyUI" folder, not from the "ComfyUI_windows_portable" folder.
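The mismatch can be reproduced from the first log: the failing command passes the relative path 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', and a relative path is resolved against whatever folder the process was started from. Here is a minimal sketch of that resolution. It uses `ntpath` so it behaves the same on any OS, and the helper name `resolve_from` is made up for illustration.

```python
import ntpath  # Windows path rules, usable on any OS

# Relative script path as it appears in the failing command in the first log.
RELATIVE_SCRIPT = "custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py"


def resolve_from(cwd: str, relative: str = RELATIVE_SCRIPT) -> str:
    """Return the absolute path that `relative` resolves to when started from `cwd`."""
    return ntpath.normpath(ntpath.join(cwd, relative))


portable_root = r"C:\Users\balco\AI_local_stuff\ComfyUI_windows_portable"
comfy_root = portable_root + r"\ComfyUI"

# Started from the portable root: "ComfyUI" is missing from the resolved path.
print(resolve_from(portable_root))
# Started from the ComfyUI folder: the resolved path contains "\ComfyUI\custom_nodes\".
print(resolve_from(comfy_root))
```

The first printed path matches the one in the "[Errno 2]" message, which is why starting from the "ComfyUI" folder makes the error go away.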

You can edit "run_nvidia_gpu.bat" like this:

REM Switch python session to virtual environment if needed :
call ComfyUI\venv\Scripts\activate.bat
echo "Venv 03 Activated"

cd ComfyUI

..\python_embeded\python.exe -s main.py --windows-standalone-build --port 8030 --cuda-device 0
pause


That's how I solved this problem, but you can also edit the "train.py" file (in Lora-Training-in-Comfy) and change the various assignments to the "progpath" variable. I think the problem is related to the "progpath = os.getcwd()" call.

    # Looking for the training script.
    progpath = os.getcwd()
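One way to make the lookup independent of the start folder is to build the script path from the node's own location instead of the current working directory. This is only a sketch, not the node's actual code: apart from the quoted `progpath = os.getcwd()` line, nothing here is taken from train.py, and the helper name `find_training_script` is hypothetical. The "sd-scripts/train_network.py" layout is taken from the logs.

```python
import tempfile
from pathlib import Path


def find_training_script(node_dir: Path) -> Path:
    """Locate train_network.py relative to the node folder, not the cwd."""
    script = node_dir / "sd-scripts" / "train_network.py"
    if not script.is_file():
        raise FileNotFoundError(f"training script not found: {script}")
    return script


# Demo with a throwaway folder standing in for custom_nodes/Lora-Training-in-Comfy.
with tempfile.TemporaryDirectory() as tmp:
    node_dir = Path(tmp)
    (node_dir / "sd-scripts").mkdir()
    (node_dir / "sd-scripts" / "train_network.py").touch()
    found = find_training_script(node_dir)
    print(found.name)
```

Inside train.py the call could look like `find_training_script(Path(__file__).resolve().parent)`, assuming train.py sits directly in the node folder.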

BusyGettingBusy commented 3 weeks ago

Thank you @xtrash for your response. However, I updated "run_nvidia_gpu.bat" to exactly what you pasted and I am still getting an error.

C:\Users\balco\AI_local_stuff\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
  File "C:\Users\balco\AI_local_stuff\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 11, in <module>
    import toml
ModuleNotFoundError: No module named 'toml'
Traceback (most recent call last):
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 989, in <module>
    main()
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 985, in main
    launch_command(args)
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 979, in launch_command
    simple_launcher(args)
  File "C:\Users\balco\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\balco\\AppData\\Local\\Programs\\Python\\Python310\\python.exe', 'C:/Users/balco/AI_local_stuff/ComfyUI_windows_portable/ComfyUI/custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=C:\\Users\\balco\\AI_local_stuff\\ComfyUI_windows_portable\\ComfyUI\\models\\checkpoints\\v1-5-pruned.safetensors', '--train_data_dir=C:/Users/balco/AI_local_stuff/SD LORA training/prepare training data output/img', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=maimytj-v0.1-comfyui', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=maimytj-v0.1-comfyui', '--train_batch_size=1', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=12', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=1', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 2.06 seconds

BusyGettingBusy commented 3 weeks ago

Closed by accident.

xtrash commented 3 weeks ago

Hi BusyGettingBusy. I was answering Scorpio-mfg about the path issue.

By the way, I remember struggling with the "No module named 'toml'" error too. I don't know exactly how I solved it, but here is what I did:

.\python_embeded\python.exe -s main.py --windows-standalone-build --port 8030 --cuda-device 0

Then I changed "==" to ">=" in my requirements.txt file and updated everything with:

pip install -r requirements.txt --upgrade
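The "==" to ">=" change can also be scripted rather than done by hand. A minimal sketch follows. The sample requirement lines are hypothetical and are not the actual contents of this node's requirements.txt. Only lines of the form "name==version" are rewritten.

```python
import re

# Matches "name==" (optionally "name[extra]==") at the start of a line.
# Comment lines and unpinned requirements do not match and are left untouched.
PIN = re.compile(r"^([ \t]*[A-Za-z0-9_.\-]+(?:\[[^\]]*\])?[ \t]*)==", re.MULTILINE)


def loosen_pins(requirements_text: str) -> str:
    """Turn exact pins (==) into minimum-version constraints (>=)."""
    return PIN.sub(r"\1>=", requirements_text)


# Hypothetical sample, for illustration only.
sample = "# pinned versions\ntoml==0.10.2\naccelerate==0.25.0\nsafetensors\n"
loosened = loosen_pins(sample)
print(loosened)
```

Loosening every pin lets pip pull the newest releases, which worked here but can just as well introduce incompatible versions, so keeping a copy of the original file is a sensible precaution.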

To find the latest version, I use this trick: I try to install torch (or torchvision or torchaudio) with the incomplete version "==+cu121", which fails with an error that lists the available versions:

pip install torch==+cu121 --index-url https://download.pytorch.org/whl/cu121

The answer will look like:

ERROR: Could not find a version that satisfies the requirement torch==+cu121 (from versions: 2.1.0+cu121, 2.1.1+cu121, 2.1.2+cu121, 2.2.0+cu121, 2.2.1+cu121, 2.2.2+cu121, 2.3.0+cu121, 2.3.1+cu121)

The latest version number is at the end of the error: 2.3.1+cu121

Now you can reinstall torch with the correct version number:

pip install torch==2.3.1+cu121 --index-url https://download.pytorch.org/whl/cu121

After that, I did the same with torchvision and torchaudio.
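If you repeat this for several packages, the last entry of the "(from versions: ...)" list can be pulled out of the error text automatically. A small sketch, assuming pip lists the versions in ascending order as in the output quoted in this comment. The function name `latest_listed_version` is made up for illustration.

```python
import re


def latest_listed_version(pip_error: str) -> str:
    """Return the last version in pip's '(from versions: ...)' list."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        raise ValueError("no '(from versions: ...)' list found")
    versions = [v.strip() for v in match.group(1).split(",") if v.strip()]
    # pip prints "none" when the index has no candidates at all.
    if not versions or versions == ["none"]:
        raise ValueError("pip did not list any versions")
    return versions[-1]


# Error text as quoted in this comment.
error_text = (
    "ERROR: Could not find a version that satisfies the requirement "
    "torch==+cu121 (from versions: 2.1.0+cu121, 2.1.1+cu121, 2.1.2+cu121, "
    "2.2.0+cu121, 2.2.1+cu121, 2.2.2+cu121, 2.3.0+cu121, 2.3.1+cu121)"
)
print(latest_listed_version(error_text))  # prints 2.3.1+cu121
```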

I don't remember why I decided to update everything, but it worked for me. I must admit I was surprised that Lora-Training-in-Comfy ended up working with everything up to date rather than pinned to the versions in this custom node's requirements file.
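One detail from the logs may matter for the 'toml' error: the training command is run by C:\Users\balco\AppData\Local\Programs\Python\Python310\python.exe, not by python_embeded\python.exe, so packages probably need to be importable from that interpreter. The sketch below is a quick check to run with the same python.exe that appears in the traceback. The helper name `missing_modules` is hypothetical, and it only handles top-level module names.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Shows which python.exe is running and whether 'toml' is visible to it.
print("interpreter:", sys.executable)
print("missing:", missing_modules(["toml"]))
```

If "toml" shows up as missing, installing it with that interpreter's own pip (for example `python.exe -m pip install toml`, using the full path from the traceback) targets the right environment.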