LarryJane491 / Lora-Training-in-Comfy

This custom node lets you train LoRA directly in ComfyUI!

Don't know what is happening #5

Open xplpex opened 10 months ago

xplpex commented 10 months ago

C:\Users\User\AppData\Local\Programs\Python\Python310\python.exe: can't open file 'C:\ia\ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py': [Errno 2] No such file or directory
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\User\AppData\Local\Programs\Python\Python310\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=C:\ia\ComfyUI_windows_portable\ComfyUI\models\checkpoints\epicphotogasm_zUniversal.safetensors', '--train_data_dir=C:/ia/photyo/train', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=LumaLora', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=50', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=LumaLora', '--train_batch_size=1', '--save_every_n_epochs=50', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=16', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 2.
Train finished
Prompt executed in 2.72 seconds

xplpex commented 10 months ago

Something to do with Python, but I don't have any idea.

LarryJane491 commented 10 months ago

From the second line: "No such file or directory". So it couldn't find the program.

Your folders should look like this: custom_nodes/Lora-Training-in-Comfy/[lots of files and folders]

Can you confirm that's how it looks?

xplpex commented 10 months ago

image_2024-01-16_153502138
I searched for the file and it was there, same name and everything.

image_2024-01-16_153710577

LarryJane491 commented 10 months ago

Ah, I see the problem. When you download from github, it creates a folder named Lora-Training-In-Comfy-main.

But the folder must be named Lora-Training-in-Comfy. Remove the -main and it will work ^^.

That's my bad. I need to find a way to have a less strict requirement on the folder name. For now though, the custom node must be named Lora-Training-in-Comfy!
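The rename described above can also be scripted. The following is only a minimal sketch: the function name `fix_folder_name` is hypothetical, the required folder name is taken from this thread, and it assumes the downloaded folder differs only by the `-main` suffix and capitalisation.

```python
from pathlib import Path
from typing import Optional

# Folder name the custom node requires, as stated in this thread.
REQUIRED_NAME = "Lora-Training-in-Comfy"


def fix_folder_name(custom_nodes: Path) -> Optional[Path]:
    """Rename a '<name>-main' folder to the required name.

    Returns the new path if a rename happened, otherwise None.
    """
    target = custom_nodes / REQUIRED_NAME
    wanted = (REQUIRED_NAME + "-main").lower()
    for entry in custom_nodes.iterdir():
        # Compare case-insensitively: the downloaded folder may be
        # capitalised differently (e.g. "...-In-Comfy-main").
        if entry.is_dir() and entry.name.lower() == wanted:
            if target.exists():
                return None  # do not overwrite an existing install
            entry.rename(target)
            return target
    return None
```

Calling `fix_folder_name(Path("custom_nodes"))` from the folder that contains `custom_nodes` would perform the rename once and return `None` on later calls.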

xplpex commented 10 months ago

From what I saw, this is my folder path: C:\ia\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts

and this is the desired one: C:\ia\ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py

The difference seems to be that mine has a /ComfyUI/ folder inside ComfyUI_windows_portable while the program tries to find one that doesn't. Any way to fix that without fucking up my ComfyUI?

LarryJane491 commented 10 months ago

Wait, I'm confused now. These screenshots you posted, that's your setup, right? If that's so the custom node is clearly named Lora-Training-In-Comfy-main.

Or did you remove -main and still have this issue?

Don't worry about the ComfyUI folder, what matters is the path from the launcher to the custom node ^^.
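To illustrate why the launcher location matters: the failing command in the first post passes the script as a relative path, and a relative path is resolved against the working directory of the process that launches it. The sketch below only demonstrates that resolution with `pathlib`; the function name `resolve_from` is hypothetical, and the claim that the working directory was the portable root is inferred from the error message, not confirmed.

```python
from pathlib import PureWindowsPath

# Relative script path exactly as it appears in the failing command.
SCRIPT = "custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py"


def resolve_from(working_dir: str) -> str:
    """Show where the relative script path lands for a given working directory."""
    return str(PureWindowsPath(working_dir) / SCRIPT)


# Working directory = portable root: matches the path in the error message.
print(resolve_from(r"C:\ia\ComfyUI_windows_portable"))
# Working directory = inner ComfyUI folder: matches where the files really are.
print(resolve_from(r"C:\ia\ComfyUI_windows_portable\ComfyUI"))
```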

xplpex commented 10 months ago

I already removed "-main", but now I am getting a new kind of error after a series of warnings: [Bug]: ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU

LarryJane491 commented 10 months ago

A CUDA error means the torch module isn't using your GPU, so you have to reinstall that module. To do that, follow the PyTorch site instructions referenced in the Troubleshoot section of the main page. It's easy: go to the website, select the options for your situation, and copy-paste the command it gives you into a command prompt (but you have to run it in the right Python environment).
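Before and after reinstalling, it may help to check what the torch in a given Python environment actually reports. This is a small sketch, not part of the custom node; `cuda_status` is a hypothetical helper name, and it has to be run with the same Python interpreter that launches the training.

```python
import importlib.util


def cuda_status() -> str:
    """Report whether the torch in this interpreter can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this Python environment"
    import torch  # imported lazily so the check also runs without torch

    if torch.cuda.is_available():
        return f"CUDA available, torch {torch.__version__}"
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    return (
        f"CUDA not available, torch {torch.__version__}, "
        f"built for CUDA: {torch.version.cuda}"
    )


print(cuda_status())
```

If the output says the build is for CUDA `None`, the installed torch is most likely a CPU-only build, which is the situation the reinstall is meant to fix.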

If it doesn't work, that's likely because you have used the one-click install of Comfy. Very useful, but makes Python dependency installation a bit more complicated. I recommend installing the base version of ComfyUI instead. Follow my guide here:

https://www.reddit.com/r/comfyui/comments/1995whb/guide_learn_to_deal_with_python_programs/

You don't have to delete your current ComfyUI folder, just create a new one following this guide.

knobiknows commented 8 months ago

I have the same issue with the LoRA trainer looking for different installation paths and I effectively had to copy it 3 times.

E:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py

errored out so I had to add the git install described above:

E:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main/sd-scripts/train_network.py

This then gave another error, looking for the install minus the 'ComfyUI' folder, so I copied the whole thing once more:

E:\ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py

canasnunes commented 8 months ago

I have this problem, I don't really know what to do, any suggestions?

C:\ComfyUI_windows_portable_nvidia_cu121_or_cpu\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py

C:\Users\canas\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\python.exe: Error while finding module specification for 'accelerate.commands.launch' (ModuleNotFoundError: No module named 'accelerate')

Train finished

Maxxxel commented 8 months ago

i have the same problem missing accelerate but its installed in the venv..
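One possible explanation, not confirmed in this thread, is that the training is launched with a different Python interpreter than the one where accelerate was installed (the error above names a system-wide python.exe). A minimal sketch for checking this from any interpreter follows; `module_report` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_report(name: str = "accelerate") -> dict:
    """Return which interpreter is running and whether it can import `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,  # full path of the running python
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


print(module_report())
```

Running this with the python.exe named in the error message and again with the venv's python shows whether the two environments differ.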

Fischeey commented 8 months ago

I am having the same problem, though I noticed that both in mine and in the first error posted here it is looking for the file in ComfyUI_windows_portable\custom_nodes, whereas on their computer it is at ComfyUI_windows_portable\ComfyUI\custom_nodes.

I noticed the exact same thing on mine, so that could be some sort of problem, but I don't really know. Here is my error: C:\Users\jacks\AppData\Local\Programs\Python\Python312\python.exe: can't open file 'C:\Users\jacks\Documents\Stable Diffusion\ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py': [Errno 2] No such file or directory

So it's looking for the custom_nodes folder in ComfyUI_windows_portable when in reality it is in ComfyUI_windows_portable\ComfyUI.

OK, edit: it seems LarryJane answered this above, saying it only matters in relation to the launcher, but it still seems weird to me.

serget2 commented 7 months ago

Same problem. All I can find is an accelerate.YAML, but it keeps telling me it cannot find the module. The directory is named as Larry says; it was already named correctly (without the -main), yet it still says:

AppData\Local\Programs\Python\Python310\python.exe: Error while finding module specification for 'accelerate.commands.launch' (ModuleNotFoundError: No module named 'accelerate')
Train finished
Prompt executed in 1.10 seconds

It does say it finished the training, yet I cannot find my LoRA in the models folder.

The YAML says this when opened in an editor:

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: 'NO'
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false

kreshnov commented 6 months ago

> From what I saw, this is my folder path: C:\ia\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts
>
> and this is the desired one: C:\ia\ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py
>
> The difference seems to be that mine has a /ComfyUI/ folder inside ComfyUI_windows_portable while the program tries to find one that doesn't. Any way to fix that without fucking up my ComfyUI?

For me the problem was that the program was looking for the custom_nodes folder in the wrong location. Same as what @xplpex pointed out, and thanks for helping me figure it out!

By the way, for me it only happened when running the "Lora Training in Comfy (Advanced)" node in ComfyUI. The non-advanced version was working fine. Since the only problem was the location of the custom_nodes folder, and I did not want to copy the entire directory and manage it in two different places, I just made a symbolic link and it solved my problem!

I use Windows 11 so here are the steps (should work on Windows 10 as well):

This should create a symbolic-link directory named custom_nodes inside ComfyUI_windows_portable leading to the one inside ComfyUI. This shouldn't interfere with anything, hope other people find this information helpful.
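A minimal sketch of creating such a link from Python, which is not necessarily the method the commenter used. The helper name `link_custom_nodes` is hypothetical, and the folder layout is the one described in this thread. On Windows, creating symbolic links usually requires an elevated prompt or Developer Mode; the command-prompt equivalent is `mklink /D`.

```python
import os
from pathlib import Path


def link_custom_nodes(portable_root: Path) -> Path:
    """Create <root>/custom_nodes as a symlink to <root>/ComfyUI/custom_nodes."""
    target = portable_root / "ComfyUI" / "custom_nodes"
    link = portable_root / "custom_nodes"
    if not target.is_dir():
        raise FileNotFoundError(f"expected folder not found: {target}")
    if link.exists() or link.is_symlink():
        # Never replace something that is already there.
        raise FileExistsError(f"refusing to overwrite: {link}")
    # target_is_directory matters on Windows; it is ignored elsewhere.
    os.symlink(target, link, target_is_directory=True)
    return link
```

For the layout in the first post this would be called as `link_custom_nodes(Path(r"C:\ia\ComfyUI_windows_portable"))`. Refusing to overwrite keeps an existing folder or link untouched, so the call is safe to repeat.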

Cheers!

alen918573 commented 5 months ago

loading model for process 0/1
load StableDiffusion checkpoint: E:\Ai\ComfyUI_windows_portable\ComfyUI\models\checkpoints\DreamShaper_8_pruned.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
Enable xformers for U-Net
Traceback (most recent call last):
  File "E:\Ai\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "E:\Ai\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 236, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 259, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 252, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\attention_processor.py", line 261, in set_use_memory_efficient_attention_xformers
    raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\NK\AppData\Local\Programs\Python\Python310\python.exe', 'E:/Ai/ComfyUI_windows_portable/ComfyUI/custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=E:\Ai\ComfyUI_windows_portable\ComfyUI\models\checkpoints\DreamShaper_8_pruned.safetensors', '--train_data_dir=E:/database', '--output_dir=E:\files', '--logging_dir=./logs', '--log_prefix=lindalaura', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=50', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=lindalaura', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=22', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 14.07 seconds

Okay, so I guess it's a memory restriction, or am I completely wrong? I have a 3070 and 16 GB of RAM; I'm not sure whether it would work with 16 GB of RAM.

Any help is appreciated, thanks.

Leshiy1 commented 4 months ago

Yup, I have a similar issue. I tried to launch the LoRA training and got this: изображение_2024-07-21_230720305

Leshiy1 commented 4 months ago

I installed the accelerate module through the command line, but then it started giving me this error: изображение_2024-07-21_231406052

Leshiy1 commented 4 months ago

Fixed most of these errors with the "pip install" command, e.g. "pip install accelerate", substituting the name of whichever module the error reports as missing; in this case it was the "toml" module.
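The approach above can be sketched as a small helper that reads the module name out of the error text and builds the matching pip command. `pip_hint` is a hypothetical name, and the sketch assumes the pip package has the same name as the missing module, which is true for accelerate and toml but not for every package.

```python
import re
import sys
from typing import Optional


def pip_hint(error_text: str) -> Optional[str]:
    """Build a pip command for the module named in a ModuleNotFoundError message."""
    match = re.search(r"No module named '([^']+)'", error_text)
    if match is None:
        return None
    # Keep only the top-level package: 'accelerate.commands' -> 'accelerate'.
    package = match.group(1).split(".")[0]
    # '-m pip' installs into the same interpreter that runs this script.
    return f'"{sys.executable}" -m pip install {package}'


print(pip_hint("ModuleNotFoundError: No module named 'accelerate'"))
```

Using `python -m pip` instead of a bare `pip` ties the installation to a specific interpreter, which matters when several Python installations exist on the same machine.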

Leshiy1 commented 4 months ago

Finally it got through to the image-size loading step, but then I started getting this: изображение_2024-07-21_234026661 изображение_2024-07-21_234052293