bmaltais / kohya_ss

FileNotFoundError: [Errno 2] No such file or directory: 'D:\\kohya_ss\\pytorch_model.bin' #2046

Closed (elen07zz closed this issue 8 months ago)

elen07zz commented 8 months ago

With the latest update I'm getting this error. I have tried a fresh install and it's the same.

Traceback (most recent call last):
  File "D:\kohya_ss\sd-scripts\train_network.py", line 1058, in <module>
    trainer.train(args)
  File "D:\kohya_ss\sd-scripts\train_network.py", line 460, in train
    train_util.resume_from_local_or_hf_if_specified(accelerator, args)
  File "D:\kohya_ss\sd-scripts\library\train_util.py", line 3511, in resume_from_local_or_hf_if_specified
    accelerator.load_state(args.resume)
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 2861, in load_state
    load_accelerator_state(
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\checkpointing.py", line 204, in load_accelerator_state
    state_dict = torch.load(input_model_file, map_location=map_location)
  File "D:\kohya_ss\venv\lib\site-packages\torch\serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\kohya_ss\venv\lib\site-packages\torch\serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\kohya_ss\venv\lib\site-packages\torch\serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'D:\kohya_ss\pytorch_model.bin'

Traceback (most recent call last):
  File "C:\Users\xfarw\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\xfarw\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\kohya_ss\venv\Scripts\python.exe', 'D:\kohya_ss/sd-scripts/train_network.py', '--bucket_no_upscale', '--bucket_reso_steps=64', '--cache_latents', '--caption_extension=.txt', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--gradient_accumulation_steps=4', '--learning_rate=0.0002', '--logging_dir=D:/Training/Datasets/TrainingData\log', '--lr_scheduler=cosine_with_restarts', '--lr_scheduler_num_cycles=15', '--max_data_loader_n_workers=1', '--max_grad_norm=1', '--resolution=768,768', '--max_token_length=225', '--max_train_epochs=15', '--max_train_steps=704', '--min_snr_gamma=10', '--mixed_precision=bf16', '--network_alpha=64', '--network_dim=128', '--network_module=networks.lora', '--multires_noise_iterations=8', '--multires_noise_discount=0.2', '--optimizer_type=AdamW', '--output_dir=D:/Training/Datasets/TrainingData\model', '--output_name=m4ryeliz', '--pretrained_model_name_or_path=D:/stable-diffusion-webui/models/Stable-diffusion/Training/v1-5-pruned-emaonly.safetensors', '--resume=D:/kohya_ss', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=bf16', '--scale_weight_norms=1', '--seed=1075857709', '--text_encoder_lr=0.0001', '--train_batch_size=8', '--train_data_dir=D:/Training/Datasets/TrainingData\img', '--unet_lr=0.0001', '--vae=D:/stable-diffusion-webui/models/VAE/anythingKlF8Anime2VaeFtMse840000_vaeFtMse840000Pt.pt', '--xformers', '--sample_sampler=dpmsolver++', '--sample_prompts=D:/Training/Datasets/TrainingData\model\sample\prompt.txt', '--sample_every_n_epochs=1']' returned non-zero exit status 1.

bmaltais commented 8 months ago

Hummm... this is odd... trying to replicate on my side

elen07zz commented 8 months ago

> Hummm... this is odd... trying to replicate on my side

This is my preset: test.json

bmaltais commented 8 months ago

Can you try to run the config found in "./test/config/Standard-AdamW.json"?

Does it work? I just tried:

git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
git checkout dev
.\setup.bat
.\gui.bat

Then loaded that LoRA config and hit Start.

Training completed just fine.

Can you try that on your side?

I am not sure why it is trying to load 'D:\kohya_ss\pytorch_model.bin' and failing with FileNotFoundError: [Errno 2] No such file or directory. This error does not come from the GUI but from the sd-scripts trainer, which is odd.

elen07zz commented 8 months ago

I have tried that preset. The only thing that I have changed is "Pretrained model name or path": D:/stable-diffusion-webui/models/Stable-diffusion/Training/v1-5-pruned-emaonly.safetensors

Still doesn't work. Same error.

elen07zz commented 8 months ago

This is my config: (Screenshot 2024-03-09 151538)

bmaltais commented 8 months ago

This is really strange. Have you tried deleting the venv folder and running setup.bat again?

storuky commented 8 months ago

@bmaltais it seems that the UI sets a default value for "Resume from saved training state (path to "last-state" state folder)".

(Screenshot 2024-03-10 at 00 33 47)

elen07zz commented 8 months ago

> This is really strange. Have you tried deleting the venv folder and running setup.bat again?

Yes, I have deleted the folder completely.

I have cleaned everything and downloaded everything from scratch.

bmaltais commented 8 months ago

This sounds like it can't find that file somehow. I wonder if it might be a PATH issue where modules are looking for files… It is hard to troubleshoot when I can't reproduce the issue locally.

You may need to revert to the v22.6.2 release until this is fixed.

storuky commented 8 months ago

@bmaltais this is an issue with "Resume from training state". It's trying to resume from the kohya_ss folder. For some reason this field is pre-filled with the current folder.

@elen07zz just clear this field; it's under the "Advanced" panel.

As far as I can see, this issue was fixed here 2 hours ago, so git pull should help.
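
The diagnosis in this comment can be checked before launching a run. Below is a minimal sketch, not part of kohya_ss or sd-scripts: `find_resume_dir` and `looks_like_state_dir` are hypothetical helper names, and the marker file `pytorch_model.bin` is taken only from the traceback in this thread (other accelerate versions may store state under different file names, so treat it as an assumption).

```python
from pathlib import Path
from typing import Optional, Sequence


def find_resume_dir(argv: Sequence[str]) -> Optional[str]:
    """Return the value passed to --resume, or None if it is absent or empty."""
    for i, arg in enumerate(argv):
        if arg.startswith("--resume="):
            # Form used in the failing command: --resume=D:/kohya_ss
            return arg.split("=", 1)[1] or None
        if arg == "--resume" and i + 1 < len(argv):
            # Space-separated form: --resume D:/kohya_ss
            return argv[i + 1] or None
    return None


def looks_like_state_dir(path: str, marker: str = "pytorch_model.bin") -> bool:
    """Assumption: a saved state folder contains the marker file seen in the traceback."""
    return (Path(path) / marker).is_file()


# Shortened version of the command line reported in this issue.
argv = ["--network_module=networks.lora", "--resume=D:/kohya_ss", "--xformers"]
resume = find_resume_dir(argv)
if resume is not None and not looks_like_state_dir(resume):
    print(f"--resume points at {resume!r}, which has no pytorch_model.bin; "
          "clear the field or point it at a saved state folder")
```

A check like this would have flagged the pre-filled `--resume=D:/kohya_ss` value before accelerate tried to open the missing file.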

elen07zz commented 8 months ago

Yeah, I think that was the error.

Same with the VAE field.

bmaltais commented 8 months ago

@elen07zz can you try to delete C:\Users\YOURNAME\.cache and see if this will make things work?

elen07zz commented 8 months ago

> @bmaltais this is an issue with "Resume from training state". It's trying to resume from the kohya_ss folder. For some reason this field is pre-filled with the current folder.
>
> @elen07zz just clear this field; it's under the "Advanced" panel.
>
> As far as I can see, this issue was fixed here 2 hours ago, so git pull should help.

It's working after doing this.

bmaltais commented 8 months ago

@elen07zz So the issue was a field that contained the wrong value. Yes, the original release was filling in values for empty fields, causing execution errors. The latest commit fixes this. There are probably other minor issues linked to the major code refactoring.
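
For readers wondering what "filling in values for empty fields" looks like in practice, here is a rough sketch of building a trainer argument list that skips blank fields. It is an illustration only, not the actual kohya_ss GUI code: `build_args` is a hypothetical helper, and the field names are borrowed from the command line shown earlier in this thread.

```python
from typing import Dict, List, Union

FieldValue = Union[str, int, float, bool, None]


def build_args(fields: Dict[str, FieldValue]) -> List[str]:
    """Turn GUI field values into CLI flags, omitting anything left blank."""
    args: List[str] = []
    for name, value in fields.items():
        # Unset fields and unchecked checkboxes produce no flag at all.
        if value is None or value is False:
            continue
        # Blank text fields must not fall back to a default such as the current folder.
        if isinstance(value, str) and not value.strip():
            continue
        if value is True:
            args.append(f"--{name}")  # checkbox-style flag
        else:
            args.append(f"--{name}={value}")
    return args


print(build_args({
    "resume": "",          # left empty in the GUI, so no --resume flag is emitted
    "vae": None,
    "xformers": True,
    "network_dim": 128,
}))
```

With the empty `resume` field dropped, the trainer never receives `--resume` and so never attempts to load a state folder.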