Closed elen07zz closed 8 months ago
Hummm... this is odd... trying to replicate on my side
this is my preset. test.json
Can you try to run the config found in "./test/config/Standard-AdamW.json" ?
Does it work? I just tried:
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
git checkout dev
.\setup.bat
.\gui.bat
Then loaded that LoRA config and hit Start.
Training completed just fine.
Can you try that on your side?
I am not sure why it is trying to open D:\kohya_ss\pytorch_model.bin:
FileNotFoundError: [Errno 2] No such file or directory: 'D:\kohya_ss\pytorch_model.bin'
... this error does not come from the GUI but from the sd-scripts trainer... this is odd...
I have tried that preset. The only thing that I have changed is "Pretrained model name or path": D:/stable-diffusion-webui/models/Stable-diffusion/Training/v1-5-pruned-emaonly.safetensors
Still doesn't work. Same error.
Traceback (most recent call last):
File "D:\kohya_ss\sd-scripts\train_network.py", line 1058, in
Traceback (most recent call last):
File "C:\Users\xfarw\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\xfarw\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in
This is my config
This is really strange. Have you tried deleting the venv folder and running setup.bat again?
@bmaltais it seems that the UI sets a default value for "Resume from saved training state (path to "last-state" state folder)".
Yes, I have deleted the folder completely.
I have cleaned everything and downloaded everything from scratch.
This sounds like it can't find that file somehow. I wonder if it might be a PATH issue where modules are looking for files… what makes this hard to troubleshoot is that I can't reproduce the issue locally.
You may need to revert to the c22.6.2 release until this is fixed.
@bmaltais this is an issue with "Resume from training state". It's trying to resume from the kohya_ss folder. For some reason this field is pre-filled with the current folder.
@elen07zz just clear this field, it's under the "Advanced" panel.
As far as I can see, this issue was fixed here 2 hours ago, so git pull should help.
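To illustrate the failure mode described above: the traceback in this thread shows the trainer looking for pytorch_model.bin inside the folder passed as --resume. A pre-flight check along the following lines could catch a wrongly pre-filled field before training starts. This is only a sketch: the helper name looks_like_state_dir is hypothetical, and the single file name is an assumption taken from this one traceback (other accelerate versions may store state under different names).

```python
from pathlib import Path

# Assumption: this file name comes from the traceback in this thread,
# not from any documented guarantee about the state folder layout.
EXPECTED_STATE_FILE = "pytorch_model.bin"


def looks_like_state_dir(resume: str) -> bool:
    """Return True if `resume` is a folder that contains the expected state file."""
    if not resume or not resume.strip():
        return False  # empty field: there is nothing to resume from
    folder = Path(resume)
    return folder.is_dir() and (folder / EXPECTED_STATE_FILE).is_file()
```

With a check like this, a value such as the kohya_ss install folder would be rejected because it does not contain the expected file.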
Yeah, I think that was the error.
Same with vae
@elen07zz can you try to delete C:\Users\YOURNAME\.cache and see if this will make things work?
It's working after doing this.
@elen07zz So the issue was the field that contained the wrong value. Yes, the original release was filling in values for empty fields, causing execution errors. The latest commit fixes this. There are probably other minor issues linked to the major code refactoring.
With the latest update I'm getting this error. I have tried with a fresh install and it's the same.
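The general idea behind that fix (leave an option out when its field is empty, rather than passing a placeholder value) can be sketched as follows. This is not the actual kohya_ss GUI code; build_args and the field names are hypothetical and only mirror the options seen in this thread.

```python
from typing import Mapping, Optional


def build_args(fields: Mapping[str, Optional[str]]) -> list[str]:
    """Turn GUI field values into --name=value arguments, skipping empty ones."""
    args = []
    for name, value in fields.items():
        if value is None or not str(value).strip():
            continue  # omit the option instead of substituting a default path
        args.append(f"--{name}={value}")
    return args


fields = {
    "output_name": "m4ryeliz",
    "resume": "",   # field left empty in the GUI
    "vae": None,    # field never set
}
print(build_args(fields))  # ['--output_name=m4ryeliz']
```

Because the empty resume and vae fields produce no argument at all, the trainer would never be asked to load state from an unintended folder.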
Traceback (most recent call last):
File "D:\kohya_ss\sd-scripts\train_network.py", line 1058, in
trainer.train(args)
File "D:\kohya_ss\sd-scripts\train_network.py", line 460, in train
train_util.resume_from_local_or_hf_if_specified(accelerator, args)
File "D:\kohya_ss\sd-scripts\library\train_util.py", line 3511, in resume_from_local_or_hf_if_specified
accelerator.load_state(args.resume)
File "D:\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 2861, in load_state
load_accelerator_state(
File "D:\kohya_ss\venv\lib\site-packages\accelerate\checkpointing.py", line 204, in load_accelerator_state
state_dict = torch.load(input_model_file, map_location=map_location)
File "D:\kohya_ss\venv\lib\site-packages\torch\serialization.py", line 986, in load
with _open_file_like(f, 'rb') as opened_file:
File "D:\kohya_ss\venv\lib\site-packages\torch\serialization.py", line 435, in _open_file_like
return _open_file(name_or_buffer, mode)
File "D:\kohya_ss\venv\lib\site-packages\torch\serialization.py", line 416, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'D:\kohya_ss\pytorch_model.bin'
Traceback (most recent call last):
File "C:\Users\xfarw\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\xfarw\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in
File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
simple_launcher(args)
File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\kohya_ss\venv\Scripts\python.exe', 'D:\kohya_ss/sd-scripts/train_network.py', '--bucket_no_upscale', '--bucket_reso_steps=64', '--cache_latents', '--caption_extension=.txt', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--gradient_accumulation_steps=4', '--learning_rate=0.0002', '--logging_dir=D:/Training/Datasets/TrainingData\log', '--lr_scheduler=cosine_with_restarts', '--lr_scheduler_num_cycles=15', '--max_data_loader_n_workers=1', '--max_grad_norm=1', '--resolution=768,768', '--max_token_length=225', '--max_train_epochs=15', '--max_train_steps=704', '--min_snr_gamma=10', '--mixed_precision=bf16', '--network_alpha=64', '--network_dim=128', '--network_module=networks.lora', '--multires_noise_iterations=8', '--multires_noise_discount=0.2', '--optimizer_type=AdamW', '--output_dir=D:/Training/Datasets/TrainingData\model', '--output_name=m4ryeliz', '--pretrained_model_name_or_path=D:/stable-diffusion-webui/models/Stable-diffusion/Training/v1-5-pruned-emaonly.safetensors', '--resume=D:/kohya_ss', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=bf16', '--scale_weight_norms=1', '--seed=1075857709', '--text_encoder_lr=0.0001', '--train_batch_size=8', '--train_data_dir=D:/Training/Datasets/TrainingData\img', '--unet_lr=0.0001', '--vae=D:/stable-diffusion-webui/models/VAE/anythingKlF8Anime2VaeFtMse840000_vaeFtMse840000Pt.pt', '--xformers', '--sample_sampler=dpmsolver++', '--sample_prompts=D:/Training/Datasets/TrainingData\model\sample\prompt.txt', '--sample_every_n_epochs=1']' returned non-zero exit status 1.
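When a launch fails like this, the argument list printed in the CalledProcessError is a quick way to see what each field was actually set to. The sketch below pulls a single option out of such a list; option_value is a hypothetical helper, and the command shown is shortened from the one in the log above.

```python
from typing import Optional, Sequence


def option_value(cmd: Sequence[str], name: str) -> Optional[str]:
    """Return the value of --name=value in a command list, or None if absent."""
    prefix = f"--{name}="
    for token in cmd:
        if token.startswith(prefix):
            return token[len(prefix):]
    return None


# Shortened from the failing command in this thread.
cmd = [
    "train_network.py",
    "--optimizer_type=AdamW",
    "--resume=D:/kohya_ss",
    "--xformers",
]
print(option_value(cmd, "resume"))  # D:/kohya_ss
```

Here the value points at the kohya_ss install folder rather than a saved state folder, which matches the pytorch_model.bin path in the FileNotFoundError.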