Closed peteer01 closed 3 months ago
Can you post the full error?
RuntimeError: Parent directory F: does not exist.
steps: 0%|▏ | 31/9300 [14:32<72:25:56, 28.13s/it, avr_loss=0.091]
Failed to train because of error:
Command '['C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\venv\Scripts\python.exe', 'sd_scripts\sdxl_train_network.py', '--config_file=runtime_store\config.toml', '--dataset_config=runtime_store\dataset.toml']' returned non-zero exit status 1.
saving checkpoint: F:/LoRA settings\epoch-000001.safetensors
saving state at epoch 1
Traceback (most recent call last):
File "C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\sdxl_train_network.py", line 189, in <module>
trainer.train(args)
File "C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\train_network.py", line 883, in train
train_util.save_and_remove_state_on_epoch_end(args, accelerator, epoch + 1)
File "C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\library\train_util.py", line 4322, in save_and_remove_state_on_epoch_end
accelerator.save_state(state_dir)
File "C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\accelerate\accelerator.py", line 2795, in save_state
save_location = save_accelerator_state(
File "C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\accelerate\checkpointing.py", line 76, in save_accelerator_state
save(state, output_model_file)
File "C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\accelerate\utils\other.py", line 127, in save
torch.save(obj, f)
File "C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\serialization.py", line 628, in save
with _open_zipfile_writer(f) as opened_zipfile:
File "C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\serialization.py", line 502, in _open_zipfile_writer
return container(name_or_buffer)
File "C:\Users\petee\LoRA_Easy_Training_Scripts\sd_scripts\venv\lib\site-packages\torch\serialization.py", line 473, in __init__
super().__init__(torch._C.PyTorchFileWriter(self.name))
I think I know what the issue is. To confirm, could you save the epochs and training config in a subdirectory and see if it still errors out? Like in F:/LoRA settings/subdirectory. If the issue still persists, try removing the spaces from the folder name.
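One plausible explanation (an assumption on my part, not confirmed from the code): on Windows, `F:` without a trailing separator is a drive-relative path ("the current directory on drive F"), not the drive root. If the save path ever gets built without a separator right after the drive letter, `torch.save` ends up looking for a parent directory literally named `F:`, which matches the error above. Python's `ntpath` module (the Windows flavor of `os.path`, importable on any platform) shows the distinction:

```python
import ntpath  # Windows path semantics, usable on any OS

# With a separator after the drive letter, the parent resolves normally.
print(ntpath.dirname(r"F:\LoRA settings\epoch-000001.safetensors"))
# -> F:\LoRA settings

# Without one, the path is drive-relative, so the reported
# parent directory is just the bare drive letter.
print(ntpath.dirname(r"F:optimizer.bin"))
# -> F:

# splitdrive makes the same point explicitly.
print(ntpath.splitdrive(r"F:optimizer.bin"))
# -> ('F:', 'optimizer.bin')
```

That bare `F:` is exactly what shows up in "Parent directory F: does not exist".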
Making a subdirectory and saving there prevented the error from occurring.
steps: 50%|██████████████████████████████ | 5/10 [03:20<03:20, 40.03s/it, avr_loss=0.134]
saving checkpoint: F:/LoRA settings/subfolder\epoch-000001.safetensors
saving state at epoch 1
epoch 2/2
Nice, I guess the issue can be closed now?
Is there an easy fix that could prevent this error? I'm not sure how difficult it would be. If using a subfolder is necessary to avoid the issue, it might be good to document that until it's fixed.
Honestly, I'd assume this will never be fixed here. The only way to get it fixed is to open an issue with kohya's sd-scripts.
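For what it's worth, an upstream fix would probably be small. A minimal sketch (hypothetical, not the actual sd-scripts code) of a defensive wrapper that creates the parent directory before handing the path to the underlying writer:

```python
import os

def save_with_parent(obj, path, save_fn):
    """Hypothetical helper: ensure the parent directory exists before
    calling the underlying writer (e.g. torch.save), so a missing or
    oddly-resolved parent fails early instead of mid-training."""
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)  # no-op if it already exists
    save_fn(obj, path)
```

In the traceback above this would sit around the `torch.save(obj, f)` call inside accelerate's `save` helper, or around `accelerator.save_state(state_dir)` in `train_util.py`.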
I am able to successfully run LoRA Easy Training Scripts without issue. When loading the same Toml file and only changing the "Save State" and "Save Last State" options to on and "Epochs","1" and trying to save to the same F:\LoRA settings as the Toml and LoRA safetensors files, I get the following error at the completion of the first Epoch:
It appears that this error occurs either because the directory is in the root of the F: drive, or because the app doesn't like F: for saving save states. My workaround is to save to a folder on the C: drive (currently a subfolder inside the Easy_LoRA_Training_Scripts folder).
Keeping the training data in a folder in the root of F: does not cause an issue, and neither does saving the LoRA epochs there; the trainer only crashes when it tries to write the save state to F:.
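Until this is fixed upstream, a pre-flight check could at least catch the problem before hours of training are lost (the failed run above was on pace for ~72 hours). A sketch, assuming nothing about sd-scripts internals:

```python
import os

def state_dir_writable(dir_path):
    # Hypothetical pre-flight check: create the save-state directory
    # if needed and confirm we can actually write a file into it.
    try:
        os.makedirs(dir_path, exist_ok=True)
        probe = os.path.join(dir_path, ".write_test")
        with open(probe, "w") as f:
            f.write("ok")
        os.remove(probe)
        return True
    except OSError:
        return False
```

Running this against the configured save-state path before launching training would surface the "Parent directory F: does not exist" failure immediately instead of at the first epoch save.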
I hope that is helpful. Let me know what additional information might be helpful.