[Bug]: Permission access denied #1432

Closed owendswang closed 5 months ago

owendswang commented 6 months ago

Is there an existing issue for this?

What happened?

An error occurs while building a LoRA.

Steps to reproduce the problem

build lora

Commit and libraries


Command Line Arguments

--no-download-sd-model --skip-python-version-check --skip-version-check --skip-prepare-environment --skip-install --xformers --lowvram

Console logs

Traceback (most recent call last):██████████████████████████████████▋                     | 2/3 [00:14<00:06,  6.03s/it]
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 735, in start_training
    result = main(class_gen_method=class_gen_method)
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1917, in main
    return inner_loop()
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1874, in inner_loop
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1027, in check_save
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1433, in save_weights
    convert_diffusers_to_kohya_lora(lora_save_file, meta, args.lora_weight)
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\diff_lora_to_sd_lora.py", line 120, in convert_diffusers_to_kohya_lora
PermissionError: [WinError 5] Access is denied: 'D:\\UserData\\XXXXXX\\Downloads\\stable-diffusion-webui\\models\\Lora\\test_786.safetensors'
Steps:  25%|█████████▎                           | 750/3000 [16:40<50:00,  1.33s/it, loss=0.00477, lr=0.0001, vram=3.3]
Duration: 00:16:45
Generating Samples: 100%|████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00,  5.23s/it]
[2024-01-06 17:30:50,184][DEBUG][dreambooth.utils.model_utils] - Restored system models.
Duration: 00:16:46

Additional information

levicki commented 6 months ago

The problem is with this code in diff_lora_to_sd_lora.py:

def convert_diffusers_to_kohya_lora(path, metadata, alpha=0.8):
    model_dict = safetensors.torch.load_file(path)

safetensors.torch.load_file does not seem to close the file, so the remove call fails. Because dreambooth doesn't check the outcome, the rename call then also fails since the file still exists, and training is terminated.
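
To make the remove-then-rename failure concrete, here is a minimal sketch of how the final move could be done with the outcome checked instead of ignored. The helper name replace_file is hypothetical and is not taken from the extension's code; the sketch uses only the standard library and assumes the converted weights have already been written to a temporary file.

```python
import os


def replace_file(tmp_path: str, final_path: str) -> None:
    """Move tmp_path over final_path, failing loudly if the target is locked.

    Hypothetical helper for illustration; not the extension's actual code.
    """
    try:
        # os.replace overwrites an existing destination in one step, so there
        # is no separate remove call whose failure could go unnoticed.
        os.replace(tmp_path, final_path)
    except PermissionError as exc:
        # On Windows, an open handle on final_path typically surfaces here.
        raise RuntimeError(
            f"Cannot overwrite {final_path!r}; is another handle still open?"
        ) from exc
```

This does not unlock a file that is already held open. It only turns a silent failure into an explicit one at the point where it happens.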

The worst part of this issue is that the file remains locked even after you terminate AUTOMATIC1111 and all associated processes. No user or system process appears to hold the file handle, so I suspect it is held by kernel code, probably the NVIDIA CUDA driver. The file can only be deleted after a reboot, which is a major nuisance.

Workaround until someone files a bug upstream and they fix it:

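def convert_diffusers_to_kohya_lora(path, metadata, alpha=0.8):
    # Read the file ourselves so the handle is closed when the with-block exits.
    with open(path, "rb") as fp:
        model_data = fp.read()
    # Parse the tensors from the in-memory bytes instead of from the path.
    model_dict = safetensors.torch.load(model_data)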
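
Why the workaround helps can be sketched with the standard library alone: once the with-block exits, Python has closed the handle, so a later remove or rename is not blocked by our own process. The file name and contents below are placeholders, and read_all_bytes is a hypothetical helper. In the real function the returned bytes would be passed to safetensors.torch.load, as in the workaround above.

```python
import os
import tempfile


def read_all_bytes(path: str) -> bytes:
    """Read a whole file; the with-block closes the handle before returning."""
    with open(path, "rb") as fp:
        return fp.read()


# A throwaway file stands in for the .safetensors file (placeholder contents).
fd, demo_path = tempfile.mkstemp(suffix=".safetensors")
os.close(fd)
with open(demo_path, "wb") as fp:
    fp.write(b"placeholder bytes, not a real safetensors file")

data = read_all_bytes(demo_path)

# No handle from this process is left open, so removal is not blocked by it.
os.remove(demo_path)
file_removed = not os.path.exists(demo_path)
```

The trade-off is that the whole file is held in memory as bytes while it is parsed, which is usually acceptable for LoRA-sized files.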

owendswang commented 6 months ago

If this is a bug, how is everyone training LoRAs or models?

levicki commented 6 months ago

> If this is a bug, how is everyone training LoRAs or models?

You can always try deleting your venv and letting the requirements install from scratch. Maybe that fixes the problem and maybe it doesn't, but you won't know until you try. Alternatively, you can try one of the other training solutions I listed above.

