d8ahazard / sd_dreambooth_extension


[Bug]: Permission access denied #1432

Closed owendswang closed 5 months ago

owendswang commented 6 months ago

Is there an existing issue for this?

What happened?

An error occurs while building a LoRA.

Steps to reproduce the problem

Build a LoRA.

Commit and libraries

8207ccd85430fdfddcb4afa8589c88305be40f9c

Package Version

absl-py 2.0.0
accelerate 0.21.0
addict 2.4.0
aenum 3.1.15
aiofiles 23.2.1
aiohttp 3.9.1
aiosignal 1.3.1
altair 5.2.0
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 3.7.1
appdirs 1.4.4
attrs 23.1.0
basicsr 1.4.2
beautifulsoup4 4.12.2
bitsandbytes 0.41.2.post2
blendmodes 2022
cachetools 5.3.2
certifi 2023.11.17
charset-normalizer 3.3.2
clean-fid 0.1.35
click 8.1.7
clip-anytorch 2.5.2
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.2.0
cycler 0.12.1
dadaptation 3.2
dctorch 0.1.2
deprecation 2.1.0
diffusers 0.25.0
discord-webhook 1.3.0
docker-pycreds 0.4.0
einops 0.4.1
facexlib 0.3.0
falcon 3.1.3
fastapi 0.94.0
ffmpy 0.3.1
filelock 3.13.1
filterpy 1.4.5
flatbuffers 23.5.26
flowdas 0.5.0
fonttools 4.47.0
frozenlist 1.4.1
fsspec 2023.12.2
ftfy 6.1.3
future 0.18.3
gdown 4.7.1
gfpgan 1.3.8
gitdb 4.0.11
GitPython 3.1.40
google-auth 2.25.2
google-auth-oauthlib 1.0.0
gradio 3.41.2
gradio_client 0.5.0
greenlet 3.0.3
grpcio 1.60.0
h11 0.12.0
httpcore 0.15.0
httpx 0.24.1
huggingface-hub 0.20.1
humanfriendly 10.0
idna 3.6
imageio 2.33.1
importlib-metadata 7.0.1
importlib-resources 6.1.1
inflection 0.5.1
invisible-watermark 0.2.0
Jinja2 3.1.2
jsonmerge 1.8.0
jsonschema 4.20.0
jsonschema-specifications 2023.12.1
k-diffusion 0.1.1.post1
kiwisolver 1.4.5
kornia 0.6.7
lark 1.1.2
lazy_loader 0.3
lightning-utilities 0.10.0
llvmlite 0.41.1
lmdb 1.4.1
lpips 0.1.4
Markdown 3.5.1
MarkupSafe 2.1.3
matplotlib 3.8.2
mpmath 1.3.0
multidict 6.0.4
networkx 3.2.1
numba 0.58.1
numpy 1.26.2
nvidia-cublas-cu11 11.11.3.6
nvidia-cublas-cu12 12.3.4.1
nvidia-cuda-nvrtc-cu11 11.8.89
nvidia-cuda-nvrtc-cu12 12.3.103
nvidia-cuda-runtime-cu11 11.8.89
nvidia-cuda-runtime-cu12 12.3.101
nvidia-cudnn-cu12 8.9.7.29
oauthlib 3.2.2
omegaconf 2.2.3
onnx 1.15.0
onnx-graphsurgeon 0.3.27
onnxruntime-directml 1.16.3
open-clip-torch 2.20.0
opencv-python 4.8.1.78
orjson 3.9.10
packaging 23.2
pandas 2.1.4
piexif 1.1.3
Pillow 9.5.0
pip 23.3.2
platformdirs 4.1.0
polygraphy 0.49.0
protobuf 3.20.2
psutil 5.9.5
pyasn1 0.5.1
pyasn1-modules 0.3.0
pydantic 1.10.13
pydantic_core 2.14.6
pydub 0.25.1
pyparsing 3.1.1
pyreadline3 3.4.1
PySocks 1.7.1
python-dateutil 2.8.2
python-multipart 0.0.6
pytorch-lightning 1.9.4
pytorch_optimizer 2.12.0
pytz 2023.3.post1
PyWavelets 1.5.0
PyYAML 6.0.1
realesrgan 0.3.0
referencing 0.32.0
regex 2023.12.25
requests 2.31.0
requests-oauthlib 1.3.1
resize-right 0.0.2
rpds-py 0.16.2
rsa 4.9
safetensors 0.3.1
scikit-image 0.21.0
scipy 1.11.4
semantic-version 2.10.0
Send2Trash 1.8.2
sentencepiece 0.1.99
sentry-sdk 1.39.1
setproctitle 1.3.3
setuptools 65.5.0
six 1.16.0
smmap 5.0.1
sniffio 1.3.0
soupsieve 2.5
SQLAlchemy 2.0.24
starlette 0.26.1
sympy 1.12
tb-nightly 2.16.0a20231228
tensorboard 2.13.0
tensorboard-data-server 0.7.2
tensorrt 9.0.1.post11.dev4
tensorrt-bindings 9.0.1.post11.dev4
tensorrt-libs 9.0.1.post11.dev4
tf_keras-nightly 2.16.0.dev2023122810
tifffile 2023.12.9
timm 0.9.2
tokenizers 0.13.3
tomesd 0.1.3
tomli 2.0.1
toolz 0.12.0
torch 2.1.2+cu121
torchdiffeq 0.2.3
torchmetrics 1.2.1
torchsde 0.2.6
torchvision 0.16.2
tqdm 4.66.1
trampoline 0.1.2
transformers 4.30.2
typeable 0.6.0
typing_extensions 4.9.0
tzdata 2023.3
urllib3 2.1.0
uvicorn 0.25.0
wandb 0.16.1
wcwidth 0.2.12
websockets 11.0.3
Werkzeug 3.0.1
wheel 0.42.0
xformers 0.0.23.post1
yapf 0.40.2
yarl 1.9.4
zipp 3.17.0

Command Line Arguments

--no-download-sd-model --skip-python-version-check --skip-version-check --skip-prepare-environment --skip-install --xformers --lowvram

Console logs

Traceback (most recent call last):
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 735, in start_training
    result = main(class_gen_method=class_gen_method)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1917, in main
    return inner_loop()
           ^^^^^^^^^^^^
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1874, in inner_loop
    check_save(True)
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1027, in check_save
    save_weights(
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1433, in save_weights
    convert_diffusers_to_kohya_lora(lora_save_file, meta, args.lora_weight)
  File "D:\UserData\XXXXXX\Downloads\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\diff_lora_to_sd_lora.py", line 120, in convert_diffusers_to_kohya_lora
    os.remove(path)
PermissionError: [WinError 5] Access is denied: 'D:\\UserData\\XXXXXX\\Downloads\\stable-diffusion-webui\\models\\Lora\\test_786.safetensors'
Steps:  25%|█████████▎                           | 750/3000 [16:40<50:00,  1.33s/it, loss=0.00477, lr=0.0001, vram=3.3]
Duration: 00:16:45
Generating Samples: 100%|████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00,  5.23s/it]
[2024-01-06 17:30:50,184][DEBUG][dreambooth.utils.model_utils] - Restored system models.
Duration: 00:16:46


Additional information

No response

levicki commented 6 months ago

The problem is with this code in diff_lora_to_sd_lora.py:

def convert_diffusers_to_kohya_lora(path, metadata, alpha=0.8):
    model_dict = safetensors.torch.load_file(path)

safetensors.torch.load_file does not seem to close the file, so the subsequent remove call fails. Because dreambooth doesn't check the outcome, the rename call that follows also fails since the file still exists, and the training is terminated.

The worst part of this issue is that the file remains locked even after you terminate AUTOMATIC1111 and all associated processes. The file handle isn't being held by any user or system process, so I suspect that it is being held by kernel code, probably the NVIDIA CUDA driver. The file can only be deleted after a reboot, which is a major nuisance.
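
The point about the unchecked outcome can be sketched with a small helper that reports whether the delete actually succeeded instead of assuming it did. This is only an illustration: remove_with_retry, attempts, and delay are hypothetical names that do not exist in sd_dreambooth_extension, and retrying will not help if the lock really is held until reboot.

```python
import os
import time


def remove_with_retry(path, attempts=3, delay=0.5):
    """Try to delete `path`; return True if it is gone, False if it is still locked.

    Hypothetical helper for illustration, not part of sd_dreambooth_extension.
    """
    for attempt in range(attempts):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            # Nothing to delete, so the goal is already met.
            return True
        except PermissionError:
            # On Windows a file that is still open or mapped raises this error.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

A caller could then skip the rename and log a clear message when the function returns False, rather than letting the whole training run abort.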

Workaround until someone files a bug upstream and they fix it:

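import safetensors.torch

def convert_diffusers_to_kohya_lora(path, metadata, alpha=0.8):
    # Read the file through a context manager so the handle is closed before os.remove(path).
    with open(path, "rb") as fp:
        model_data = fp.read()
    # Deserialize from the in-memory bytes instead of from the file path.
    model_dict = safetensors.torch.load(model_data)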
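
The same read-then-release idea can be shown without safetensors at all. The sketch below is a generic pattern, assuming the conversion can be expressed as a function from bytes to bytes; rewrite_file_in_place and convert are hypothetical names and are not taken from the extension's code.

```python
import os
import tempfile


def rewrite_file_in_place(path, convert):
    """Read `path` fully, apply `convert` to its bytes, then swap in the result."""
    # The `with` block guarantees the read handle is closed before the swap.
    with open(path, "rb") as fp:
        data = fp.read()

    new_data = convert(data)

    # Write next to the original so the final swap stays on the same volume.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
        # os.replace overwrites an existing destination, including on Windows.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to a temporary file first means the original stays intact if the conversion or the write fails, and no separate remove-then-rename sequence is needed.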

owendswang commented 6 months ago

If this is a bug, how is everyone training LoRAs or models?

levicki commented 6 months ago

"If this is a bug, how is everyone training LoRAs or models?"

You can always try deleting your venv and letting the requirements install from scratch. Maybe that fixes the problem and maybe it doesn't, but you won't know until you try. Alternatively, you can try one of the other training solutions I listed above.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 5 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.