Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.
GNU Affero General Public License v3.0

[Bug]: Error while training a pretrained SD3.5 medium safetensors #533

Closed: joneschunghk closed this issue 10 hours ago

joneschunghk commented 3 days ago

What happened?

I have fine-tuned the base SD3.5 medium model loaded from Hugging Face, and I am now trying to fine-tune a pretrained safetensors file from a local path. Training the pretrained SD3.5 medium safetensors file fails with an error. I have tried every "Weight Data Type" setting, with the same result.

(Screenshots attached: 2024-11-01 181648, 2024-11-01 181242)

What did you expect would happen?

I have successfully fine-tuned SD3 medium using this method, so I expected it to work for SD3.5 medium as well.

Relevant log output

Clearing cache directory D:/AI/Stable_Diffusion/Datasets/FineTunes/SD35/EASD35_V2/workspace-cache! You can disable this if you want to continue using the same cache.
No backup found, continuing without backup...
TensorFlow installation not found - running with reduced feature set.
Fetching 21 files: 100%|████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 21016.56it/s]
Loading pipeline components...:  22%|███████████▌                                        | 2/9 [00:00<00:03,  2.14it/s]
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.17.0 at http://localhost:6006/ (Press CTRL+C to quit)
Loading pipeline components...:  78%|████████████████████████████████████████▍           | 7/9 [01:24<00:24, 12.00s/it]
Traceback (most recent call last):
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 238, in load
    self.__load_internal(
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 37, in __load_internal
    raise Exception("not an internal model")
Exception: not an internal model

Traceback (most recent call last):
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 248, in load
    self.__load_diffusers(
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 51, in __load_diffusers
    tokenizer_1 = CLIPTokenizer.from_pretrained(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\Lib\site-packages\transformers\tokenization_utils_base.py", line 2053, in from_pretrained
    raise ValueError(
ValueError: Calling CLIPTokenizer.from_pretrained() with the path to a single file or url is not supported for this tokenizer. Use a model identifier or the path to a directory instead.

Traceback (most recent call last):
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 258, in load
    self.__load_safetensors(
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 168, in __load_safetensors
    pipeline = StableDiffusion3Pipeline.from_single_file(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\src\diffusers\src\diffusers\loaders\single_file.py", line 495, in from_single_file
    loaded_sub_model = load_single_file_sub_model(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\src\diffusers\src\diffusers\loaders\single_file.py", line 102, in load_single_file_sub_model
    loaded_sub_model = load_method(
                       ^^^^^^^^^^^^
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\src\diffusers\src\diffusers\loaders\single_file_model.py", line 299, in from_single_file
    unexpected_keys = load_model_dict_into_meta(model, diffusers_format_checkpoint, dtype=torch_dtype)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\src\diffusers\src\diffusers\models\model_loading_utils.py", line 223, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load  because pos_embed.pos_embed expected shape tensor([[[ 0.0000,  0.0000,  0.0000,  ...,  1.0000,  1.0000,  1.0000],
         [ 0.3272,  0.3197,  0.3124,  ...,  1.0000,  1.0000,  1.0000],
         [ 0.6184,  0.6059,  0.5935,  ...,  1.0000,  1.0000,  1.0000],
         ...,
         [ 0.1674, -0.9699, -0.3513,  ...,  1.0000,  1.0000,  1.0000],
         [ 0.4807, -0.8412, -0.6262,  ...,  1.0000,  1.0000,  1.0000],
         [ 0.7412, -0.6242, -0.8384,  ...,  1.0000,  1.0000,  1.0000]]]), but got torch.Size([1, 147456, 1536]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.

Traceback (most recent call last):
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\ui\TrainUI.py", line 557, in __training_thread_function
    trainer.start()
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\trainer\GenericTrainer.py", line 124, in start
    self.model = self.model_loader.load(
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\modelLoader\StableDiffusion3FineTuneModelLoader.py", line 46, in load
    base_model_loader.load(model, model_type, model_names, weight_dtypes)
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 279, in load
    raise Exception("could not load model: " + model_names.base_model)
Exception: could not load model: D:/AI/Stable_Diffusion/Datasets/FineTunes/SD35/models/EASD35_V1.safetensors
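The four tracebacks above suggest that the loader tries several formats in turn (internal, diffusers, then single-file safetensors) and only raises the final "could not load model" error once all of them have failed. The following is a minimal sketch of that fallback pattern, not OneTrainer's actual code: `load_with_fallbacks` and the three stand-in loader functions are hypothetical names, and the error messages are abbreviated from the log.

```python
import traceback
from typing import Callable, Sequence


def load_with_fallbacks(path: str, loaders: Sequence[Callable[[str], object]]) -> object:
    """Try each loader in order and return the first successful result."""
    stacktraces = []
    for loader in loaders:
        try:
            return loader(path)
        except Exception:
            # Remember why this format failed, then try the next one.
            stacktraces.append(traceback.format_exc())

    # Every format failed: print each collected traceback, then give up.
    for stacktrace in stacktraces:
        print(stacktrace)
    raise Exception("could not load model: " + path)


# Hypothetical stand-ins for the three format-specific loaders.
def load_internal(path: str) -> object:
    raise Exception("not an internal model")


def load_diffusers(path: str) -> object:
    raise ValueError("expected a model identifier or a directory, got a single file")


def load_safetensors(path: str) -> object:
    raise ValueError("pos_embed shape mismatch")
```

Read this way, the last traceback is only a summary. The earlier ones carry the real causes: the diffusers attempt fails because the path is a single file rather than a directory, and the safetensors attempt fails on the `pos_embed` shape check.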

Output of pip freeze

(venv) D:\AI\Stable_Diffusion\OneTrainer>pip freeze
absl-py==2.1.0
accelerate==0.30.1
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
attrs==24.2.0
bitsandbytes==0.44.0
certifi==2024.8.30
charset-normalizer==3.4.0
cloudpickle==3.1.0
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.3.0
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
-e git+https://github.com/huggingface/diffusers.git@e45c25d03aeb0a967d8aaa0f6a79f280f6838e1f#egg=diffusers
filelock==3.16.1
flatbuffers==24.3.25
fonttools==4.54.1
frozenlist==1.5.0
fsspec==2024.10.0
ftfy==6.3.1
grpcio==1.67.1
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
intel-openmp==2021.4.0
invisible-watermark==0.2.0
Jinja2==3.1.4
kiwisolver==1.4.7
lightning-utilities==0.11.8
lion-pytorch==0.1.4
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.0
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@fa78a18f05978a2054d7cbe3ea2902a655078709#egg=mgds
mkl==2021.4.0
mpmath==1.3.0
multidict==6.1.0
networkx==3.4.2
numpy==1.26.4
omegaconf==2.3.0
onnxruntime-gpu==1.18.0
open-clip-torch==2.24.0
opencv-python==4.9.0.80
packaging==24.1
pillow==10.3.0
platformdirs==4.3.6
pooch==1.8.1
prodigyopt==1.0
propcache==0.2.0
protobuf==4.25.5
psutil==6.1.0
Pygments==2.18.0
pynvml==11.5.0
pyparsing==3.2.0
pyreadline3==3.5.4
python-dateutil==2.9.0.post0
pytorch-lightning==2.2.5
pytorch_optimizer==3.0.2
PyWavelets==1.7.0
PyYAML==6.0.1
regex==2024.9.11
requests==2.32.3
rich==13.9.3
safetensors==0.4.3
scalene==1.5.41
schedulefree==1.2.5
sentencepiece==0.2.0
six==1.16.0
sympy==1.13.3
tbb==2021.13.1
tensorboard==2.17.0
tensorboard-data-server==0.7.2
timm==1.0.11
tokenizers==0.19.1
torch==2.3.1+cu118
torchmetrics==1.5.1
torchvision==0.18.1+cu118
tqdm==4.66.4
transformers==4.42.3
typing_extensions==4.12.2
urllib3==2.2.3
wcwidth==0.2.13
Werkzeug==3.1.0
xformers==0.0.27+cu118
yarl==1.17.1
zipp==3.20.2

(venv) D:\AI\Stable_Diffusion\OneTrainer>

elen07zz commented 1 day ago

same

O-J1 commented 10 hours ago

@elen07zz @joneschunghk OneTrainer expects the diffusers version of SD3.5 and not the safetensors version. Closing as this is not a bug.
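For anyone hitting the same error, a quick way to tell the two formats apart before starting a run: a diffusers-format model is a directory with a `model_index.json` file at its top level, whereas the path in this report points at a single `.safetensors` file. The helper below is a rough sketch using only the standard library. `guess_model_format` is a hypothetical name, and this is not how OneTrainer itself decides which loader to use.

```python
from pathlib import Path


def guess_model_format(path: str) -> str:
    """Return a best-effort guess at the on-disk format of a model path."""
    p = Path(path)
    # Diffusers pipelines are saved as a directory with model_index.json at the root.
    if p.is_dir() and (p / "model_index.json").is_file():
        return "diffusers"
    # A lone .safetensors file is a single-file checkpoint.
    if p.is_file() and p.suffix.lower() == ".safetensors":
        return "single-file safetensors"
    return "unknown"
```

Assuming the file exists, calling it on the path from this report (`D:/AI/Stable_Diffusion/Datasets/FineTunes/SD35/models/EASD35_V1.safetensors`) would return "single-file safetensors", which matches the `CLIPTokenizer.from_pretrained()` error in the log about being given a single file instead of a directory.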