Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.
GNU Affero General Public License v3.0

[Bug]: Can't load Stable Diffusion 3.5 medium diffuser model #528

Closed 954114865 closed 3 weeks ago

954114865 commented 3 weeks ago

What happened?

I can't load the Stable Diffusion 3.5 medium diffusers model. The error says: It looks like the config file at 'G:/stable-diffusion-3-medium-diffusers/transformer/diffusion_pytorch_model.safetensors' is not a valid JSON file. I got the diffusers model from https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers/tree/main/transformer so I'm sure the JSON file is right. After getting this error, I even tried the JSON file from https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/tree/main and it still doesn't work.

What did you expect would happen?

Training should start normally.

Relevant log output

TensorFlow installation not found - running with reduced feature set.
Fetching 11 files: 100%|████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 10977.24it/s]
Loading pipeline components...:  25%|████████████▊                                      | 1/4 [00:00<00:00, 499.50it/s]
Traceback (most recent call last):
  File "G:\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 238, in load
    self.__load_internal(
  File "G:\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 37, in __load_internal
    raise Exception("not an internal model")
Exception: not an internal model

Traceback (most recent call last):
  File "G:\OneTrainer\venv\src\diffusers\src\diffusers\configuration_utils.py", line 432, in load_config
    config_dict = cls._dict_from_json_file(config_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\OneTrainer\venv\src\diffusers\src\diffusers\configuration_utils.py", line 557, in _dict_from_json_file
    text = reader.read()
           ^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 79200: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "G:\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 248, in load
    self.__load_diffusers(
  File "G:\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 74, in __load_diffusers
    noise_scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\OneTrainer\venv\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "G:\OneTrainer\venv\src\diffusers\src\diffusers\schedulers\scheduling_utils.py", line 150, in from_pretrained
    config, kwargs, commit_hash = cls.load_config(
                                  ^^^^^^^^^^^^^^^^
  File "G:\OneTrainer\venv\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "G:\OneTrainer\venv\src\diffusers\src\diffusers\configuration_utils.py", line 436, in load_config
    raise EnvironmentError(f"It looks like the config file at '{config_file}' is not a valid JSON file.")
OSError: It looks like the config file at 'G:/stable-diffusion-3-medium-diffusers/transformer/diffusion_pytorch_model.safetensors' is not a valid JSON file.

Traceback (most recent call last):
  File "G:\OneTrainer\venv\src\diffusers\src\diffusers\loaders\single_file.py", line 495, in from_single_file
    loaded_sub_model = load_single_file_sub_model(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\OneTrainer\venv\src\diffusers\src\diffusers\loaders\single_file.py", line 168, in load_single_file_sub_model
    raise SingleFileComponentError(
diffusers.loaders.single_file_utils.SingleFileComponentError: Failed to load CLIPTextModel. Weights for this component appear to be missing in the checkpoint.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "G:\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 258, in load
    self.__load_safetensors(
  File "G:\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 168, in __load_safetensors
    pipeline = StableDiffusion3Pipeline.from_single_file(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\OneTrainer\venv\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "G:\OneTrainer\venv\src\diffusers\src\diffusers\loaders\single_file.py", line 510, in from_single_file
    raise SingleFileComponentError(
diffusers.loaders.single_file_utils.SingleFileComponentError: Failed to load CLIPTextModel. Weights for this component appear to be missing in the checkpoint.
Please load the component before passing it in as an argument to `from_single_file`.

text_encoder = CLIPTextModel.from_pretrained('...')
pipe = StableDiffusion3Pipeline.from_single_file(<checkpoint path>, text_encoder=text_encoder)

Traceback (most recent call last):
  File "G:\OneTrainer\modules\ui\TrainUI.py", line 557, in __training_thread_function
    trainer.start()
  File "G:\OneTrainer\modules\trainer\GenericTrainer.py", line 124, in start
    self.model = self.model_loader.load(
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\OneTrainer\modules\modelLoader\StableDiffusion3FineTuneModelLoader.py", line 46, in load
    base_model_loader.load(model, model_type, model_names, weight_dtypes)
  File "G:\OneTrainer\modules\modelLoader\stableDiffusion3\StableDiffusion3ModelLoader.py", line 279, in load
    raise Exception("could not load model: " + model_names.base_model)
Exception: could not load model: G:/stable-diffusion-3-medium-diffusers/transformer/diffusion_pytorch_model.safetensors

Output of pip freeze

(venv) G:\OneTrainer\venv\Scripts>pip freeze
absl-py==2.1.0
accelerate==0.30.1
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
attrs==24.2.0
bitsandbytes==0.44.0
certifi==2024.8.30
charset-normalizer==3.4.0
cloudpickle==3.1.0
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.3.0
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
-e git+https://github.com/huggingface/diffusers.git@e45c25d03aeb0a967d8aaa0f6a79f280f6838e1f#egg=diffusers
filelock==3.16.1
flatbuffers==24.3.25
fonttools==4.54.1
frozenlist==1.5.0
fsspec==2024.10.0
ftfy==6.3.1
grpcio==1.67.1
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
intel-openmp==2021.4.0
invisible-watermark==0.2.0
Jinja2==3.1.4
kiwisolver==1.4.7
lightning-utilities==0.11.8
lion-pytorch==0.1.4
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.0
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@fa78a18f05978a2054d7cbe3ea2902a655078709#egg=mgds
mkl==2021.4.0
mpmath==1.3.0
multidict==6.1.0
networkx==3.4.2
numpy==1.26.4
omegaconf==2.3.0
onnxruntime-gpu==1.18.0
open-clip-torch==2.24.0
opencv-python==4.9.0.80
packaging==24.1
pillow==10.3.0
platformdirs==4.3.6
pooch==1.8.1
prodigyopt==1.0
propcache==0.2.0
protobuf==4.25.5
psutil==6.1.0
Pygments==2.18.0
pynvml==11.5.0
pyparsing==3.2.0
pyreadline3==3.5.4
python-dateutil==2.9.0.post0
pytorch-lightning==2.2.5
pytorch_optimizer==3.0.2
PyWavelets==1.7.0
PyYAML==6.0.1
regex==2024.9.11
requests==2.32.3
rich==13.9.3
safetensors==0.4.3
scalene==1.5.41
schedulefree==1.2.5
sentencepiece==0.2.0
setuptools==75.3.0
six==1.16.0
sympy==1.13.3
tbb==2021.13.1
tensorboard==2.17.0
tensorboard-data-server==0.7.2
timm==1.0.11
tokenizers==0.19.1
torch==2.3.1+cu118
torchmetrics==1.5.1
torchvision==0.18.1+cu118
tqdm==4.66.4
transformers==4.42.3
typing_extensions==4.12.2
urllib3==2.2.3
wcwidth==0.2.13
Werkzeug==3.0.6
wheel==0.44.0
xformers==0.0.27+cu118
yarl==1.17.0
zipp==3.20.2

Calamdor commented 3 weeks ago

Single-file safetensors models are not supported for the new SD3, Flux, and certain other models. Please clone the diffusers model and use that instead: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium

Calamdor commented 3 weeks ago

Mainly because the standard safetensors file contains only the transformer and VAE, and does not include the text encoders.
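
For anyone who wants to check what a given .safetensors file actually contains, here is a minimal sketch using only the Python standard library. It relies on the documented safetensors layout (an 8-byte little-endian header length followed by a UTF-8 JSON header that maps tensor names to metadata). The function names are hypothetical, and the prefix names you see depend entirely on how the checkpoint was exported, so treat the output as a hint rather than a definitive component list.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The header is UTF-8 JSON; the raw tensor bytes that follow are not text.
        return json.loads(f.read(header_len).decode("utf-8"))


def count_tensors_by_prefix(path):
    """Count tensors per top-level name prefix (the text before the first dot)."""
    header = read_safetensors_header(path)
    # "__metadata__" is an optional free-form entry, not a tensor.
    names = [name for name in header if name != "__metadata__"]
    return Counter(name.split(".", 1)[0] for name in names)
```

Calling `count_tensors_by_prefix("path/to/model.safetensors")` on a checkpoint gives a rough picture of which parts are present. Only the header is JSON, which is also why reading the whole file as a JSON config fails with a UnicodeDecodeError once the reader reaches the binary tensor data.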

O-J1 commented 3 weeks ago

This is not a OneTrainer issue. As Calamdor has said, please use the diffusers version.

954114865 commented 3 weeks ago

Mainly because the standard safetensors file contains only the transformer and VAE, and does not include the text encoders.

Thanks. I just found that I mistakenly downloaded SD3 instead of SD3.5. I am so sorry for wasting your time. But I checked the SD3.5 repository as well, and it's all in .safetensors format, isn't it? Both the text encoders and the transformer are .safetensors files. [image] [image]

Calamdor commented 3 weeks ago

This is correct. However, there is currently no way to use a single safetensors file for these models. OneTrainer currently only works with the same folder structure as Hugging Face: it loads the required individual safetensors files, each in the weight data type you select.
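
As a rough illustration of what "the same structure as Hugging Face" means in practice, the sketch below checks whether a local folder looks like a diffusers pipeline directory. It assumes the usual diffusers layout, where `model_index.json` at the root lists each component as a `[library, class]` pair and each component lives in a subfolder of the same name. The function name is hypothetical, and this is not OneTrainer's own loading logic.

```python
import json
from pathlib import Path


def missing_diffusers_components(model_dir):
    """List component subfolders named in model_index.json that are absent on disk."""
    root = Path(model_dir)
    index_file = root / "model_index.json"
    if not index_file.is_file():
        raise FileNotFoundError(
            f"{index_file} not found; is this the root of a diffusers model folder?"
        )
    index = json.loads(index_file.read_text(encoding="utf-8"))
    missing = []
    for name, value in index.items():
        # Keys starting with "_" are metadata such as "_class_name".
        if name.startswith("_"):
            continue
        # Components are [library, class] pairs; [null, null] marks an unused slot.
        if not isinstance(value, list) or len(value) != 2 or value[0] is None:
            continue
        if not (root / name).is_dir():
            missing.append(name)
    return missing
```

Pointing the base model setting at a single file inside the repository (for example a file under `transformer/`) makes this check raise FileNotFoundError, whereas the repository root should return an empty list once every component folder has been downloaded.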

O-J1 commented 3 weeks ago

Mainly because the standard safetensors file contains only the transformer and VAE, and does not include the text encoders.

[snip]

Since I closed this issue, Nerogar has deployed a fix in commit 0f459e4. Please run update.bat again and safetensors files will now work, but I must reiterate that this was not a bug. SD3.5 is not officially supported yet; support is only in alpha, so please do not expect it to work flawlessly.