Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.
GNU Affero General Public License v3.0

[Bug]: Training doesn't start because neither the local nor the Hugging Face hosted model can be loaded #434

Closed: FurkanGozukara closed this issue 1 month ago

FurkanGozukara commented 3 months ago

What happened?

I tested both a local checkpoint and a Hugging Face repo. OneTrainer is running in a Windows 11 virtual machine.

The machine connects to the internet through a VPN, which is mandatory.

I also tried with the firewall and antivirus turned off.

The tested models are SDXL.

RealVisXL_V4.0.safetensors was downloaded to the C: drive, and the full path was given as:

C:/stable-diffusion-webui/models/Stable-diffusion/RealVisXL_V4.0.safetensors

I also tested the Hugging Face repo name: SG161222/RealVisXL_V4.0

Both fail.
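
Before digging into the loader, it can help to confirm how each of the two inputs resolves on disk. A minimal sketch, assuming a hypothetical helper `describe_model_source` that is not part of OneTrainer:

```python
from pathlib import Path


def describe_model_source(name: str) -> str:
    """Classify a base-model string by what it points to on the local disk."""
    path = Path(name)
    if path.is_file():
        # e.g. a single .safetensors or .ckpt checkpoint
        return "local file"
    if path.is_dir():
        # e.g. a diffusers-style folder with one subfolder per component
        return "local directory"
    # Nothing on disk with that name: it can only work as a hub repo id
    return "not a local path (treated as a Hugging Face repo id)"


for candidate in (
    "C:/stable-diffusion-webui/models/Stable-diffusion/RealVisXL_V4.0.safetensors",
    "SG161222/RealVisXL_V4.0",
):
    print(candidate, "->", describe_model_source(candidate))
```

If the first line does not report "local file" on the machine in question, the path itself is the problem rather than the loader.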

What did you expect would happen?

Training should just start.

Relevant log output

activating venv C:\one_trainer\OneTrainer\venv
Using Python "C:\one_trainer\OneTrainer\venv\Scripts\python.exe"
C:\one_trainer\OneTrainer\venv\src\diffusers\src\diffusers\loaders\single_file.py:352: FutureWarning: `original_config_file` is deprecated and will be removed in version 1.0.0. `original_config_file` argument is deprecated and will be removed in future versions.please use the `original_config` argument instead.
  deprecate("original_config_file", "1.0.0", deprecation_message)
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...:   0%|                                                            | 0/7 [00:00<?, ?it/s]Some weights of the model checkpoint were not used when initializing CLIPTextModel:
 ['text_model.embeddings.position_ids']
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:02<00:00,  2.45it/s]
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...:   0%|                                                            | 0/7 [00:00<?, ?it/s]Some weights of the model checkpoint were not used when initializing CLIPTextModel:
 ['text_model.embeddings.position_ids']
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:01<00:00,  3.55it/s]
Traceback (most recent call last):
  File "C:\one_trainer\OneTrainer\modules\modelLoader\stableDiffusionXL\StableDiffusionXLModelLoader.py", line 225, in load
    self.__load_internal(model, model_type, weight_dtypes, model_names.base_model, model_names.vae_model)
  File "C:\one_trainer\OneTrainer\modules\modelLoader\stableDiffusionXL\StableDiffusionXLModelLoader.py", line 51, in __load_internal
    raise Exception("not an internal model")
Exception: not an internal model

Traceback (most recent call last):
  File "C:\one_trainer\OneTrainer\modules\modelLoader\stableDiffusionXL\StableDiffusionXLModelLoader.py", line 231, in load
    self.__load_diffusers(model, model_type, weight_dtypes, model_names.base_model, model_names.vae_model)
  File "C:\one_trainer\OneTrainer\modules\modelLoader\stableDiffusionXL\StableDiffusionXLModelLoader.py", line 61, in __load_diffusers
    tokenizer_1 = CLIPTokenizer.from_pretrained(
  File "C:\one_trainer\OneTrainer\venv\lib\site-packages\transformers\tokenization_utils_base.py", line 2053, in from_pretrained
    raise ValueError(
ValueError: Calling CLIPTokenizer.from_pretrained() with the path to a single file or url is not supported for this tokenizer. Use a model identifier or the path to a directory instead.

Traceback (most recent call last):
  File "C:\one_trainer\OneTrainer\venv\src\diffusers\src\diffusers\configuration_utils.py", line 383, in load_config
    config_file = hf_hub_download(
  File "C:\one_trainer\OneTrainer\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "C:\one_trainer\OneTrainer\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 160, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '     '.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\one_trainer\OneTrainer\modules\modelLoader\stableDiffusionXL\StableDiffusionXLModelLoader.py", line 237, in load
    self.__load_safetensors(model, model_type, weight_dtypes, model_names.base_model, model_names.vae_model)
  File "C:\one_trainer\OneTrainer\modules\modelLoader\stableDiffusionXL\StableDiffusionXLModelLoader.py", line 191, in __load_safetensors
    pipeline.vae = AutoencoderKL.from_pretrained(
  File "C:\one_trainer\OneTrainer\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\one_trainer\OneTrainer\venv\src\diffusers\src\diffusers\models\modeling_utils.py", line 616, in from_pretrained
    config, unused_kwargs, commit_hash = cls.load_config(
  File "C:\one_trainer\OneTrainer\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\one_trainer\OneTrainer\venv\src\diffusers\src\diffusers\configuration_utils.py", line 420, in load_config
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like       is not the path to a directory containing a config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/diffusers/installation#offline-mode'.

Traceback (most recent call last):
  File "C:\one_trainer\OneTrainer\venv\src\diffusers\src\diffusers\configuration_utils.py", line 383, in load_config
    config_file = hf_hub_download(
  File "C:\one_trainer\OneTrainer\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "C:\one_trainer\OneTrainer\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 160, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '     '.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\one_trainer\OneTrainer\modules\modelLoader\stableDiffusionXL\StableDiffusionXLModelLoader.py", line 243, in load
    self.__load_ckpt(model, model_type, weight_dtypes, model_names.base_model, model_names.vae_model)
  File "C:\one_trainer\OneTrainer\modules\modelLoader\stableDiffusionXL\StableDiffusionXLModelLoader.py", line 141, in __load_ckpt
    pipeline.vae = AutoencoderKL.from_pretrained(
  File "C:\one_trainer\OneTrainer\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\one_trainer\OneTrainer\venv\src\diffusers\src\diffusers\models\modeling_utils.py", line 616, in from_pretrained
    config, unused_kwargs, commit_hash = cls.load_config(
  File "C:\one_trainer\OneTrainer\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\one_trainer\OneTrainer\venv\src\diffusers\src\diffusers\configuration_utils.py", line 420, in load_config
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like       is not the path to a directory containing a config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/diffusers/installation#offline-mode'.

Traceback (most recent call last):
  File "C:\one_trainer\OneTrainer\modules\ui\TrainUI.py", line 542, in __training_thread_function
    trainer.start()
  File "C:\one_trainer\OneTrainer\modules\trainer\GenericTrainer.py", line 120, in start
    self.model = self.model_loader.load(
  File "C:\one_trainer\OneTrainer\modules\modelLoader\StableDiffusionXLFineTuneModelLoader.py", line 46, in load
    base_model_loader.load(model, model_type, model_names, weight_dtypes)
  File "C:\one_trainer\OneTrainer\modules\modelLoader\stableDiffusionXL\StableDiffusionXLModelLoader.py", line 250, in load
    raise Exception("could not load model: " + model_names.base_model)
Exception: could not load model: C:/stable-diffusion-webui/models/Stable-diffusion/RealVisXL_V4.0.safetensors
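
Reading the tracebacks above, the loader appears to try the internal, diffusers, safetensors, and ckpt formats in turn, then raise a generic "could not load model" error that names only the base model, even though the last two failures appear to come from a blank VAE name. A minimal sketch of that fallback pattern that keeps every cause in the final message; all names here are hypothetical and this is not OneTrainer's code:

```python
from typing import Callable, List, Sequence, Tuple


def load_with_fallbacks(
    name: str,
    loaders: Sequence[Tuple[str, Callable[[str], object]]],
) -> object:
    """Try each loader in order; if all fail, raise one error listing every cause."""
    errors: List[str] = []
    for label, loader in loaders:
        try:
            return loader(name)
        except Exception as exc:  # broad on purpose: any failure moves on to the next format
            errors.append(f"{label}: {type(exc).__name__}: {exc}")
    raise RuntimeError("could not load model: " + name + "\n" + "\n".join(errors))


# Hypothetical stand-ins for the real format-specific loaders.
def load_diffusers(name: str) -> object:
    raise ValueError("path to a single file is not supported")


def load_safetensors(name: str) -> object:
    raise OSError("VAE name is blank")


try:
    load_with_fallbacks(
        "RealVisXL_V4.0.safetensors",
        [("diffusers", load_diffusers), ("safetensors", load_safetensors)],
    )
except RuntimeError as err:
    print(err)
```

With the causes collected this way, the final error would point at the VAE name instead of only the base model path.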

Output of pip freeze

Microsoft Windows [Version 10.0.22631.4037]
(c) Microsoft Corporation. All rights reserved.

C:\one_trainer\OneTrainer\venv\Scripts>activate

(venv) C:\one_trainer\OneTrainer\venv\Scripts>pip freeze
absl-py==2.1.0
accelerate==0.30.1
aiohappyeyeballs==2.3.5
aiohttp==3.10.3
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
async-timeout==4.0.3
attrs==24.2.0
bitsandbytes==0.43.1
certifi==2024.7.4
charset-normalizer==3.3.2
cloudpickle==3.0.0
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.2.1
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
-e git+https://github.com/huggingface/diffusers.git@dd4b731e68f88f58dfabfb68f28e00ede2bb90ae#egg=diffusers
filelock==3.15.4
flatbuffers==24.3.25
fonttools==4.53.1
frozenlist==1.4.1
fsspec==2024.6.1
ftfy==6.2.3
grpcio==1.65.4
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.7
importlib_metadata==8.2.0
intel-openmp==2021.4.0
invisible-watermark==0.2.0
Jinja2==3.1.4
kiwisolver==1.4.5
lightning-utilities==0.11.6
lion-pytorch==0.1.4
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@d38efdf377a2d52c32aebf7820f10342e16221bf#egg=mgds
mkl==2021.4.0
mpmath==1.3.0
multidict==6.0.5
networkx==3.3
numpy==1.26.4
omegaconf==2.3.0
onnxruntime-gpu==1.18.0
open-clip-torch==2.24.0
opencv-python==4.9.0.80
packaging==24.1
pillow==10.3.0
platformdirs==4.2.2
pooch==1.8.1
prodigyopt==1.0
protobuf==4.25.4
psutil==6.0.0
Pygments==2.18.0
pynvml==11.5.0
pyparsing==3.1.2
pyreadline3==3.4.1
python-dateutil==2.9.0.post0
pytorch-lightning==2.2.5
pytorch_optimizer==3.0.2
PyWavelets==1.7.0
PyYAML==6.0.1
regex==2024.7.24
requests==2.32.3
rich==13.7.1
safetensors==0.4.3
scalene==1.5.41
schedulefree==1.2.5
sentencepiece==0.2.0
six==1.16.0
sympy==1.13.2
tbb==2021.13.1
tensorboard==2.17.0
tensorboard-data-server==0.7.2
timm==1.0.8
tokenizers==0.19.1
torch==2.3.1+cu118
torchmetrics==1.4.1
torchvision==0.18.1+cu118
tqdm==4.66.4
transformers==4.42.3
typing_extensions==4.12.2
urllib3==2.2.2
wcwidth==0.2.13
Werkzeug==3.0.3
xformers==0.0.27+cu118
yarl==1.9.4
zipp==3.20.0

(venv) C:\one_trainer\OneTrainer\venv\Scripts>

FurkanGozukara commented 3 months ago

@Nerogar when this is used as the VAE, it is fixed: stabilityai/sdxl-vae
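
This is consistent with the log, where the repo id rejected by `validate_repo_id` is a run of spaces, which suggests the VAE field held only whitespace. A minimal sketch of guarding against that case, assuming a hypothetical helper `normalize_model_name` that is not part of OneTrainer:

```python
from typing import Optional


def normalize_model_name(raw: Optional[str]) -> Optional[str]:
    """Return the stripped name, or None when the field is empty or whitespace only."""
    if raw is None:
        return None
    name = raw.strip()
    return name if name else None


# The log shows a VAE name made of spaces only; treat that as "no override".
vae_name = normalize_model_name("     ")
if vae_name is None:
    print("No VAE override set; keep the VAE bundled with the base model")
else:
    print("Loading VAE override:", vae_name)
```

A loader that normalizes the field this way could skip the separate VAE download when nothing was entered, instead of passing a blank string on as a repo id.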


cvang187 commented 1 month ago

For anyone who has tried all the solutions listed in the open tickets, try adjusting your access token settings on Hugging Face.


O-J1 commented 1 month ago

Not an OT issue as far as I can see.