Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.
GNU Affero General Public License v3.0

[Bug]: FilenotFoundError, meta.json, calling cliptokenizer.from.pretrained, and utf-8 errors when trying to train over custom models. #202

Closed 311-code closed 7 months ago

311-code commented 7 months ago

What happened?

I have a custom local SDXL dreambooth model, made in kohya ss gui, that I wanted to train over. I usually train over these models in kohya gui to improve some things, but wanted to try OneTrainer out.

I have everything configured, but when I click start training it says:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\Stable-diffusion-webui-master/models/Stable-Diffusion/memodel.safetensors\\meta.json'

ValueError: Calling CLIPTokenizer.from_pretrained() with the path to a single file or url is not supported for this tokenizer. Use a model identifier or the path to a directory instead.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 25705: invalid start byte

During handling of the above exception, another exception occurred:

Then a bunch of other errors as a result.

I do not have any config .json or meta.json files anymore for these custom dreambooth models, as I never had to use them when training over them in kohya ss gui.

Any ideas how I can work around this? I just want to confirm that you can train over custom models, such as some models on civitai that may not include configuration files, or whether I can use a basic template config and where to put it.

I tried placing a random config.json and model_index.json in the model's directory to see if it would do anything, but got the same error.

Thanks!

What did you expect would happen?

That it would train over a custom model.

Relevant log output

Pasted above.

Output of pip freeze

Having trouble with this.

mx commented 7 months ago

This is unlikely to be a bug. Please post the configuration you're using when you get this error.

Nerogar commented 7 months ago

And please also add your full error log

311-code commented 7 months ago

Ok, I'll be back in front of pc later tonight, will do.

311-code commented 7 months ago

OK, here is the full error log and config. I have a 4090 with 24 GB:

activating venv C:\OneTrainer2\venv
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
C:\OneTrainer2\venv\src\diffusers\src\diffusers\models\lora.py:300: FutureWarning: `LoRACompatibleConv` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleConv` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleConv", "1.0.0", deprecation_message)
C:\OneTrainer2\venv\src\diffusers\src\diffusers\models\lora.py:384: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
Traceback (most recent call last):
  File "C:\OneTrainer2\modules\modelLoader\StableDiffusionXLModelLoader.py", line 275, in load
    model = self.__load_internal(model_type, weight_dtypes, model_names.base_model, model_names.vae_model)
  File "C:\OneTrainer2\modules\modelLoader\StableDiffusionXLModelLoader.py", line 56, in __load_internal
    with open(os.path.join(base_model_name, "meta.json"), "r") as meta_file:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/stable-diffusion-webui-master/models/Stable-diffusion/memodel.safetensors\\meta.json'

Traceback (most recent call last):
  File "C:\OneTrainer2\modules\modelLoader\StableDiffusionXLModelLoader.py", line 282, in load
    model = self.__load_diffusers(model_type, weight_dtypes, model_names.base_model, model_names.vae_model)
  File "C:\OneTrainer2\modules\modelLoader\StableDiffusionXLModelLoader.py", line 97, in __load_diffusers
    tokenizer_1 = CLIPTokenizer.from_pretrained(
  File "C:\OneTrainer2\venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1925, in from_pretrained
    raise ValueError(
ValueError: Calling CLIPTokenizer.from_pretrained() with the path to a single file or url is not supported for this tokenizer. Use a model identifier or the path to a directory instead.

Traceback (most recent call last):
  File "C:\OneTrainer2\venv\src\diffusers\src\diffusers\configuration_utils.py", line 428, in load_config
    config_dict = cls._dict_from_json_file(config_file)
  File "C:\OneTrainer2\venv\src\diffusers\src\diffusers\configuration_utils.py", line 550, in _dict_from_json_file
    text = reader.read()
  File "C:\Users\NewPC\AppData\Local\Programs\Python\Python310\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 25705: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\OneTrainer2\modules\modelLoader\StableDiffusionXLModelLoader.py", line 289, in load
    model = self.__load_safetensors(model_type, weight_dtypes, model_names.base_model, model_names.vae_model)
  File "C:\OneTrainer2\modules\modelLoader\StableDiffusionXLModelLoader.py", line 237, in __load_safetensors
    pipeline.vae = AutoencoderKL.from_pretrained(
  File "C:\OneTrainer2\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\OneTrainer2\venv\src\diffusers\src\diffusers\models\modeling_utils.py", line 569, in from_pretrained
    config, unused_kwargs, commit_hash = cls.load_config(
  File "C:\OneTrainer2\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\OneTrainer2\venv\src\diffusers\src\diffusers\configuration_utils.py", line 432, in load_config
    raise EnvironmentError(f"It looks like the config file at '{config_file}' is not a valid JSON file.")
OSError: It looks like the config file at 'D:/stable-diffusion-webui-master/models/VAE/sdxl.vae.safetensors' is not a valid JSON file.

Traceback (most recent call last):
  File "C:\OneTrainer2\venv\src\diffusers\src\diffusers\configuration_utils.py", line 428, in load_config
    config_dict = cls._dict_from_json_file(config_file)
  File "C:\OneTrainer2\venv\src\diffusers\src\diffusers\configuration_utils.py", line 550, in _dict_from_json_file
    text = reader.read()
  File "C:\Users\NewPC\AppData\Local\Programs\Python\Python310\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 25705: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\OneTrainer2\modules\modelLoader\StableDiffusionXLModelLoader.py", line 296, in load
    model = self.__load_ckpt(model_type, weight_dtypes, model_names.base_model, model_names.vae_model)
  File "C:\OneTrainer2\modules\modelLoader\StableDiffusionXLModelLoader.py", line 186, in __load_ckpt
    pipeline.vae = AutoencoderKL.from_pretrained(
  File "C:\OneTrainer2\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\OneTrainer2\venv\src\diffusers\src\diffusers\models\modeling_utils.py", line 569, in from_pretrained
    config, unused_kwargs, commit_hash = cls.load_config(
  File "C:\OneTrainer2\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\OneTrainer2\venv\src\diffusers\src\diffusers\configuration_utils.py", line 432, in load_config
    raise EnvironmentError(f"It looks like the config file at '{config_file}' is not a valid JSON file.")
OSError: It looks like the config file at 'D:/stable-diffusion-webui-master/models/VAE/sdxl.vae.safetensors' is not a valid JSON file.

Traceback (most recent call last):
  File "C:\OneTrainer2\modules\ui\TrainUI.py", line 477, in __training_thread_function
    trainer.start()
  File "C:\OneTrainer2\modules\trainer\GenericTrainer.py", line 118, in start
    self.model = self.model_loader.load(
  File "C:\OneTrainer2\modules\modelLoader\StableDiffusionXLModelLoader.py", line 304, in load
    raise Exception("could not load model: " + model_names.base_model)
Exception: could not load model: C:/stable-diffusion-webui-master/models/Stable-diffusion/memodel.safetensors
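
Reading the log above, the loader appears to try several formats in turn (internal, diffusers, safetensors, ckpt) and only raises "could not load model" after all of them fail. The first two failures are expected for a single .safetensors file, so the later tracebacks, which point at the VAE path, are the informative ones. A minimal sketch of that fallback pattern, assuming hypothetical function names (this is not OneTrainer's actual code):

```python
from typing import Callable, Sequence


def load_with_fallbacks(path: str, loaders: Sequence[Callable[[str], object]]) -> object:
    """Try each loader in order and return the first result that succeeds."""
    errors = []
    for loader in loaders:
        try:
            return loader(path)
        except Exception as exc:  # broad on purpose: any failure moves on to the next format
            errors.append(f"{loader.__name__}: {type(exc).__name__}: {exc}")
    # Every format failed, so report all collected reasons together.
    raise RuntimeError("could not load model: " + path + "\n" + "\n".join(errors))


# Hypothetical stand-ins for format-specific loaders.
def load_internal(path: str) -> object:
    raise FileNotFoundError(path + "/meta.json")


def load_diffusers(path: str) -> object:
    raise ValueError("path to a single file is not supported")


def load_safetensors(path: str) -> object:
    return {"format": "safetensors", "path": path}


model = load_with_fallbacks(
    "memodel.safetensors", [load_internal, load_diffusers, load_safetensors]
)
```

Collecting each loader's error in one message, as sketched here, makes it easier to see which failure matters than reading four separate tracebacks.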

Config:

{
    "__version": 2,
    "training_method": "FINE_TUNE",
    "model_type": "STABLE_DIFFUSION_XL_10_BASE",
    "debug_mode": false,
    "debug_dir": "C:/stable-diffusion-webui-master/outputs/memodel-sdxl/debug",
    "workspace_dir": "C:/stable-diffusion-webui-master/outputs/memodel-sdxl/workspace",
    "cache_dir": "C:/stable-diffusion-webui-master/outputs/memodel-sdxl/cache",
    "tensorboard": false,
    "tensorboard_expose": false,
    "continue_last_backup": true,
    "include_train_config": "NONE",
    "base_model_name": "C:/stable-diffusion-webui-master/models/Stable-diffusion/memodel.safetensors",
    "weight_dtype": "BFLOAT_16",
    "output_dtype": "BFLOAT_16",
    "output_model_format": "SAFETENSORS",
    "output_model_destination": "C:/stable-diffusion-webui-master/outputs/memodel-sdxl/output/memodel_trained.safetensors",
    "gradient_checkpointing": true,
    "concept_file_name": "training_concepts/concepts.json",
    "concepts": null,
    "circular_mask_generation": false,
    "random_rotate_and_crop": false,
    "aspect_ratio_bucketing": false,
    "latent_caching": true,
    "clear_cache_before_training": false,
    "learning_rate_scheduler": "CONSTANT",
    "learning_rate": 1e-05,
    "learning_rate_warmup_steps": 200,
    "learning_rate_cycles": 1,
    "epochs": 200,
    "batch_size": 3,
    "gradient_accumulation_steps": 1,
    "ema": "OFF",
    "ema_decay": 0.999,
    "ema_update_step_interval": 1,
    "train_device": "cuda",
    "temp_device": "cpu",
    "train_dtype": "BFLOAT_16",
    "fallback_train_dtype": "FLOAT_32",
    "only_cache": false,
    "resolution": "1024",
    "attention_mechanism": "DEFAULT",
    "align_prop": false,
    "align_prop_probability": 0.1,
    "align_prop_loss": "AESTHETIC",
    "align_prop_weight": 0.01,
    "align_prop_steps": 20,
    "align_prop_truncate_steps": 0.5,
    "align_prop_cfg_scale": 7.0,
    "mse_strength": 1.0,
    "mae_strength": 0.0,
    "vb_loss_strength": 1.0,
    "min_snr_gamma": 0.0,
    "dropout_probability": 0.0,
    "loss_scaler": "NONE",
    "learning_rate_scaler": "NONE",
    "offset_noise_weight": 0.0,
    "perturbation_noise_weight": 0.0,
    "rescale_noise_scheduler_to_zero_terminal_snr": false,
    "force_v_prediction": false,
    "force_epsilon_prediction": false,
    "min_noising_strength": 0.0,
    "max_noising_strength": 1.0,
    "noising_weight": 0.0,
    "noising_bias": 0.5,
    "unet": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": 10000,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": 1e-05,
        "weight_dtype": "BFLOAT_16"
    },
    "prior": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": 10000,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "text_encoder": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": 200,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": 3e-06,
        "weight_dtype": "BFLOAT_16"
    },
    "text_encoder_layer_skip": 0,
    "text_encoder_2": {
        "__version": 0,
        "model_name": "",
        "train": false,
        "stop_training_after": 30,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": null,
        "weight_dtype": "BFLOAT_16"
    },
    "text_encoder_2_layer_skip": 0,
    "vae": {
        "__version": 0,
        "model_name": "D:/stable-diffusion-webui-master/models/VAE/sdxl.vae.safetensors",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "FLOAT_32"
    },
    "effnet_encoder": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "decoder": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "decoder_text_encoder": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "decoder_vqgan": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "masked_training": false,
    "unmasked_probability": 0.1,
    "unmasked_weight": 0.1,
    "normalize_masked_area_loss": false,
    "embeddings": [
        {
            "__version": 0,
            "model_name": "",
            "train": true,
            "stop_training_after": null,
            "stop_training_after_unit": "NEVER",
            "token_count": 1,
            "initial_embedding_text": "*",
            "weight_dtype": "FLOAT_32"
        }
    ],
    "embedding_weight_dtype": "FLOAT_32",
    "lora_model_name": "",
    "lora_rank": 16,
    "lora_alpha": 1.0,
    "lora_weight_dtype": "FLOAT_32",
    "optimizer": {
        "__version": 0,
        "optimizer": "ADAFACTOR",
        "adam_w_mode": false,
        "alpha": null,
        "amsgrad": false,
        "beta1": null,
        "beta2": null,
        "beta3": null,
        "bias_correction": false,
        "block_wise": false,
        "capturable": false,
        "centered": false,
        "clip_threshold": 1.0,
        "d0": null,
        "d_coef": null,
        "dampening": null,
        "decay_rate": -0.8,
        "decouple": false,
        "differentiable": false,
        "eps": 1e-30,
        "eps2": 0.001,
        "foreach": false,
        "fsdp_in_use": false,
        "fused": false,
        "growth_rate": null,
        "initial_accumulator_value": null,
        "is_paged": false,
        "log_every": null,
        "lr_decay": null,
        "max_unorm": null,
        "maximize": false,
        "min_8bit_size": null,
        "momentum": null,
        "nesterov": false,
        "no_prox": false,
        "optim_bits": null,
        "percentile_clipping": null,
        "relative_step": false,
        "safeguard_warmup": false,
        "scale_parameter": false,
        "stochastic_rounding": true,
        "use_bias_correction": false,
        "use_triton": false,
        "warmup_init": false,
        "weight_decay": 0.0
    },
    "optimizer_defaults": {
        "ADAFACTOR": {
            "__version": 0,
            "optimizer": "ADAFACTOR",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": null,
            "beta2": null,
            "beta3": null,
            "bias_correction": false,
            "block_wise": false,
            "capturable": false,
            "centered": false,
            "clip_threshold": 1.0,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": -0.8,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-30,
            "eps2": 0.001,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": null,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": true,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.0
        }
    },
    "sample_definition_file_name": "training_samples/samples.json",
    "samples": null,
    "sample_after": 100,
    "sample_after_unit": "STEP",
    "sample_image_format": "JPG",
    "samples_to_tensorboard": false,
    "non_ema_sampling": false,
    "backup_after": 1,
    "backup_after_unit": "EPOCH",
    "rolling_backup": false,
    "rolling_backup_count": 30,
    "backup_before_save": true,
    "save_after": 30,
    "save_after_unit": "EPOCH",
    "save_filename_prefix": ""
}

311-code commented 7 months ago

I think this has something to do with the fp16 fix vae I had selected. I deleted the link to the vae and it's training now. I also tried changing it to float16 in the gui but got the same error. Any ideas with this added info? I would prefer to use this vae.

I use it because without it, if you train over juggernaut v9 or merge jugg with my dreambooths, it produces white orb artifacts and orange haze. Here is the link to the fp16 fix vae: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/tree/main
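
For anyone puzzled by the UnicodeDecodeError in the log: the traceback shows the VAE path being read as a JSON config file, and a .safetensors file is binary, so decoding it as UTF-8 text fails. A small sketch that reproduces the same kind of error, assuming made-up file contents and hypothetical helper names:

```python
import json
import os
import tempfile


def read_json_config(path: str) -> dict:
    """Read a file as UTF-8 text and parse it as JSON."""
    with open(path, "r", encoding="utf-8") as reader:
        return json.loads(reader.read())


def try_read_config(path: str) -> str:
    """Return 'ok', or the name of the exception raised while reading."""
    try:
        read_json_config(path)
        return "ok"
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return type(exc).__name__


with tempfile.TemporaryDirectory() as tmp:
    # A real JSON config reads fine.
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"example": 1}, f)

    # Made-up binary bytes standing in for a weights file; 0xbf cannot start a UTF-8 character.
    weights_path = os.path.join(tmp, "fake_weights.safetensors")
    with open(weights_path, "wb") as f:
        f.write(b"\x00\x01header\xbf\xff")

    config_result = try_read_config(config_path)
    weights_result = try_read_config(weights_path)
```

Here `config_result` is "ok" while `weights_result` names the decode error, which matches the chain in the log where the decode failure is then reported as "not a valid JSON file".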

311-code commented 7 months ago

Okay, I resolved it. I didn't realize the vae field seems to require the vae in diffusers format (even though it lets you select a .ckpt or .safetensors file). I also tried using the original sdxl_vae.safetensors and it still gave that error.

If that's the case, I would suggest adding a note there about requiring diffusers format and disabling the ability to select a .safetensors or .ckpt file in that field.

Anyway, for anyone reading: I just pasted madebyollin/sdxl-vae-fp16-fix into the vae field in the models tab. It then downloads the vae and works.
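
To check up front whether a vae value is a single file or a diffusers-style folder, something like the following could help. It is only a heuristic sketch: `classify_vae_source` is a hypothetical helper, and it assumes a diffusers-format folder contains a config.json next to the weights.

```python
import os
import tempfile


def classify_vae_source(value: str) -> str:
    """Guess what kind of vae reference a string is (heuristic only)."""
    if os.path.isfile(value):
        return "single file"
    if os.path.isdir(value):
        has_config = os.path.isfile(os.path.join(value, "config.json"))
        return "diffusers directory" if has_config else "directory without config.json"
    # Not on disk: it may be a Hugging Face repo id such as "owner/name".
    return "not a local path"


with tempfile.TemporaryDirectory() as tmp:
    # A lone weights file, like the one that failed in this issue.
    single_file = os.path.join(tmp, "sdxl.vae.safetensors")
    with open(single_file, "wb") as f:
        f.write(b"\x00")

    # A folder laid out the way this sketch assumes diffusers expects.
    vae_dir = os.path.join(tmp, "vae")
    os.mkdir(vae_dir)
    with open(os.path.join(vae_dir, "config.json"), "w", encoding="utf-8") as f:
        f.write("{}")

    file_kind = classify_vae_source(single_file)
    dir_kind = classify_vae_source(vae_dir)
    missing_kind = classify_vae_source(os.path.join(tmp, "does-not-exist"))
```

A value reported as "single file" is the case that produced the error above; a repo id like the one pasted into the field is not a local path and gets downloaded instead.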

The only issue I have now is trying to have it train for 60 epochs and save every 1 epoch. I have all the settings set but it's still only training for 1299 steps with 74 photos.

So far this has been a much better experience than kohya ss gui. So thanks for this!

Edit: Never mind, the saving is working. I'm so used to kohya ss gui that I didn't realize it just starts over on the next epoch. Omg, the "continue from last backup" option is amazing here. It's going to save hours of time and it's easy to use. Thanks again.