Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.

[Bug]: Backups (diffusers) getting much different results than same .safetensors #224

Open 311-code opened 4 months ago

311-code commented 4 months ago

First off, love Onetrainer, but I just wanted to share this issue I noticed.

This is a buddy with his face slightly changed with a midjourney lora. The prompt was supposed to have an "Audi behind him in Chicago":

Onetrainer diffusers backup: diffusers

Onetrainer's built-in conversion (custom conversion script?): facial distortions that further reduce likeness. No Audi, I told him he's taking a taxi now. (Edit: To be clear, I don't know if the script I saw in Onetrainer is in fact what runs when stopping training, so it may just be an issue with the way Onetrainer saves .safetensors) onetrainer_converted_safetensors

To resolve: converting in ComfyUI to .safetensors gives results very similar to the backup file. Used a DiffusersLoader node connected to a CheckpointSave node to convert to .safetensors: confyui_converted_safetensors

What happened?

When I check the same checkpoint in ComfyUI, the results are different when I load the backup diffusers checkpoint at 20 epochs versus the saved .safetensors at 20 epochs, using a fixed seed.

I just want to make sure it's converting properly, so I'm reporting this. I seem to get better results using the diffusers backups at the same epoch than the saved .safetensors.

What did you expect would happen?

I expected the diffusers backup and the saved .safetensors to give the same results and likeness.

Relevant log output

No response

Output of pip freeze

No response

mx commented 4 months ago

Please upload your comfyui workflow.

311-code commented 4 months ago

Sure, here is the comfyui workflow: (edit: this is old and outdated now) Likeness-Vae-Encode-Comfyui-final.json

Added notes: this one works best with a single dataset image put into the VAE Encode node, which passes the image latent into the KSampler, further enhancing fine-tuning likeness (it's img2img).
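For reference, a rough script-based equivalent of that img2img step, as a hedged sketch using the diffusers library rather than the actual ComfyUI graph (paths, prompt and strength below are placeholders, not values from this thread):

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

# Load the trained checkpoint (placeholder path) as an SDXL img2img pipeline.
pipe = StableDiffusionXLImg2ImgPipeline.from_single_file(
    "ctoon_sdxl_base_everclear.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

# The dataset image is VAE-encoded and used as the starting latent,
# which is what the VAE Encode -> KSampler connection does in ComfyUI.
init_image = load_image("dataset_image.png").resize((1024, 1024))
result = pipe(
    prompt="photo of zxc woman",
    image=init_image,
    strength=0.5,  # how far the sampler is allowed to drift from the input image
    generator=torch.Generator("cuda").manual_seed(1234),
).images[0]
result.save("img2img_result.png")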

311-code commented 4 months ago

I've stuck to just using the diffusers backups; surprised nobody has tried to test this yet. I'm not sure why they look different. The diffusers version looks better.

I had concerns that the safety checker being on during conversion was causing the differences. I guess I'll dig into this to check.

mx commented 4 months ago

Please do confirm whether it's just the safety checker causing the differences. I haven't had the time to test this, and I am unlikely to have the time to test it anytime soon.

311-code commented 4 months ago

I have doubts it's the safety checker now; I looked into it and think that's just old code that would black out images a long time ago.

The safety checker seems disabled everywhere except in convert_if.py, which seems related to the old DeepFloyd IF model.

The only scripts I could find in Onetrainer that mention conversion are convert_model.py and ConvertModelUI.py, but I don't see where the conversion from diffusers is happening.

Nerogar commented 4 months ago

The conversion scripts in OneTrainer are custom built. You can find them in modules/util/convert. The Stable Diffusion convert script is called convert_sd_diffusers_to_ckpt.py. I doubt that there are any bugs that just reduce quality. Any incorrectly converted keys would probably create a completely broken model, not one that's just a bit worse.
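For context, a diffusers-to-original-SD conversion like that is essentially a key-renaming pass over the state dict. A minimal illustrative sketch follows (this is not OneTrainer's actual code; the two mapping entries are examples of the commonly used diffusers-to-SD key correspondence, and a real script maps every block):

# Illustrative only: rename diffusers UNet keys into the original SD layout.
# A wrong or missing entry would produce a structurally broken checkpoint,
# not one that is merely a little worse.
UNET_KEY_MAP = {
    "time_embedding.linear_1.weight": "time_embed.0.weight",
    "time_embedding.linear_1.bias": "time_embed.0.bias",
}

def convert_unet_keys(diffusers_state_dict):
    converted = {}
    for key, tensor in diffusers_state_dict.items():
        new_key = UNET_KEY_MAP.get(key, key)
        # original SD checkpoints namespace the UNet under this prefix
        converted["model.diffusion_model." + new_key] = tensor
    return converted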

311-code commented 4 months ago

Ahh, got it. Updated the post with image examples, btw.

mx commented 4 months ago

How are you doing the model conversion exactly? Are you running the convert_model.py script? What arguments are you using, especially the datatypes?

311-code commented 4 months ago

I am just clicking "stop training" and using the .safetensors file, or using the automatically made one (after a completed epoch). Nothing too special.

I had noticed it didn't look as good when I used it in ComfyUI or SD Forge.

When I convert the backup model in ComfyUI, though, and move it to SD Forge, it looks great.

mx commented 4 months ago

In your original post you say:

Onetrainer's built-in custom conversion script. Facial distortions, further reduce likeness. No audi, I told him he's taking a taxi now.

What, exactly, is the command line or arguments you are giving to the custom conversion script?

Nerogar commented 4 months ago

You mentioned before that using the diffusers CLIP improves quality. Is this not the case anymore? Did you find out something else? Any additional information that could help narrow this down would be useful. Is there a difference in the converted UNET? or maybe one of the text encoders? (assuming this is SDXL, not SD)

311-code commented 4 months ago

I think I'm not explaining this well. I was saying that you had mentioned Onetrainer has a "built-in custom script" you made.

I was saying I'm using Onetrainer normally, and at the end of training I imagined that this script converts or saves the model as a .safetensors, and that the result does not look as good as the diffusers backup. If that isn't how it works, let me know; I think that's why I'm confusing you. So it may just be whatever the .safetensors saving mechanism in Onetrainer is that's causing the issue.

I edited my old comments because I learned it wasn't just the CLIP; the entire diffusers model looks better.

Yes, there was a difference in both the CLIP and the UNet the last time I checked, but I can check again.

My current running theory is that some of the blocks are not being saved exactly right in the .safetensors file during save. ComfyUI converts the backup and it looks almost the same as the diffusers backup, and better than the Onetrainer checkpoint, as you can see from the images.
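One way to test that theory directly would be to diff the OneTrainer-saved checkpoint against the ComfyUI-converted one, since both use the original SD key layout. A rough sketch (file names are placeholders):

from safetensors.torch import load_file

a = load_file("onetrainer_saved.safetensors")   # checkpoint saved by OneTrainer
b = load_file("comfyui_converted.safetensors")  # backup converted through ComfyUI

# Keys present in one file but not the other point at a conversion problem.
print("only in OneTrainer save:", sorted(set(a) - set(b))[:10])
print("only in ComfyUI conversion:", sorted(set(b) - set(a))[:10])

# For shared keys, report tensors that differ by more than a small threshold.
for key in sorted(set(a) & set(b)):
    ta, tb = a[key].float(), b[key].float()
    if ta.shape != tb.shape:
        print(key, "shape mismatch", tuple(ta.shape), tuple(tb.shape))
        continue
    max_diff = (ta - tb).abs().max().item()
    if max_diff > 1e-3:  # arbitrary cutoff for "noticeably different"
        print(key, "max abs diff", max_diff)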

Calamdor commented 4 months ago

This is not just a case where the backups are in fp32 format or whatever the model was loaded in, while the safetensors output is going to fp16 or bf16 or whatever was selected, right?

mx commented 4 months ago

This is not just a case where the backups are in fp32 format or whatever the model was loaded in, while the safetensors output is going to fp16 or bf16 or whatever was selected, right?

Yeah, this is what I suspect is happening, and why I asked what the arguments were for the conversion script. OP unfortunately has not uploaded the settings they used to train with in OneTrainer, so I cannot check that.

311-code commented 4 months ago

Sorry, I actually did upload it before and it was up for 2 weeks, but somehow it got edited out, probably from editing here on my phone.

I'm getting great results in general with these settings. In regards to float32, I don't know, because I can convert the diffusers backup to bf16 and it's fine and looks nearly the same, but maybe that's it.

I've also included my training settings. I have 24GB VRAM. In this config I was using batch size 2 with no EMA. I am using 98,000 regularization images on the second concept at 0.013 repeats, and 40 repeats with 128 images on the main concept. Training is usually done at around 2 or 3 epochs with this config.

Main preset:


{
    "__version": 2,
    "training_method": "FINE_TUNE",
    "model_type": "STABLE_DIFFUSION_XL_10_BASE",
    "debug_mode": false,
    "debug_dir": "debug",
    "workspace_dir": "C:/stable-diffusion-webui-master/outputs/ctoon-SDXL-Onetrn/Workspace",
    "cache_dir": "C:/stable-diffusion-webui-master/outputs/ctoon-SDXL-Onetrn/cache",
    "tensorboard": false,
    "tensorboard_expose": false,
    "continue_last_backup": true,
    "include_train_config": "NONE",
    "base_model_name": "D:/stable-diffusion-webui-master/models/Stable-diffusion/everclearPONYByZovya_v2VAE.safetensors",
    "weight_dtype": "BFLOAT_16",
    "output_dtype": "BFLOAT_16",
    "output_model_format": "SAFETENSORS",
    "output_model_destination": "C:/stable-diffusion-webui-master/outputs/ctoon-SDXL-Onetrn/save/ctoon-sdxl-base-onetrn",
    "gradient_checkpointing": true,
    "concept_file_name": "training_concepts/concepts.json",
    "concepts": null,
    "circular_mask_generation": false,
    "random_rotate_and_crop": false,
    "aspect_ratio_bucketing": true,
    "latent_caching": true,
    "clear_cache_before_training": false,
    "learning_rate_scheduler": "CONSTANT",
    "learning_rate": 1e-05,
    "learning_rate_warmup_steps": 200,
    "learning_rate_cycles": 1,
    "epochs": 200,
    "batch_size": 2,
    "gradient_accumulation_steps": 1,
    "ema": "OFF",
    "ema_decay": 0.0,
    "ema_update_step_interval": 0,
    "train_device": "cuda",
    "temp_device": "cpu",
    "train_dtype": "BFLOAT_16",
    "fallback_train_dtype": "FLOAT_32",
    "enable_autocast_cache": true,
    "only_cache": false,
    "resolution": "1024",
    "attention_mechanism": "DEFAULT",
    "align_prop": false,
    "align_prop_probability": 0.1,
    "align_prop_loss": "AESTHETIC",
    "align_prop_weight": 0.01,
    "align_prop_steps": 20,
    "align_prop_truncate_steps": 0.5,
    "align_prop_cfg_scale": 7.0,
    "mse_strength": 1.0,
    "mae_strength": 0.0,
    "vb_loss_strength": 1.0,
    "min_snr_gamma": 0.0,
    "dropout_probability": 0.0,
    "loss_scaler": "NONE",
    "learning_rate_scaler": "NONE",
    "offset_noise_weight": 0.0,
    "perturbation_noise_weight": 0.0,
    "rescale_noise_scheduler_to_zero_terminal_snr": false,
    "force_v_prediction": false,
    "force_epsilon_prediction": false,
    "min_noising_strength": 0.0,
    "max_noising_strength": 1.0,
    "noising_weight": 0.0,
    "noising_bias": 0.5,
    "unet": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": 10000,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": 1e-05,
        "weight_dtype": "BFLOAT_16"
    },
    "prior": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": 10000,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "text_encoder": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": 200,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": 1e-05,
        "weight_dtype": "BFLOAT_16"
    },
    "text_encoder_layer_skip": 0,
    "text_encoder_2": {
        "__version": 0,
        "model_name": "",
        "train": false,
        "stop_training_after": 30,
        "stop_training_after_unit": "EPOCH",
        "learning_rate": null,
        "weight_dtype": "BFLOAT_16"
    },
    "text_encoder_2_layer_skip": 0,
    "vae": {
        "__version": 0,
        "model_name": "stabilityai/sdxl-vae",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "FLOAT_32"
    },
    "effnet_encoder": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "decoder": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "decoder_text_encoder": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "decoder_vqgan": {
        "__version": 0,
        "model_name": "",
        "train": true,
        "stop_training_after": null,
        "stop_training_after_unit": "NEVER",
        "learning_rate": null,
        "weight_dtype": "NONE"
    },
    "masked_training": false,
    "unmasked_probability": 0.1,
    "unmasked_weight": 0.1,
    "normalize_masked_area_loss": false,
    "embeddings": [
        {
            "__version": 0,
            "model_name": "",
            "train": true,
            "stop_training_after": null,
            "stop_training_after_unit": "NEVER",
            "token_count": 1,
            "initial_embedding_text": "*",
            "weight_dtype": "FLOAT_32"
        }
    ],
    "embedding_weight_dtype": "FLOAT_32",
    "lora_model_name": "",
    "lora_rank": 16,
    "lora_alpha": 1.0,
    "lora_weight_dtype": "FLOAT_32",
    "optimizer": {
        "__version": 0,
        "optimizer": "ADAFACTOR",
        "adam_w_mode": false,
        "alpha": null,
        "amsgrad": false,
        "beta1": null,
        "beta2": null,
        "beta3": null,
        "bias_correction": false,
        "block_wise": false,
        "capturable": false,
        "centered": false,
        "clip_threshold": 1.0,
        "d0": null,
        "d_coef": null,
        "dampening": null,
        "decay_rate": -0.8,
        "decouple": false,
        "differentiable": false,
        "eps": 1e-30,
        "eps2": 0.001,
        "foreach": false,
        "fsdp_in_use": false,
        "fused": false,
        "fused_back_pass": false,
        "growth_rate": null,
        "initial_accumulator_value": null,
        "is_paged": false,
        "log_every": null,
        "lr_decay": null,
        "max_unorm": null,
        "maximize": false,
        "min_8bit_size": null,
        "momentum": null,
        "nesterov": false,
        "no_prox": false,
        "optim_bits": null,
        "percentile_clipping": null,
        "relative_step": false,
        "safeguard_warmup": false,
        "scale_parameter": false,
        "stochastic_rounding": false,
        "use_bias_correction": false,
        "use_triton": false,
        "warmup_init": false,
        "weight_decay": 0.0
    },
    "optimizer_defaults": {
        "ADAFACTOR": {
            "__version": 0,
            "optimizer": "ADAFACTOR",
            "adam_w_mode": false,
            "alpha": null,
            "amsgrad": false,
            "beta1": null,
            "beta2": null,
            "beta3": null,
            "bias_correction": false,
            "block_wise": false,
            "capturable": false,
            "centered": false,
            "clip_threshold": 1.0,
            "d0": null,
            "d_coef": null,
            "dampening": null,
            "decay_rate": -0.8,
            "decouple": false,
            "differentiable": false,
            "eps": 1e-30,
            "eps2": 0.001,
            "foreach": false,
            "fsdp_in_use": false,
            "fused": false,
            "fused_back_pass": false,
            "growth_rate": null,
            "initial_accumulator_value": null,
            "is_paged": false,
            "log_every": null,
            "lr_decay": null,
            "max_unorm": null,
            "maximize": false,
            "min_8bit_size": null,
            "momentum": null,
            "nesterov": false,
            "no_prox": false,
            "optim_bits": null,
            "percentile_clipping": null,
            "relative_step": false,
            "safeguard_warmup": false,
            "scale_parameter": false,
            "stochastic_rounding": false,
            "use_bias_correction": false,
            "use_triton": false,
            "warmup_init": false,
            "weight_decay": 0.0
        }
    },
    "sample_definition_file_name": "training_samples/samples.json",
    "samples": null,
    "sample_after": 400,
    "sample_after_unit": "STEP",
    "sample_image_format": "JPG",
    "samples_to_tensorboard": false,
    "non_ema_sampling": false,
    "backup_after": 3,
    "backup_after_unit": "EPOCH",
    "rolling_backup": false,
    "rolling_backup_count": 3,
    "backup_before_save": true,
    "save_after": 1,
    "save_after_unit": "EPOCH",
    "save_filename_prefix": "ctoon_sdxl_base_everclear"
}

Concepts.json

[
    {
        "__version": 0,
        "image": {
            "__version": 0,
            "enable_crop_jitter": false,
            "enable_random_flip": false,
            "enable_fixed_flip": false,
            "enable_random_rotate": false,
            "enable_fixed_rotate": false,
            "random_rotate_max_angle": 0.0,
            "enable_random_brightness": false,
            "enable_fixed_brightness": false,
            "random_brightness_max_strength": 0.0,
            "enable_random_contrast": false,
            "enable_fixed_contrast": false,
            "random_contrast_max_strength": 0.0,
            "enable_random_saturation": false,
            "enable_fixed_saturation": false,
            "random_saturation_max_strength": 0.0,
            "enable_random_hue": false,
            "enable_fixed_hue": false,
            "random_hue_max_strength": 0.0,
            "enable_resolution_override": false,
            "resolution_override": "512"
        },
        "text": {
            "__version": 0,
            "prompt_source": "sample",
            "prompt_path": "C:/ctoon1024-newest2024/JPEG/",
            "enable_tag_shuffling": false,
            "tag_delimiter": ",",
            "keep_tags_count": 1
        },
        "name": "zxc woman",
        "path": "C:/ctoon",
        "seed": -776527006,
        "enabled": true,
        "include_subdirectories": false,
        "image_variations": 1,
        "text_variations": 1,
        "repeats": 40.0,
        "loss_weight": 1.0
    },
    {
        "__version": 0,
        "image": {
            "__version": 0,
            "enable_crop_jitter": true,
            "enable_random_flip": true,
            "enable_fixed_flip": false,
            "enable_random_rotate": false,
            "enable_fixed_rotate": false,
            "random_rotate_max_angle": 0.0,
            "enable_random_brightness": false,
            "enable_fixed_brightness": false,
            "random_brightness_max_strength": 0.0,
            "enable_random_contrast": false,
            "enable_fixed_contrast": false,
            "random_contrast_max_strength": 0.0,
            "enable_random_saturation": false,
            "enable_fixed_saturation": false,
            "random_saturation_max_strength": 0.0,
            "enable_random_hue": false,
            "enable_fixed_hue": false,
            "random_hue_max_strength": 0.0,
            "enable_resolution_override": false,
            "resolution_override": "512"
        },
        "text": {
            "__version": 0,
            "prompt_source": "concept",
            "prompt_path": "C:/90000reg/woman-caption-single-word.txt",
            "enable_tag_shuffling": false,
            "tag_delimiter": ",",
            "keep_tags_count": 1
        },
        "name": "Woman",
        "path": "C:/98000reg-images/",
        "seed": -752754556,
        "enabled": true,
        "include_subdirectories": true,
        "image_variations": 1,
        "text_variations": 1,
        "repeats": 0.013,
        "loss_weight": 1.0
    }
]

mx commented 4 months ago

Well, if all the dtypes are BF16, it probably isn't anything coming from quantization differences in the conversion vs the final safetensors output. Not really sure where to go from here.
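If anyone wants to rule precision out empirically, a quick look at the tensor dtypes in the saved file is enough. A minimal sketch (the path is a placeholder):

from collections import Counter
from safetensors.torch import load_file

state_dict = load_file("onetrainer_saved.safetensors")
print(Counter(str(t.dtype) for t in state_dict.values()))
# A mix such as Counter({'torch.bfloat16': ..., 'torch.float32': ...}) would
# show which parts, if any, were written at a different precision.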

311-code commented 4 months ago

I'm just converting the backup file to a .safetensors like this and it's all good.

convert

It puts the .safetensors under \Comfyui\outputs\checkpoints. Then I use that in SD Forge.

Heasterian commented 4 months ago

Small thing I found out in the meantime: the safetensors YAML file is saved without EMA (use_ema: false). Can the lack of EMA be the cause of this issue?

311-code commented 4 months ago

Not too sure, but I somehow got EMA and batch size 2 to work on 24GB, so I'll give it a go next time I train and compare.

pauldenoiser commented 2 months ago

Hello @brentjohnston, I'm curious about the quality you are talking about. Could you please provide some images of the trained person and some generations?

BTW, I'm going to test your configuration, thanks for sharing.

311-code commented 2 months ago

Hello @brentjohnston, I'm curious about the quality you are talking about. Could you please provide some images of the trained person and some generations?

BTW, I'm going to test your configuration, thanks for sharing.

Sorry for the delay. I had updated my original post with some images as an example. Hopefully you can see it? Btw I think my config may not be great for larger datasets. It needs some adjustments in the weight decay area. If you find better settings please let me know. I will post more images soon, but I don't really have other models to share at the moment.

I would recommend just copying the Onetrainer backup file's entire folder to ComfyUI/models/diffusers, and the .safetensors to ComfyUI\models\checkpoints, then searching for a DiffusersLoader node and a Load Checkpoint node. Load the two models, then set up two KSamplers with your CLIP, VAE and all that, and compare on the same seed. You should see the difference.

So then what I do is just connect a CheckpointSave node to the DiffusersLoader node, and it outputs a nearly perfect .safetensors conversion to ComfyUI\output\checkpoints.
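For a script-based version of that side-by-side comparison, a sketch like the following should work, assuming the backup folder follows the standard diffusers layout (paths, prompt and seed are placeholders):

import torch
from diffusers import StableDiffusionXLPipeline

prompt = "photo of zxc woman"
seed = 1234

# Diffusers backup folder vs. converted single-file checkpoint.
backup = StableDiffusionXLPipeline.from_pretrained(
    "backups/backup-epoch-20", torch_dtype=torch.bfloat16
).to("cuda")
converted = StableDiffusionXLPipeline.from_single_file(
    "ctoon_sdxl_base_everclear.safetensors", torch_dtype=torch.bfloat16
).to("cuda")

# Same prompt and seed for both, so any visual difference comes from the weights.
img_a = backup(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
img_b = converted(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
img_a.save("backup.png")
img_b.save("converted.png")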