DataCTE / SDXL-Training-Improvements

Apache License 2.0

SDXL Training with ZTSNR and NovelAI V3 Improvements

Most SDXL implementations use a maximum noise level (σ_max) of 14.6, inherited from SD1.5/1.4, without accounting for SDXL's larger scale. σ_max is the standard deviation of the noise added at the noisiest step, not a percentage; at σ_max = 14.6 a small amount of the original image signal survives, so the terminal signal-to-noise ratio is not zero. Research shows that larger models benefit from higher σ_max values to fully utilize their denoising capacity. This repository implements an increased σ_max ≈ 20000.0 (as recommended by NovelAI research, arXiv:2409.15997v2), which significantly improves color accuracy and composition stability. The higher σ_max is combined with Zero Terminal SNR (ZTSNR) and VAE finetuning.
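To make the difference between the two σ_max values concrete, the sketch below computes the terminal signal-to-noise ratio for each and builds a Karras-style noise schedule that ends at σ_max ≈ 20000. It is an illustration, not the repository's code: the function names, the unit data variance (`sigma_data=1.0`), the `sigma_min` value of 0.0292, and `rho=7.0` are all assumptions chosen for the example.

```python
import math


def snr(sigma: float, sigma_data: float = 1.0) -> float:
    """Signal-to-noise ratio of `x + sigma * noise` when x has std `sigma_data`."""
    return (sigma_data / sigma) ** 2


def karras_sigmas(n: int, sigma_min: float, sigma_max: float, rho: float = 7.0) -> list[float]:
    """Karras et al. (2022) style schedule: n noise levels from sigma_max down to sigma_min."""
    lo = sigma_min ** (1 / rho)
    hi = sigma_max ** (1 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


# Compare how much signal is left at the noisiest step for both settings.
for sigma_max in (14.6, 20000.0):
    print(f"sigma_max={sigma_max:>8}: terminal SNR = {snr(sigma_max):.3e}")

# Assumed sigma_min; the repository may use a different value.
sigmas = karras_sigmas(10, sigma_min=0.0292, sigma_max=20000.0)
print([f"{s:.4g}" for s in sigmas])
```

With σ_max = 20000 the terminal SNR is several orders of magnitude smaller than with 14.6, which is why a large finite σ_max can serve as a practical stand-in for zero terminal SNR.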

Current Status

Technical Implementation

1. Zero Terminal SNR (ZTSNR)

2. High-Resolution Coherence

3. VAE Improvements

Quick Start

Installation

git clone https://github.com/DataCTE/SDXL-Training-Improvements.git
cd SDXL-Training-Improvements
pip install -r requirements.txt

Basic Training

python src/main.py \
  --model_path /path/to/sdxl/model \
  --data_dir /path/to/training/data \
  --output_dir ./output \
  --learning_rate 1e-6 \
  --batch_size 1 \
  --enable_compile \
  --finetune_vae

Advanced Configuration

python src/main.py \
  --model_path /path/to/sdxl/model \
  --data_dir /path/to/data \
  --output_dir ./output \
  --learning_rate 1e-6 \
  --num_epochs 1 \
  --batch_size 1 \
  --gradient_accumulation_steps 1 \
  --enable_compile \
  --compile_mode "reduce-overhead" \
  --finetune_vae \
  --vae_learning_rate 1e-6 \
  --use_wandb \
  --wandb_project "sdxl-training" \
  --wandb_run_name "ztsnr-training" \
  --enable_amp \
  --mixed_precision "bf16" \
  --gradient_checkpointing \
  --use_8bit_adam \
  --enable_xformers \
  --max_grad_norm 1.0 \
  --adaptive_loss_scale \
  --kl_weight 0.1 \
  --perceptual_weight 0.1

Tag-Based CLIP Weighting

Configuration

python src/main.py \
  --min_tag_weight 0.1 \
  --max_tag_weight 3.0 \
  --character_weight 1.5 \
  --style_weight 1.2 \
  --quality_weight 0.8 \
  --setting_weight 1.0 \
  --action_weight 1.1 \
  --object_weight 0.9 \
  --tag_frequency_path "tag_frequencies.json" \
  --tag_embedding_cache "tag_embeddings.pt" \
  --dynamic_tag_weights \
  --tag_dropout 0.1
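One plausible way these flags could interact is sketched below: each tag gets its class weight, scaled by a rarity bonus derived from the tag's frequency, then clamped to the `--min_tag_weight`/`--max_tag_weight` range. The numeric values come from the command above, but the combination rule, the `tag_weight` function, and the log-based rarity term are hypothetical and may not match what the trainer actually does.

```python
import math

# Values mirror the CLI flags above; how the trainer combines them is an assumption.
MIN_TAG_WEIGHT = 0.1
MAX_TAG_WEIGHT = 3.0
CLASS_WEIGHTS = {
    "character": 1.5,
    "style": 1.2,
    "quality": 0.8,
    "setting": 1.0,
    "action": 1.1,
    "object": 0.9,
}


def tag_weight(tag_class: str, count: int, total: int) -> float:
    """Hypothetical rule: class weight times a rarity bonus, clamped to the allowed range."""
    class_w = CLASS_WEIGHTS.get(tag_class, 1.0)  # unknown classes get a neutral weight
    freq = max(count, 1) / total                 # fraction of captions containing the tag
    rarity = 1.0 - math.log10(freq)              # 1.0 for ubiquitous tags, larger for rare ones
    return min(MAX_TAG_WEIGHT, max(MIN_TAG_WEIGHT, class_w * rarity))


print(tag_weight("character", 100, 100))   # a tag present in every caption
print(tag_weight("character", 1, 10_000))  # a very rare tag, limited by the upper clamp
```

The clamp keeps rare tags from dominating the loss and common tags from being ignored entirely, which is the usual motivation for bounding weights this way.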

Weight Ranges

Class Weights

Dataset Format

Directory Structure

data_dir/
├── image1.png
├── image1.txt
├── image2.jpg
├── image2.txt
...

Tag Format

character_tags, style_tags, setting_tags, quality_tags

Example: 1girl, anime style, outdoor, high quality
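The layout above pairs every image with a same-named `.txt` file of comma-separated tags. A minimal loader for that layout might look like the following; the function name `load_pairs` and the set of accepted image extensions are assumptions for illustration, and the repository's own dataset class may behave differently (for example, in how it treats images without a caption).

```python
from pathlib import Path

# Assumed set of image extensions; the layout above shows .png and .jpg.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def load_pairs(data_dir):
    """Return (image_path, tags) for each image that has a matching .txt caption."""
    pairs = []
    for image_path in sorted(Path(data_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        caption_path = image_path.with_suffix(".txt")
        if not caption_path.is_file():
            continue  # this sketch skips images without a caption file
        text = caption_path.read_text(encoding="utf-8")
        # Split on commas and drop surrounding whitespace and empty entries.
        tags = [tag.strip() for tag in text.split(",") if tag.strip()]
        pairs.append((image_path, tags))
    return pairs
```

Calling `load_pairs("/path/to/training/data")` on the example caption would yield the tags `1girl`, `anime style`, `outdoor`, and `high quality` for that image.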

Project Structure

See Project Structure Documentation for detailed component descriptions.

ComfyUI Integration

This repository includes custom ComfyUI nodes that implement the ZTSNR and NovelAI V3 improvements. The nodes can be found in src/inference/Comfyui-zsnrnode/.

Available Nodes

  1. ZSNR V-Prediction Node

    • Implements Zero Terminal SNR and V-prediction
    • Configurable σ_min and σ_data parameters
    • Resolution-aware scaling
    • Dynamic SNR gamma adjustment
    • Category: "conditioning"
  2. CFG Rescale Node

    • Advanced CFG rescaling methods
    • Multiple scaling algorithms
    • Configurable rescale multiplier
    • Category: "sampling"
  3. Laplace Scheduler Node

    • Laplace distribution-based noise scheduling
    • Configurable μ and β parameters
    • Optimized for SDXL's scale
    • Category: "sampling"
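For readers unfamiliar with CFG rescaling, the sketch below shows one widely used variant, the standard-deviation rescale from Lin et al., "Common Diffusion Noise Schedules and Sample Steps are Flawed". The CFG Rescale node offers multiple algorithms and may implement this differently; the function name, the flat-list inputs, and the default `rescale=0.7` are assumptions for illustration. A real node would operate on tensors and compute the statistics per sample.

```python
from statistics import pstdev


def cfg_rescale(cond, uncond, guidance_scale, rescale=0.7):
    """Apply classifier-free guidance, then pull the result's std back toward cond's std."""
    # Standard classifier-free guidance.
    guided = [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
    std_guided = pstdev(guided)
    if std_guided == 0:
        return guided  # nothing to rescale
    # Factor that would give the guided output the same std as the conditional prediction.
    factor = pstdev(cond) / std_guided
    # Blend between the fully rescaled output and the plain guided output.
    return [x * (rescale * factor + (1.0 - rescale)) for x in guided]


cond = [1.0, -1.0, 2.0, -2.0]
uncond = [0.5, -0.5, 1.0, -1.0]
out = cfg_rescale(cond, uncond, guidance_scale=7.5, rescale=1.0)
print(pstdev(cond), pstdev(out))
```

With `rescale=1.0` the output has the same standard deviation as the conditional prediction, which counters the over-saturation that high CFG values tend to produce; with `rescale=0.0` the function reduces to plain classifier-free guidance.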

Installation

  1. Copy the src/inference/Comfyui-zsnrnode directory to your ComfyUI custom nodes folder:

    cp -r src/inference/Comfyui-zsnrnode /path/to/ComfyUI/custom_nodes/
  2. Restart ComfyUI to load the new nodes

Usage

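The nodes appear in the ComfyUI node browser under their respective categories: "conditioning" for the ZSNR V-Prediction node, and "sampling" for the CFG Rescale and Laplace Scheduler nodes.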

Recommended workflow:

  1. Add ZSNR V-prediction node before your main sampling node
  2. Configure CFG rescaling if using high CFG values
  3. Optionally use the Laplace scheduler for improved noise distribution

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests for improvements.

License

Apache 2.0

Citation

@article{ossa2024improvements,
  title={Improvements to SDXL in NovelAI Diffusion V3},
  author={Ossa, Juan and Doğan, Eren and Birch, Alex and Johnson, F.},
  journal={arXiv preprint arXiv:2409.15997v2},
  year={2024}
}