Inference-only tiny reference implementation of SD3.5 and SD3 - everything you need for simple inference using SD3.5/SD3, excluding the weight files.
Contains code for the text encoders (OpenAI CLIP-L/14, OpenCLIP bigG, Google T5-XXL; all publicly available models), the VAE decoder (similar to previous SD models, but with 16 channels and no post-quant-conv step), and the core MM-DiT (entirely new).
Note: this repo is a reference library meant to assist partner organizations in implementing SD3.5/SD3. For alternate inference, use Comfy.
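For orientation, here is a shape-level sketch of how the three text encoders are typically combined into the MM-DiT conditioning in SD3-family models. It uses random stand-in tensors with dimensions matching the public SD3 release and is illustrative only; the actual wiring in this repo lives in other_impls.py and sd3_infer.py.

```python
import torch
import torch.nn.functional as F

batch, seq = 1, 77
# Random stand-ins for per-token hidden states and pooled outputs of each encoder.
clip_l_seq, clip_l_pooled = torch.randn(batch, seq, 768), torch.randn(batch, 768)
clip_g_seq, clip_g_pooled = torch.randn(batch, seq, 1280), torch.randn(batch, 1280)
t5_seq = torch.randn(batch, seq, 4096)

# CLIP-L and CLIP-G hidden states are concatenated on the channel axis,
# zero-padded to T5's width, then joined with T5 along the sequence axis.
lg = torch.cat([clip_l_seq, clip_g_seq], dim=-1)        # (1, 77, 2048)
lg = F.pad(lg, (0, t5_seq.shape[-1] - lg.shape[-1]))    # (1, 77, 4096)
context = torch.cat([lg, t5_seq], dim=-2)               # (1, 154, 4096)

# The pooled CLIP outputs form the separate "y" conditioning vector.
y = torch.cat([clip_l_pooled, clip_g_pooled], dim=-1)   # (1, 2048)
print(context.shape, y.shape)
```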
Download the required model files from HuggingFace into the models directory; the expected files are listed below.
This code also works for Stability AI SD3 Medium.
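If you want to script the download, a hypothetical helper using huggingface_hub could look like the following. The repo ID and in-repo filename here are assumptions, so confirm them on the model card; the SD3/SD3.5 repos are gated and may require `huggingface-cli login` first.

```python
# Hypothetical download helper; repo_id and filename are assumptions - verify on
# the HuggingFace model card before running.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="stabilityai/stable-diffusion-3.5-large",  # assumed repo id
    filename="sd3.5_large.safetensors",                # assumed file name
    local_dir="models",
)
# The text encoders (clip_l.safetensors, clip_g.safetensors, t5xxl.safetensors)
# must also end up directly under models/, whichever public copies you use.
```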
# Note: on windows use "python" not "python3"
python3 -s -m venv .sd3.5
source .sd3.5/bin/activate
# or on windows: .sd3.5\Scripts\activate
python3 -s -m pip install -r requirements.txt
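An optional sanity check that the environment installed correctly; the package names are the usual dependencies pulled in by requirements.txt, so adjust if yours differ.

```python
# Optional environment check: confirms core dependencies import and reports
# whether a CUDA device is visible.
import importlib

for pkg in ("torch", "safetensors"):
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "unknown"), "OK")
    except ImportError:
        print(pkg, "missing - re-run: pip install -r requirements.txt")

try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    pass
```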
# Generate a cat using SD3.5 Large model (at models/sd3.5_large.safetensors) with its default settings
python3 sd3_infer.py --prompt "cute wallpaper art of a cat"
# Or use a text file with a list of prompts, using SD3.5 Large
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_large.safetensors
# Generate from prompt file using SD3.5 Large Turbo with its default settings
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_large_turbo.safetensors
# Generate from prompt file using SD3.5 Medium with its default settings, at 2k resolution
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_medium.safetensors --width 1920 --height 1080
# Generate from prompt file using SD3 Medium with its default settings
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3_medium.safetensors
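To script several runs, a minimal driver can shell out to the same CLI. The flag names are taken from the examples above; the model paths and prompts below are placeholders.

```python
# Minimal batch driver over the sd3_infer.py CLI shown above.
# Model paths and prompts are placeholders; adjust to your local files.
import subprocess
import sys

runs = [
    ("models/sd3.5_large.safetensors", "cute wallpaper art of a cat"),
    ("models/sd3.5_medium.safetensors", "a watercolor painting of a lighthouse"),
]

for model, prompt in runs:
    # sys.executable avoids the python vs python3 naming difference on Windows.
    subprocess.run(
        [sys.executable, "sd3_infer.py", "--prompt", prompt, "--model", model],
        check=True,
    )
```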
Images will be output to outputs/<MODEL>/<PROMPT>_<DATETIME>_<POSTFIX> by default.
To add a postfix to the output directory, add --postfix <my_postfix>. For example:
python3 sd3_infer.py --prompt path/to/my_prompts.txt --postfix "steps100" --steps 100
To change the resolution of the generated image, add --width <WIDTH> --height <HEIGHT>.
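To inspect what a batch produced, you can walk the outputs tree from Python; this assumes PNG output and the directory layout described above.

```python
# List generated images, newest first, across all models and prompts.
from pathlib import Path

images = sorted(
    Path("outputs").rglob("*.png"),
    key=lambda p: p.stat().st_mtime,
    reverse=True,
)
for img in images[:10]:
    print(img)
```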
Optionally, use Skip Layer Guidance with SD3.5 Medium for potentially better structural and anatomical coherence.
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_medium.safetensors --skip_layer_cfg True
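Conceptually, skip-layer guidance adds a third model evaluation in which some transformer layers are skipped, and pushes the guided prediction away from that degraded output. The sketch below shows only the blending step; it is illustrative rather than this repo's exact implementation, and the scale values are placeholders.

```python
import torch

def guided_prediction(cond, uncond, skip_layer_cond, cfg_scale=4.5, slg_scale=3.0):
    """Conceptual skip-layer guidance blend (illustrative, not the repo's exact code).

    cond:            model output with full conditioning
    uncond:          model output with empty conditioning
    skip_layer_cond: conditioned output re-run with some transformer layers skipped
    """
    # Standard classifier-free guidance.
    cfg = uncond + cfg_scale * (cond - uncond)
    # Push the result away from the degraded "layers skipped" prediction.
    return cfg + slg_scale * (cond - skip_layer_cond)

# Shape-only demo with random tensors standing in for denoiser outputs.
x = torch.randn(1, 16, 128, 128)
print(guided_prediction(x, torch.randn_like(x), torch.randn_like(x)).shape)
```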
Files in this repository:

- sd3_infer.py - entry point; review this for basic usage of the diffusion model
- sd3_impls.py - contains the wrapper around the MMDiTX and the VAE
- other_impls.py - contains the CLIP models, the T5 model, and some utilities
- mmditx.py - contains the core of the MMDiT-X itself
- models directory, with the following files (download separately):
  - clip_l.safetensors (OpenAI CLIP-L, same as SDXL/SD3, can grab a public copy)
  - clip_g.safetensors (OpenCLIP bigG, same as SDXL/SD3, can grab a public copy)
  - t5xxl.safetensors (Google T5-v1.1-XXL, can grab a public copy)
  - sd3.5_large.safetensors or sd3.5_large_turbo.safetensors or sd3.5_medium.safetensors (or sd3_medium.safetensors)
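A quick way to verify the files listed above are in place before running inference; the checkpoint name below is whichever variant you downloaded.

```python
# Check that the expected files exist under models/ before running inference.
from pathlib import Path

required = [
    "clip_l.safetensors",
    "clip_g.safetensors",
    "t5xxl.safetensors",
    "sd3.5_large.safetensors",  # or sd3.5_large_turbo / sd3.5_medium / sd3_medium
]
missing = [name for name in required if not (Path("models") / name).exists()]
print("missing:", missing or "none")
```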
For the origin and licensing of the code included here, check the LICENSE-CODE file.
Some code in other_impls.py originates from HuggingFace and is subject to the HuggingFace Transformers Apache 2.0 License.