OminiControl
OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, and Xinchao Wang
Learning and Vision Lab, National University of Singapore
Features
OminiControl is a minimal yet powerful universal control framework for Diffusion Transformer models like FLUX.
- Universal Control: A unified control framework that supports both subject-driven control and spatial control (such as edge-guided and in-painting generation).
- Minimal Design: Injects control signals while preserving the original model structure. Introduces only 0.1% additional parameters to the base model.
Quick Start
Setup (Optional)
- Environment setup
conda create -n omini python=3.10
conda activate omini
- Requirements installation
pip install -r requirements.txt
Usage example
- Subject-driven generation: examples/subject.ipynb
- In-painting: examples/inpainting.ipynb
- Canny edge to image, depth to image, colorization, deblurring: examples/spatial.ipynb
Gradio app
To run the Gradio app for subject-driven generation:
python -m src.gradio.gradio_app
Guidelines for subject-driven generation
- Input images are automatically center-cropped and resized to 512x512 resolution.
- When writing prompts, refer to the subject using phrases like "this item", "the object", or "it". For example:
- A close up view of this item. It is placed on a wooden table.
- A young lady is wearing this shirt.
- The model currently works primarily with objects rather than human subjects, because the training data contains no human data.
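The center-crop and resize step described in the guidelines above can be sketched as follows. This is a minimal illustration, not the repository's own preprocessing code: the function names `center_crop_box` and `prepare_condition_image` are hypothetical, and the sketch assumes Pillow is installed and that the largest centered square is the region kept.

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return the (left, upper, right, lower) box of the largest centered square.

    Hypothetical helper for illustration; not part of the OminiControl code.
    """
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return left, upper, left + side, upper + side


def prepare_condition_image(path: str, size: int = 512):
    """Center-crop the image at `path` to a square, then resize it to size x size.

    Assumption: this mirrors the automatic preprocessing only in spirit.
    """
    # Pillow is imported lazily so the box helper above has no dependencies.
    from PIL import Image

    image = Image.open(path).convert("RGB")
    box = center_crop_box(*image.size)  # image.size is (width, height)
    return image.crop(box).resize((size, size))
```

For an 800x600 input, for instance, the helper keeps the central 600x600 region, so anything near the left and right edges is discarded before resizing. Framing the subject near the center of the condition image avoids losing it to the crop.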
Generated samples
Subject-driven generation
Demos (Left: condition image; Right: generated image)
Text Prompts
- Prompt1: *A close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!.'*
- Prompt2: *A film style shot. On the moon, this item drives across the moon surface. A flag on it reads 'Omini'. The background is that Earth looms large in the foreground.*
- Prompt3: *In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.*
- Prompt4: *"On the beach, a lady sits under a beach umbrella with 'Omini' written on it. She's wearing this shirt and has a big smile on her face, with her surfboard behind her. The sun is setting in the background. The sky is a beautiful shade of orange and purple."*
More results
* Try on:
* Scene variations:
* Dreambooth dataset:
Spatially aligned control
- Image Inpainting (Left: original image; Center: masked image; Right: filled image)
- Prompt: The Mona Lisa is wearing a white VR headset with 'Omini' written on it.
- Prompt: A yellow book with the word 'OMINI' in large font on the cover. The text 'for FLUX' appears at the bottom.
- Other spatially aligned tasks (Canny edge to image, depth to image, colorization, deblurring)
Prompt: *A light gray sofa stands against a white wall, featuring a black and white geometric patterned pillow. A white side table sits next to the sofa, topped with a white adjustable desk lamp and some books. Dark hardwood flooring contrasts with the pale walls and furniture.*
Models
Subject-driven control:

| Model | Base model | Description | Resolution |
| --- | --- | --- | --- |
| experimental / subject | FLUX.1-schnell | The model used in the paper. | (512, 512) |
| omini / subject_512 | FLUX.1-schnell | The model has been fine-tuned on a larger dataset. | (512, 512) |
| omini / subject_1024 | FLUX.1-schnell | The model has been fine-tuned on a larger dataset and accommodates higher resolution. (To be released) | (1024, 1024) |

Spatially aligned control:

| Model | Base model | Description | Resolution |
| --- | --- | --- | --- |
| experimental / <task_name> | FLUX.1 | Canny edge to image, depth to image, colorization, deblurring, in-painting | (512, 512) |
| experimental / <task_name>_1024 | FLUX.1 | Supports higher resolution. (To be released) | (1024, 1024) |
Limitations
- The model's subject-driven generation primarily works with objects rather than human subjects due to the absence of human data in training.
- The subject-driven generation model may not work well with FLUX.1-dev.
- The released models currently support only 512x512 resolution.
To-do
- [ ] Release the model for higher resolution (1024x1024).
- [ ] Release the training code.
Citation
@article{tan2024omini,
  title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
  author={Zhenxiong Tan and Songhua Liu and Xingyi Yang and Qiaochu Xue and Xinchao Wang},
  journal={arXiv preprint arXiv:2411.15098},
  year={2024}
}