drboog / ProFusion

Code for Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
Apache License 2.0

ProFusion

ProFusion (with an encoder pre-trained on a large dataset such as CC3M) can be used to efficiently construct a customization dataset, which can in turn be used to train a tuning-free customization assistant (CAFE).

Given a testing image, the assistant can perform customized generation in a tuning-free manner. It can take complex user input and generate a text explanation and elaboration along with the image, without any fine-tuning.


examples

Results from CAFE






Code for Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach.


examples

Results from ProFusion


ProFusion is a framework for customizing pre-trained large-scale text-to-image generation models; our examples use Stable Diffusion 2.

framework

Illustration of the proposed ProFusion


With ProFusion, you can generate an unlimited number of creative images for a novel or unique concept from a single testing image on a single GPU (about 20 GB of memory is needed when fine-tuning with batch size 1).
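Before fine-tuning, it can help to check whether a local GPU meets the memory figure quoted above. The following is a minimal sketch and not part of the ProFusion code: the helper names (`parse_total_mib`, `gpus_with_enough_memory`, `query_nvidia_smi`) are hypothetical, the threshold treats "~20GB" as 20 GiB of total (not free) memory, and it assumes `nvidia-smi` is on the `PATH`.

```python
import shutil
import subprocess

# Assumption: the "~20GB" figure is read as 20 GiB of total GPU memory.
REQUIRED_GIB = 20


def parse_total_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per line; skip lines that are not plain digits."""
    values = []
    for line in smi_output.splitlines():
        token = line.strip()
        if token.isdigit():
            values.append(int(token))
    return values


def gpus_with_enough_memory(totals_mib: list[int],
                            required_gib: float = REQUIRED_GIB) -> list[int]:
    """Return the indices of GPUs whose total memory meets the threshold."""
    required_mib = required_gib * 1024
    return [i for i, mib in enumerate(totals_mib) if mib >= required_mib]


def query_nvidia_smi() -> str:
    """Return total memory per GPU as text, or '' if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    totals = parse_total_mib(query_nvidia_smi())
    suitable = gpus_with_enough_memory(totals)
    print(f"GPUs with >= {REQUIRED_GIB} GiB total memory: {suitable or 'none found'}")
```

Because the quoted figure is approximate, a card that falls slightly under the threshold may still work; pass a lower `required_gib` to relax the check.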


examples

Results from ProFusion


Example

Train Your Own Encoder

You can also train a PromptNet encoder for other domains or on your own dataset.

Citation

@article{zhou2023enhancing,
  title={Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach},
  author={Zhou, Yufan and Zhang, Ruiyi and Sun, Tong and Xu, Jinhui},
  journal={arXiv preprint arXiv:2305.13579},
  year={2023}
}