VectorSpaceLab / OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
MIT License

What does the num_cfg do? #65

Open MoonBlvd opened 1 week ago

MoonBlvd commented 1 week ago

Thank you for the great work! I'm trying to understand the details, and "num_cfg" is a bit confusing to me. In the pipeline code there is:

num_cfg = 2 if use_img_guidance else 1

and then when generating there is:

latents = torch.cat([latents]*(1+num_cfg), 0).to(dtype)

and after it's generated there is

samples = samples.chunk((1+num_cfg), dim=0)[0]

Why do we generate three samples and only take the first one?

Thank you for your help!

staoxiao commented 1 week ago

Thanks for your attention to our work! We apply classifier-free guidance, which uses additional guidance (counted by num_cfg) to improve image quality. The implementation follows InstructPix2Pix: https://arxiv.org/pdf/2211.09800 (see Equation 3).
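
For reference, Equation 3 of the InstructPix2Pix paper combines three noise predictions, where c_I is the image condition, c_T the text condition, and s_I, s_T the two guidance scales. The three terms would explain the batch size of 1 + num_cfg = 3 when use_img_guidance is set, although the mapping onto OmniGen's batch entries is an assumption here:

```latex
\tilde{e}_\theta(z_t, c_I, c_T) =
    e_\theta(z_t, \varnothing, \varnothing)
  + s_I \cdot \bigl( e_\theta(z_t, c_I, \varnothing) - e_\theta(z_t, \varnothing, \varnothing) \bigr)
  + s_T \cdot \bigl( e_\theta(z_t, c_I, c_T) - e_\theta(z_t, c_I, \varnothing) \bigr)
```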

staoxiao commented 1 week ago

The code that applies these CFG terms: https://github.com/VectorSpaceLab/OmniGen/blob/main/OmniGen/model.py#L363-L370
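
A minimal sketch of why only the first chunk is kept, not OmniGen's actual code. It assumes the batch is ordered as (full condition, unconditional, image-only condition) and that the guided prediction is copied back to every branch. The helper `combine_cfg`, the random stand-in for the model output, the scale values, and the toy update step are all hypothetical.

```python
import torch


def combine_cfg(model_out, cfg_scale, img_cfg_scale, use_img_guidance):
    """Hypothetical helper: merge the batched branches into one guided prediction."""
    if use_img_guidance:
        # Assumed branch order: full condition, unconditional, image-only condition.
        cond, uncond, img_cond = model_out.chunk(3, dim=0)
        # Same form as InstructPix2Pix Eq. 3.
        guided = (
            uncond
            + img_cfg_scale * (img_cond - uncond)
            + cfg_scale * (cond - img_cond)
        )
        n_branches = 3
    else:
        cond, uncond = model_out.chunk(2, dim=0)
        guided = uncond + cfg_scale * (cond - uncond)
        n_branches = 2
    # Every branch receives the same guided prediction, so the copies stay identical.
    return torch.cat([guided] * n_branches, dim=0)


use_img_guidance = True
num_cfg = 2 if use_img_guidance else 1

torch.manual_seed(0)
latents = torch.randn(1, 4, 8, 8)                       # latents for one image
batched = torch.cat([latents] * (1 + num_cfg), dim=0)   # one copy per branch

# Stand-in for the network: each branch sees different conditioning,
# so the per-branch predictions differ.
model_out = torch.randn_like(batched)

# Arbitrary illustrative scales.
guided = combine_cfg(model_out, cfg_scale=2.5, img_cfg_scale=1.6,
                     use_img_guidance=use_img_guidance)

# Toy update step (not the real sampler): all copies move identically.
batched = batched - 0.1 * guided

# The chunks are duplicates of each other, so keeping index 0 loses nothing.
samples = batched.chunk(1 + num_cfg, dim=0)
first = samples[0]
```

Under these assumptions the extra batch entries exist only to give the model different conditioning in a single forward pass. The latents themselves are the same in every entry, which is why the pipeline can discard all but the first chunk at the end.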