comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Generating photo 3 of 4 #2134

Open 42degrees opened 10 months ago

42degrees commented 10 months ago

I'm running a large batch of images (several hundred thousand) using the API. I'm running 4 images for each unique set of variables. All image groups are using the same random seed. If I run the workflow with a batch size of 4 I get four distinct images. If I run the workflow with a batch size of 3 I get the same first 3 images out of the batch, which is great, but what I would love to do is generate the first image in the group now, the second image in the group later, and so on.

So, my question is this: can I generate just the 3rd image produced by a given workflow and skip images 1, 2, and 4?

What I was thinking is that I could generate, say, 100,000 runs for image 1, then use that single image four times over to test and build the app that goes around the 100,000 images. While I'm doing that I would generate image 2, then 3, then 4, and build up the total set of images over time (I estimate it will take weeks to generate all the images I want).

I'm assuming that under the hood ComfyUI generates a series of images using a random function seeded with the passed seed. Unfortunately, the secondary seeds are not stored in the JSON embedded in the image (at least, not that I've been able to find). I figure I could dig through the ComfyUI code, find where it generates the seeds, and reproduce that seed-generation process myself to recover the secondary seeds, but before doing that it seemed worth asking whether an easier, existing solution is available.

Thanks!

ltdrdata commented 10 months ago

For batch latents, since the entire latent is initialized at once, if you want to obtain the i-th latent in a batch of size n, you need to create a batch of size n with the same seed for initialization and then extract the i-th latent.

While you can reproduce the noise for a specific latent, there is no specific seed associated with that latent.
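A minimal sketch of that idea, using NumPy as a stand-in for ComfyUI's torch-based noise preparation (the function names here are illustrative, not ComfyUI API): the whole batch is one draw from the seeded generator, so recovering latent i means regenerating the full batch and slicing.

```python
import numpy as np

def batch_noise(seed, batch_size, shape):
    """Initialize noise for a whole batch from one seed, as a single draw."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((batch_size, *shape))

def nth_latent_noise(seed, batch_size, n, shape):
    """Recover the noise of the n-th latent: regenerate the full batch, then slice."""
    return batch_noise(seed, batch_size, shape)[n]

# The 3rd latent of a 4-batch can only be obtained by rebuilding the 4-batch:
latent_shape = (4, 64, 64)  # (channels, height, width)
full = batch_noise(seed=42, batch_size=4, shape=latent_shape)
third = nth_latent_noise(seed=42, batch_size=4, n=2, shape=latent_shape)
assert np.array_equal(full[2], third)
```

Note that `nth_latent_noise` still pays the cost of generating all `batch_size` noise tensors; as mentioned below, that cost is negligible compared to the diffusion passes themselves.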

42degrees commented 10 months ago

> For batch latents, since the entire latent is initialized at once, if you want to obtain the i-th latent in a batch of size n, you need to create a batch of size n with the same seed for initialization and then extract the i-th latent.
>
> While you can reproduce the noise for a specific latent, there is no specific seed associated with that latent.

I'm not sure I understand what you are saying. The "latent" is the random image that is used as a seed image to process through SDXL, right? Are you saying that it creates a random image of size n width x n height and then cuts that up to generate the individual images? Is that the "entire latent"?

ltdrdata commented 10 months ago

In the case of t2i, for image generation, noise is applied to an empty latent through a seed, and the image is created through the diffusion process. (More precisely, noise is not directly applied to the latent batch; instead, noise corresponding to the size of the latent batch is generated and used in the diffusion process.)

The issue arises when the empty latent has a batch size of 2 or more.

In this scenario, noise for the first latent in the batch is generated from the specified seed. From the second latent onward, however, the noise does not come from a new seed; it simply continues the random stream begun for the first latent.

In other words, until the noise initialization of the first batch latent is completed, it is impossible to know how the noise of the second batch latent will turn out.

42degrees commented 10 months ago

> In other words, until the noise initialization of the first batch latent is completed, it is impossible to know how the noise of the second batch latent will turn out.

Thank you for the clarification, I hadn't understood how the batches were generated. That makes perfect sense.

If you don't mind, I have a follow-up question. Running four images in a batch is faster than running four images individually, but I don't see the optimization in your description. It seems to me that it should take the same amount of time to generate noise by continuing to sample the initialized random algorithm as it would to seed a new random and use it to generate the continuing noise. Since it doesn't reload either the checkpoint or the LoRAs between images, that caching is the same or similar in both scenarios. Where is the time savings in that process?

ltdrdata commented 10 months ago

The part about reproducibility of noise from the seed is not about performance but rather about ensuring consistency. The time it takes to generate noise is negligible. The speed improvement with batches is mainly due to the reduction of various overheads and more efficient utilization on the GPU during the diffusion process.
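The overhead argument can be sketched outside of diffusion entirely: a batched operation produces the same numbers as a loop of single-item operations, but with one dispatch instead of n (on a GPU, one kernel launch and better occupancy). NumPy matrix multiplication as a stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64, 64))   # a "batch" of 4 items
w = rng.standard_normal((64, 64))      # shared weights

# One batched call: a single dispatch covering all 4 items.
batched = x @ w

# Four individual calls: same math, but n times the per-call overhead.
looped = np.stack([x[i] @ w for i in range(4)])

assert np.allclose(batched, looped)
```

The results are identical; only the dispatch pattern differs, which is where the batch speedup comes from.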

42degrees commented 10 months ago

> The part about reproducibility of noise from the seed is not about performance but rather about ensuring consistency. The time it takes to generate noise is negligible. The speed improvement with batches is mainly due to the reduction of various overheads and more efficient utilization on the GPU during the diffusion process.

Then it's too bad they didn't generate a new seed for each photo in the batch. It would make it a whole lot easier to pull one photo out (by setting that photo's seed). They could even have simply incremented the seed by one for each photo. Either way, if they recorded that per-photo seed in the attached JSON, the file would be more useful on its own.

I wonder, is that being done by ComfyUI or by SDXL? Maybe I could write a new node to do that?

ltdrdata commented 10 months ago

> > The part about reproducibility of noise from the seed is not about performance but rather about ensuring consistency. The time it takes to generate noise is negligible. The speed improvement with batches is mainly due to the reduction of various overheads and more efficient utilization on the GPU during the diffusion process.
>
> Then it's too bad they didn't generate a new seed for each photo in the batch. It would make it a whole lot easier to pull one photo out (by setting the photo's seed). They could have even simply incremented the seed by one for each photo. Either way, if they documented that photo seed in the attached JSON, the file is now more useful by itself.
>
> I wonder, is that being done by ComfyUI or by SDXL? Maybe I could write a new node to do that?

FYI, the `KSampler //Inspire` node in the Inspire Pack provides a feature that increments the seed for each latent in a batch when initializing noise.
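A sketch of that incremented-seed scheme (illustrative NumPy, not the Inspire Pack implementation): each latent gets its own seed, `base_seed + i`, so latent i can later be regenerated on its own without building the rest of the batch.

```python
import numpy as np

def per_latent_noise(base_seed, batch_size, shape):
    """Give each latent its own seed (base_seed + i) instead of one shared stream."""
    return np.stack([
        np.random.default_rng(base_seed + i).standard_normal(shape)
        for i in range(batch_size)
    ])

# Latent 2 of the batch can now be rebuilt alone from seed base_seed + 2:
latent_shape = (4, 64, 64)
batch = per_latent_noise(1000, 4, latent_shape)
solo = np.random.default_rng(1002).standard_normal(latent_shape)
assert np.array_equal(batch[2], solo)
```

The trade-off is that a batch generated this way will not match the images produced by the stock single-seed batch initialization, which is exactly why it lives in a separate node rather than replacing the default behavior.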