zmwv823 commented 2 months ago

Add progress_bar in ComfyUI:

pipeline_fill_sd_xl.py

from comfy.utils import ProgressBar
......
Add line-474: 
ComfyUI_ProgressBar = ProgressBar(int(num_inference_steps))
......
Add line-550: 
progress_bar.update()
ComfyUI_ProgressBar.update(1) # line-550
yield latents_to_rgb(latents)
......

Minimum VRAM:

With 'cpu_offload': 5.6GB.

GiusTex commented 2 months ago

Sorry, what file are you referring to? Nodes.py ends around line 300, while you start much more later (line-474); the progress line should also be in this file, right? You can also open a pull request, so you can directly edit the right files

zmwv823 commented 2 months ago

Sorry, what file are you referring to? Nodes.py ends around line 300, while you start much more later (line-474); the progress line should also be in this file, right? You can also open a pull request, so you can directly edit the right files

lol, file name is above the code block.

pipeline_fill_sd_xl.py

GiusTex commented 2 months ago

🙈 You are completely right, omg

GiusTex commented 2 months ago

Done, thanks! I still thinks it would be cool if you opened a pull request, this way you would show up as a contributor

zmwv823 commented 2 months ago

Done, thanks! I still thinks it would be cool if you opened a pull request, this way you would show up as a contributor

LOL, not needed, it's just a slightly change.

zmwv823 commented 2 months ago

🙈 You are completely right, omg

By the way diffusers folder in comfyui is ComfyUI_windows_portable\ComfyUI\models\diffusers. Unet is part of diffusers model. I also misunderstood this before.

GiusTex commented 2 months ago

And vram usage added too. I got 8,4, I don't know how you had it so low, but I reported both values

GiusTex commented 2 months ago

By the way diffusers folder in comfyui is ComfyUI_windows_portable\ComfyUI\models\diffusers.

I knew that the unet is a part of a diffusion model, but since comfyui consider diffusers and unet folders the same, and anyway it doesn't have nodes that load diffusion folders, only files, I chose to use the unet folder (also cause it became more famous with flux unet)

GiusTex commented 2 months ago

Now the choice would be whether to gather all the models (main model, vae, controlnet) in the diffusers folder, or leave them where they are, but in the end there would be no advantage in moving the files, it would be a conceptual thing

zmwv823 commented 2 months ago

And vram usage added too. I got 8,4, I don't know how you had it so low, but I reported both values

Like comfyui official procedure:

Load one model to vram to process special task, after done move this model to cpu (offload).

The order should be clip (text_encoder)--->vae (image process)--->controlnet + unet (this two model will loop use for processing in generation in Diffusers-Image-Outpaint project)--->vae (decode latents).

It's a standard lowvram mode, but comfyui official nodes is more powerful, it can even load part of one model (such as unet) to decrease vram.

GiusTex commented 2 months ago

I tried loading single unet files into the pipeline

The Pipeline

``` pipe = StableDiffusionXLFillPipeline.from_pretrained( f"{model_path}", torch_dtype=torch.float16, vae=vae, controlnet=controlnet_model, variant="fp16" ).to("cuda") ```

but they didn't work, as I said comfyui seems to load only files not folders, unless I missed the right load diffusion etc nodes, so:

I have to load altogheter model, controlnet and vae; I tried removing the vae to load it later and decode separately but it didn't work (the pipeline seems to need the vae), and I think that if I don't load it into vram I'll get a "models on different platforms" error). I also tried adding a generator in the generator to see if I can output latents, but it didn't work;
now I removed state_dict, model_file after defining the controlnet model, instead of waiting after the generation.

This is the generator:

The Generator

``` (prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds, ) = pipe.encode_prompt(final_prompt, "cuda", True) generated_images = list(pipe( prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds, pooled_prompt_embeds=pooled_prompt_embeds, negative_pooled_prompt_embeds=negative_pooled_prompt_embeds, image=cnet_image, num_inference_steps=steps )) ``` memory clean: ``` del pipe, vae, model, controlnet_model, prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds gc.collect() torch.cuda.empty_cache() torch.cuda.ipc_collect() ```

So... I don't know how to get 5,6 gb of vram like you

zmwv823 commented 2 months ago

I tried loading single unet files into the pipeline

The Pipeline but they didn't work, as I said comfyui seems to load only files not folders, unless I missed the right load diffusion etc nodes, so:

I have to load altogheter model, controlnet and vae; I tried removing the vae to load it later and decode separately but it didn't work (the pipeline seems to need the vae), and I think that if I don't load it into vram I'll get a "models on different platforms" error). I also tried adding a generator in the generator to see if I can output latents, but it didn't work;

now I removed state_dict, model_file after defining the controlnet model, instead of waiting after the generation, but for some reason it takes more vram now, I don't understand if I made something wrong somewhere but I think I reverted most of the tests;

This is the generator:

The Generator
(prompt_embeds,
negative_prompt_embeds,
pooled_prompt_embeds,
negative_pooled_prompt_embeds,
) = pipe.encode_prompt(final_prompt, "cuda", True)

generated_images = list(pipe(
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_prompt_embeds,
pooled_prompt_embeds=pooled_prompt_embeds,
negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
image=cnet_image,
num_inference_steps=steps
))
and after that, I remove the model files:
del pipe, vae, model, controlnet_model, prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds 
gc.collect()
torch.cuda.empty_cache()
torch.cuda.ipc_collect()
but as I said for some reson now the memory doesn't get cleaned

So... I don't know how to get 5,6 gb of vram like you

Here is an article about how to optimize sdxl pipelie:

https://www.felixsanz.dev/articles/ultimate-guide-to-optimizing-stable-diffusion-xl#model-cpu-offload

But i don't think it's a good idea to spend too much time on it. For those who familar with comfy code (unfortunately i'm not), it can be easily intergrated into comfyui with official nodes (clip、ksampler、etc).

For more control, the `pipeline_fill_sd_xl` need modified.

Two input entrance: model_init-model_loader, generated_images-generation. All required inputs and process order are here, they can be modified.

model init:

def __init__( #line-80
    self,
    vae: AutoencoderKL,
    text_encoder: CLIPTextModel,
    text_encoder_2: CLIPTextModelWithProjection,
    tokenizer: CLIPTokenizer,
    tokenizer_2: CLIPTokenizer,
    unet: UNet2DConditionModel,
    controlnet: ControlNetModel_Union,
    scheduler: KarrasDiffusionSchedulers,
    force_zeros_for_empty_prompt: bool = True,
):

generation:

def __call__( #line-383
    self,
    prompt_embeds: torch.Tensor,
    negative_prompt_embeds: torch.Tensor,
    pooled_prompt_embeds: torch.Tensor,
    negative_pooled_prompt_embeds: torch.Tensor,
    image: PipelineImageInput = None,
    num_inference_steps: int = 8,
    guidance_scale: float = 1.5,
    controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
    keep_model_loaded: bool = True,
):

GiusTex commented 2 months ago

Mine is not an sdxl pipeline, but a fill one (StableDiffusionXLFillPipeline)

GiusTex commented 2 months ago

I already searched the past days on huggingface the sdxl pipelines and details on the fill one there aren't (or I couldn't find them), and what works for sdxl didn't work for me on sdxlfill

GiusTex commented 2 months ago

You can always do a pull request since I don't understand how to implement, if it works, what you say

GiusTex commented 2 months ago

I'll look into model_init-model_loader, generated_images-generation later, thanks for pointing them out. I still think it would be quicker for you to open the p.r. instead of guide me, sorry

GiusTex commented 2 months ago

You can still pass image and mask to other nodes, it's just that you don't have the direction inputs, like in the outpaint space by fffiloni, or you wanted smth else?

GiusTex commented 2 months ago

Maybe I could add an option to choose between alignment and directions inputs

GiusTex commented 2 months ago

The image padding node is easier and enough for use.

You lose the cool alignment and auto-resize options though, and the image needs to be passed to other nodes before being fed to the diffuser-outpaint node, so I don't think it's easily skippable (unless you're ok with having more nodes instead of just one)

GiusTex commented 2 months ago

After:

(prompt_embeds,
negative_prompt_embeds,
pooled_prompt_embeds,
negative_pooled_prompt_embeds,
) = pipe.encode_prompt(final_prompt)

I tried deleting: del pipe.text_encoder, pipe.text_encoder_2, pipe.tokenizer, pipe.tokenizer_2 (no vram change) del pipe.unet, pipe.control_net, pipe.vae, pipe.scheduler (got obvious error), this means del pipe.something works, and since pipe.text_encoder etc didn't change anything I think the min vram usage is 8,3 gb, with the models I used, unless there are other ways. I tried deleting text_encoder/s and tokenizer/s even in __init__ and __call__ but the prior didn't changed anything and the latter gave error

GiusTex / ComfyUI-DiffusersImageOutpaint