IDGallagher opened 2 months ago
Thanks for your suggestions! This is something I'm experimenting with too, but I want to try a slightly different approach.
The expensive bit is the image encoding itself. That can be easily solved by batching the encoder; once encoded, I would keep all the embeds in regular RAM, ready to be used. That keeps the code pretty simple and doesn't touch the image encoder itself (which is important because we need to keep it aligned with ComfyUI's updates).
Once we enter the diffusion process, everything normally gets moved to VRAM. Instead of doing that, we keep the embeds in regular RAM and move them to VRAM 16 at a time (or whatever the context window is).
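A minimal sketch of what I mean, assuming a PyTorch-style encoder callable (the function names `encode_images_batched` and `iter_embeds_windowed` are hypothetical, not existing ComfyUI APIs):

```python
import torch

def encode_images_batched(encoder, images, batch_size=8):
    """Encode all frames up-front in batches, parking the embeds in CPU RAM."""
    embeds = []
    with torch.no_grad():
        for i in range(0, len(images), batch_size):
            batch = images[i:i + batch_size]
            # .cpu() keeps results in regular RAM instead of accumulating on the GPU
            embeds.append(encoder(batch).cpu())
    return torch.cat(embeds, dim=0)

def iter_embeds_windowed(embeds, window=16, device="cuda"):
    """Yield context-window-sized slices, moving only each slice to VRAM."""
    for i in range(0, embeds.shape[0], window):
        yield embeds[i:i + window].to(device)
```

During sampling you'd then loop over `iter_embeds_windowed(embeds)` so that only one window's worth of embeds ever lives in VRAM at once.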
Some memory optimization ideas in this PR: