chrisgoringe / cg-mixed-casting


[Question] How much RAM is needed for this? #11

Open JorgeR81 opened 2 months ago

JorgeR81 commented 2 months ago

First of all, great idea!

I haven't tried this yet, but I have some questions about RAM usage.

The "vanilla" ComfyUI seems to upcast to FP32 while it loads the model, so it uses a lot of RAM at the UNET loader stage.

At the KSampler stage, RAM usage is normal. But this is still annoying, because I only have 32 GB of RAM, so my page file grows massively if I use Flux in FP16 or even FP8.

With GGUF models this does not happen. The models only load into RAM at the KSampler stage, and they don't go above 32 GB.

So what do you think RAM usage is going to be when creating and using mixed models?

chrisgoringe commented 2 months ago

It loads the model at whatever size it is stored in (8 or 16 bit). In order to quantise it does need to upcast to 32 bit, but it does that block by block.
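The block-by-block pattern described above is why the peak stays low: only one block's float32 copy exists at a time. This is not the extension's actual code, just a minimal numpy sketch of the idea (the name `quantise_blockwise` and the naive int8 scheme are made up for illustration):

```python
import numpy as np

def quantise_blockwise(state_dict, block_size=4):
    """Quantise a model's tensors a block at a time.

    Each block is upcast to float32 only transiently, so the extra RAM
    needed is roughly one block's float32 copy, not the whole model's.
    """
    out = {}
    names = list(state_dict)
    for i in range(0, len(names), block_size):
        for name in names[i:i + block_size]:
            w32 = state_dict[name].astype(np.float32)   # transient 32-bit upcast
            scale = float(np.abs(w32).max()) / 127 or 1.0
            q = np.round(w32 / scale).astype(np.int8)   # toy 8-bit quantisation
            out[name] = (q, scale)
            del w32  # eligible for garbage collection before the next block
    return out
```

A full-model upcast would instead need the entire model in float32 at once, which is exactly the loader-stage spike described in the question.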

So the pattern of RAM usage I see is that it climbs steadily until it uses whatever is available (I have 32 GB), and then plateaus as Python garbage collection kicks in.

So it should only need enough RAM to load the model and a little more.

If you have enough VRAM, Comfy will then push the quantised version to the GPU and dequantise when needed.

That is, it expands the quantised model on the GPU on the fly.
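To make "dequantise when needed" concrete, here is a hedged numpy sketch (not the extension's code; `QuantisedLinear` is a made-up name): the layer keeps only the small quantised weight resident, and the float copy exists just for the duration of the matmul.

```python
import numpy as np

class QuantisedLinear:
    """Stores an int8 weight plus a scale; dequantises only inside forward."""

    def __init__(self, weight):
        w32 = weight.astype(np.float32)
        self.scale = float(np.abs(w32).max()) / 127 or 1.0
        self.q = np.round(w32 / self.scale).astype(np.int8)  # resident copy

    def forward(self, x):
        w = self.q.astype(np.float32) * self.scale  # transient dequantised copy
        return x @ w.T                              # float copy freed afterwards
```

The trade-off is a little compute per forward pass in exchange for never holding the full-precision model in memory at once.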