city96 / ComfyUI-GGUF

GGUF Quantization support for native ComfyUI models
Apache License 2.0

OOM when changing prompts while using Q8_0 and Q8_0 t5xxl models. #89

Closed: DominoUB closed this issue 2 weeks ago

DominoUB commented 2 weeks ago

When using the quant models and changing the prompt, it caps out both my VRAM and RAM unless I unload the model between prompt changes.

Steps to replicate

  1. Generate an image.
  2. Change part of the prompt.
  3. Queue another image without unloading the model.

Workaround

  1. Generate an image
  2. Clear the model cache (see the sketch after this list)
  3. Change the prompt
  4. Queue another image
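
For anyone automating the workaround, here is a minimal sketch of step 2 via ComfyUI's HTTP API. The `/free` endpoint and its `unload_models`/`free_memory` fields are an assumption based on recent ComfyUI builds, so verify against your version:

```python
# Hedged sketch: ask a running ComfyUI instance to drop cached models between
# prompt changes, via the /free endpoint (present in recent ComfyUI builds).
import requests

COMFY_URL = "http://127.0.0.1:8188"  # default ComfyUI address; adjust if needed

def clear_model_cache():
    # unload_models drops cached models; free_memory also releases cached (V)RAM
    r = requests.post(f"{COMFY_URL}/free",
                      json={"unload_models": True, "free_memory": True})
    r.raise_for_status()

# Usage: generate, call clear_model_cache(), change the prompt, queue again.
clear_model_cache()
```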

Hardware: RTX 4080, 32 GB DDR4 RAM, i5-10400KF, SATA Samsung 860 Evo SSD.

cmcjas commented 2 weeks ago

Same issue here as well. The default dual CLIP loader with fp8 weights works fine. I think what it needs is to unload the CLIP model and reload it after a prompt change.

city96 commented 2 weeks ago

Will most likely be fixed by this PR: https://github.com/city96/ComfyUI-GGUF/pull/92 (will merge tomorrow but you can test it now if you know how to switch branches).
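
(For anyone unsure how to test the PR before the merge, a minimal sketch, assuming ComfyUI-GGUF was installed as a git clone under `custom_nodes`; the directory path and local branch name are placeholders.)

```python
# Minimal sketch (not from the thread): fetch and check out the PR head so it
# can be tested before the merge. Adjust NODE_DIR to your custom_nodes path.
import subprocess

NODE_DIR = "ComfyUI/custom_nodes/ComfyUI-GGUF"

def run(*args):
    # Run a git command inside the node directory and fail loudly on errors.
    subprocess.run(["git", *args], cwd=NODE_DIR, check=True)

run("fetch", "origin", "pull/92/head:pr-92")  # PR #92 is the "ops" branch mentioned below
run("checkout", "pr-92")
# To switch back afterwards: run("checkout", "main")
```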

cmcjas commented 2 weeks ago

> Will most likely be fixed by this PR: #92 (will merge tomorrow but you can test it now if you know how to switch branches).

Thanks, much appreciated.

cmcjas commented 2 weeks ago

> Will most likely be fixed by this PR: #92 (will merge tomorrow but you can test it now if you know how to switch branches).

Just tested the ops branch, and yeah, the issue is gone. RAM usage is lower than before and VRAM usage is the same as before; however, inference speed is a tiny bit slower, not sure what's causing it.

Despite this, it feels more optimised and polished now: no more reloading the model after a weight/prompt change, everything just starts instantly, which is great. Good stuff!

DominoUB commented 2 weeks ago

I would say this is resolved now; works great, thanks so much.