Closed DominoUB closed 2 weeks ago
Same issue here as well. The default dual CLIP loader with fp8 weights worked fine; I think what it needs is to unload the CLIP model and reload it after a prompt change.
Will most likely be fixed by this PR: https://github.com/city96/ComfyUI-GGUF/pull/92 (will merge tomorrow but you can test it now if you know how to switch branches).
> Will most likely be fixed by this PR: #92 (will merge tomorrow but you can test it now if you know how to switch branches).
Thanks, much appreciated.
> Will most likely be fixed by this PR: #92 (will merge tomorrow but you can test it now if you know how to switch branches).
Just tested the ops branch, and the issue is gone. RAM usage was lower than before and VRAM usage the same as before; however, inference speed is a tiny bit slower, and I'm not sure what's causing it.
Despite this, it feels more optimised and polished now: no more reloading of the model after a weight/prompt change, everything just starts instantly, which is great. Good stuff!
I would say this is resolved now; it works great, thanks so much.
When using the quant models and changing the prompt, both my VRAM and RAM cap out unless I unload the model between prompt changes.
Workflow
Steps to replicate
Workaround
Hardware: RTX 4080, 32 GB DDR4 RAM, i5-10400KF, SATA Samsung 860 Evo SSD.
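For anyone else hitting this before the fix lands, the workaround above (unloading the model between prompt changes) can be sketched roughly like this. This is a minimal illustration, not the actual node code: `load_clip` is a hypothetical stand-in for the real GGUF CLIP loader, and on a real CUDA setup you would typically also call `torch.cuda.empty_cache()` after dropping the reference.

```python
import gc

def load_clip():
    # Hypothetical stand-in for the real GGUF CLIP loader;
    # pretend this bytearray is the model weights in RAM.
    return bytearray(10 * 1024 * 1024)

clip = load_clip()

def reload_after_prompt_change():
    """Drop the old model before loading a fresh copy, so the old
    weights are freed instead of a second copy piling up in RAM/VRAM."""
    global clip
    clip = None         # release the reference to the old model
    gc.collect()        # force collection; with torch, also torch.cuda.empty_cache()
    clip = load_clip()  # load fresh weights for the new prompt
    return clip is not None
```

The key point is simply that the old weights must become unreachable before the loader allocates the new copy; otherwise both copies are resident at once, which is what caps out RAM/VRAM.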