It would be nice to be able to run GGUFs on the CPU, like you can with llama.cpp GGUF models. I don't know what the speed would look like, but it could help people with low-VRAM GPUs.
Also, I haven't looked at the code, but I believe GGUF has more efficient memory allocation built in, i.e. if you choose to split the model between GPU and CPU, it won't be as bad as the typical memory overflow you get from PyTorch. If this is possible to implement, it would also be a nice feature to have for those with low-VRAM GPUs.
We're only using GGUF as a storage medium here, without the surrounding llama.cpp library, so we would have to rely on the ComfyUI lowvram mode (which will need some extra changes to work).
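To illustrate the "storage medium" point, here is a minimal sketch (not this repo's actual loader) of reading GGUF tensor data entirely on CPU using the `gguf` Python package, with no llama.cpp runtime involved. It assumes a hypothetical file path `model.gguf`, and only converts unquantized tensors; quantized types would still need the custom dequantization step that this project provides.

```python
# Sketch: read GGUF tensors on CPU via the gguf Python package.
# Only F16/F32 tensors are converted; quantized blocks are raw bytes
# and need a separate dequantization step before they are usable.
import torch
from gguf import GGUFReader, GGMLQuantizationType

def load_unquantized_tensors_cpu(path: str) -> dict[str, torch.Tensor]:
    reader = GGUFReader(path)
    state_dict = {}
    for tensor in reader.tensors:
        if tensor.tensor_type in (GGMLQuantizationType.F32, GGMLQuantizationType.F16):
            # tensor.data is a numpy view over the memory-mapped file,
            # so this stays on CPU and avoids an extra copy.
            state_dict[tensor.name] = torch.from_numpy(tensor.data)
        else:
            # Quantized tensors (Q4_K, Q8_0, ...) are left as-is here.
            print(f"skipping quantized tensor {tensor.name} ({tensor.tensor_type.name})")
    return state_dict

if __name__ == "__main__":
    weights = load_unquantized_tensors_cpu("model.gguf")  # hypothetical path
    print(f"loaded {len(weights)} unquantized tensors on CPU")
```

Whether inference on those CPU tensors is fast enough in practice would still depend on ComfyUI's model management (lowvram/CPU offload), not on the GGUF format itself.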