It would be nice to be able to run GGUFs on the CPU, like you can with llama.cpp GGUF models. I don't know what the speed would look like, but it could help people with low-VRAM GPUs.
Also, I haven't looked at the code, but I believe GGUF has more efficient memory allocation built in, i.e. if you choose to split the model between GPU and CPU, it won't be as bad as the typical memory overflow you get from PyTorch. If this is possible to implement, it would also be a nice feature to have for those with low-VRAM GPUs.
We're only using GGUF as a storage medium here, without the surrounding llama.cpp library, so we would have to rely on the ComfyUI lowvram mode (which will need some extra changes to work).
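To illustrate the "storage medium" point, here is a minimal sketch (not this repo's actual loader) of reading GGUF tensor data entirely on CPU using the `gguf` Python package, with no llama.cpp runtime involved. It assumes a hypothetical file path `model.gguf`, and only converts unquantized tensors; quantized types would still need the custom dequantization step that this project provides.

```python
# Sketch: read GGUF tensors on CPU via the gguf Python package.
# Only F16/F32 tensors are converted; quantized blocks are raw bytes
# and need a separate dequantization step before they are usable.
import torch
from gguf import GGUFReader, GGMLQuantizationType

def load_unquantized_tensors_cpu(path: str) -> dict[str, torch.Tensor]:
    reader = GGUFReader(path)
    state_dict = {}
    for tensor in reader.tensors:
        if tensor.tensor_type in (GGMLQuantizationType.F32, GGMLQuantizationType.F16):
            # tensor.data is a numpy view over the memory-mapped file,
            # so this stays on CPU and avoids an extra copy.
            state_dict[tensor.name] = torch.from_numpy(tensor.data)
        else:
            # Quantized tensors (Q4_K, Q8_0, ...) are left as-is here.
            print(f"skipping quantized tensor {tensor.name} ({tensor.tensor_type.name})")
    return state_dict

if __name__ == "__main__":
    weights = load_unquantized_tensors_cpu("model.gguf")  # hypothetical path
    print(f"loaded {len(weights)} unquantized tensors on CPU")
```

Whether inference on those CPU tensors is fast enough in practice would still depend on ComfyUI's model management (lowvram/CPU offload), not on the GGUF format itself.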