evilsocket / cake

Distributed LLM and StableDiffusion inference for mobile, desktop and server.

Is it possible to use quantized models? #22

ManuXD32 closed this issue 1 month ago

ManuXD32 commented 1 month ago

First of all, I want to thank you for your hard work. I love this project, and I think it's awesome to be able to handle inference across different devices. For me, the point of splitting a model among different devices lies in my current RAM limitations, so it would make much more sense to be able to use quantized versions of the big models.

evilsocket commented 1 month ago

yes

ManuXD32 commented 1 month ago

> yes

And is it possible to split them? I've been trying with Mistral Nemo but I get this error all the time:

```
$ RUST_BACKTRACE=1 cake-split-model --model-path model/Mistral-Nemo-Instruct-2407-Q5_K_M.gguf --topology topology.yml --output output/

thread 'main' panicked at cake-split-model/src/main.rs:149:40:
can't load index: Not a directory (os error 20)

Stack backtrace:
   0: ...
   1: ...
   2: ...
   3: ...
   4: ...
   5: __libc_start_main
   6: ...
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.
Aborted
```

I have also tried with other models and got the same error.
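For context, `os error 20` is `ENOTDIR`: a path component that should be a directory is actually a regular file. The splitter appears to treat `--model-path` as a directory and look for an index file inside it, so pointing it at a single `.gguf` file fails before any format check even runs. A minimal Rust sketch of this failure mode, assuming the standard sharded-safetensors index filename `model.safetensors.index.json` (an assumption here, not confirmed from cake's source):

```rust
use std::path::Path;

fn main() {
    // --model-path points at a regular file, not a directory
    let model_path = Path::new("model/Mistral-Nemo-Instruct-2407-Q5_K_M.gguf");

    // Joining a filename onto a regular file produces a path whose parent
    // component is not a directory; opening it fails with ENOTDIR.
    // (index filename is an assumption based on the usual safetensors layout)
    let index = model_path.join("model.safetensors.index.json");

    match std::fs::read_to_string(&index) {
        Ok(raw) => println!("index loaded, {} bytes", raw.len()),
        // On Linux this prints: can't load index: Not a directory (os error 20)
        Err(e) => eprintln!("can't load index: {e}"),
    }
}
```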

evilsocket commented 1 month ago

You are using a GGUF file, which is not supported by Cake. Only safetensors.
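In other words, `cake-split-model` expects a Hugging Face-style model directory containing `.safetensors` shards, not a single GGUF file. A hedged sketch of what the invocation might look like with such a directory (the directory name and its layout are assumptions, not taken from this thread; the flags are the ones used above):

```
# hypothetical: a safetensors model directory rather than a .gguf file
cake-split-model \
  --model-path model/Mistral-Nemo-Instruct-2407/ \
  --topology topology.yml \
  --output output/
```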

ManuXD32 commented 1 month ago

> You are using a GGUF file, which is not supported by Cake. Only safetensors.

Are there plans to support the GGUF format?

evilsocket commented 1 month ago

Possibly at some point; I work on this in my free time, so I won't commit to a specific timeline.

ManuXD32 commented 1 month ago

> Possibly at some point; I work on this in my free time, so I won't commit to a specific timeline.

Okay!! Thanks for your effort.