CrazyJson opened this issue 6 months ago
You need a gguf model file to use llama.cpp, not safetensors.
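If the model was downloaded as safetensors, it can usually be converted with llama.cpp's own conversion script and then quantized. A rough sketch, assuming a recent llama.cpp checkout and a placeholder model directory `./my-model` (script and binary names vary a little between llama.cpp versions):

```bash
# Convert a Hugging Face safetensors model directory to a GGUF file (fp16).
python convert_hf_to_gguf.py ./my-model --outfile my-model-f16.gguf

# Optionally quantize the GGUF file (e.g. to Q4_K_M) to reduce memory use.
./llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M
```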
Thanks, I understand llama.cpp is used to load the quantized gguf model. One more question: which parameter in the sample code enables the local GPU, and how do I choose which local GPU to use?
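In case it helps: assuming the sample code uses the llama-cpp-python binding (an assumption, since the sample isn't shown here), the relevant parameters are typically `n_gpu_layers` (how many layers to offload to the GPU) and `main_gpu` (which CUDA device to use); you can also restrict GPU visibility with the `CUDA_VISIBLE_DEVICES` environment variable. A minimal sketch:

```python
import os

# Optional: restrict which GPU this process can see (here, CUDA device 0).
# Must be set before the CUDA backend is initialized.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

from llama_cpp import Llama

llm = Llama(
    model_path="my-model-Q4_K_M.gguf",  # placeholder path to a GGUF file
    n_gpu_layers=-1,                    # -1 offloads all layers to the GPU
    main_gpu=0,                         # index of the GPU to use
)

print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```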
I have the same problem. I downloaded and installed CUDA 12, but it still doesn't use my GPU, only RAM!
Do you have the CUDA toolkit installed? You need it to supply the CUDA runtime packages.
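Besides the toolkit, the Python package itself usually has to be built with the CUDA backend enabled, otherwise it silently falls back to CPU. A sketch of a reinstall, assuming the llama-cpp-python binding and a Linux shell (older releases used `-DLLAMA_CUBLAS=on` instead):

```bash
# Rebuild llama-cpp-python with the CUDA backend enabled.
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```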
Yes, I did.
I have an RTX 4060 graphics card. How do I deploy a GPU version of the model with this project?