Closed · zen1001 closed this 1 month ago
By default the ggml plugin uses only 4 cores, even though I have 28 cores and pure llama.cpp can use all of them. As I can see, the plugin was updated:
[WASI-NN] ggml backend: Bump llama.cpp to b2963. Support llama.cpp options: threads: the thread number for inference.
But how do I use the threads option in rag-api-server?
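For context on how the option reaches the plugin: the ggml backend reads its llama.cpp options from a JSON metadata string supplied when the graph is built. A minimal sketch with the wasmedge-wasi-nn crate might look like this (the threads key comes from the changelog entry quoted above; the model alias "default", the ctx-size value, and the exact builder calls are assumptions for illustration):

```rust
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding};

fn main() {
    // Plugin options travel as a JSON string; "threads" is the llama.cpp
    // inference thread count added in b2963 (per the changelog above).
    let config = r#"{"ctx-size": 4096, "threads": 28}"#;

    // "default" stands in for the model alias registered with WasmEdge's
    // --nn-preload flag (an assumption for this sketch).
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .config(config.to_string())
        .build_from_cache("default")
        .expect("failed to build the ggml graph");

    let _ctx = graph
        .init_execution_context()
        .expect("failed to create an execution context");
}
```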
I ran a local test; this needs to be changed in the llama-core crate first.
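To illustrate the kind of change meant here, a hedged sketch rather than the actual llama-core code: llama-core serializes an options struct into the JSON config the plugin receives, so exposing the new option would amount to adding a threads field along these lines (the struct name, field set, and serde renames are assumptions), plus a flag in rag-api-server that populates it:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical options struct standing in for the one llama-core
// serializes into the plugin's JSON config; the real struct may differ.
#[derive(Debug, Serialize, Deserialize)]
struct Metadata {
    #[serde(rename = "ctx-size")]
    ctx_size: u64,
    #[serde(rename = "n-gpu-layers")]
    n_gpu_layers: u64,
    // New field: forwarded to llama.cpp as the inference thread count.
    threads: u64,
}

fn main() -> serde_json::Result<()> {
    let metadata = Metadata {
        ctx_size: 4096,
        n_gpu_layers: 0,
        threads: 28,
    };
    // This JSON string is what would be handed to the wasi-nn graph
    // builder's config, as in the sketch above.
    println!("{}", serde_json::to_string(&metadata)?);
    Ok(())
}
```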