Closed · zen1001 closed this 1 month ago
By default the ggml plugin uses only 4 cores, even though I have 28 cores and pure llama.cpp can use all of them. As I can see, the plugin was updated:
[WASI-NN] ggml backend: Bump llama.cpp to b2963. Support llama.cpp options: threads: the thread number for inference.
But how do I use the threads option in rag-api-server?
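For context on how the option reaches the plugin: the ggml backend reads its llama.cpp options from a JSON metadata string supplied when the graph is built. A minimal sketch with the wasmedge-wasi-nn crate might look like this (the threads key comes from the changelog entry quoted above; the model alias "default", the ctx-size value, and the exact builder calls are assumptions for illustration):

```rust
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding};

fn main() {
    // Plugin options travel as a JSON string; "threads" is the llama.cpp
    // inference thread count added in b2963 (per the changelog above).
    let config = r#"{"ctx-size": 4096, "threads": 28}"#;

    // "default" stands in for the model alias registered with WasmEdge's
    // --nn-preload flag (an assumption for this sketch).
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .config(config.to_string())
        .build_from_cache("default")
        .expect("failed to build the ggml graph");

    let _ctx = graph
        .init_execution_context()
        .expect("failed to create an execution context");
}
```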
I ran a local test; this needs to be changed in the llama-core crate first.
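To illustrate the kind of change meant here, a hedged sketch rather than the actual llama-core code: llama-core serializes an options struct into the JSON config the plugin receives, so exposing the new option would amount to adding a threads field along these lines (the struct name, field set, and serde renames are assumptions), plus a flag in rag-api-server that populates it:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical options struct standing in for the one llama-core
// serializes into the plugin's JSON config; the real struct may differ.
#[derive(Debug, Serialize, Deserialize)]
struct Metadata {
    #[serde(rename = "ctx-size")]
    ctx_size: u64,
    #[serde(rename = "n-gpu-layers")]
    n_gpu_layers: u64,
    // New field: forwarded to llama.cpp as the inference thread count.
    threads: u64,
}

fn main() -> serde_json::Result<()> {
    let metadata = Metadata {
        ctx_size: 4096,
        n_gpu_layers: 0,
        threads: 28,
    };
    // This JSON string is what would be handed to the wasi-nn graph
    // builder's config, as in the sketch above.
    println!("{}", serde_json::to_string(&metadata)?);
    Ok(())
}
```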