LlamaEdge / rag-api-server

A RAG API server written in Rust following OpenAI specs
https://llamaedge.com/docs/user-guide/server-side-rag/quick-start
Apache License 2.0

Can't find threads option for multi-core CPU setup #20

Closed: zen1001 closed this issue 1 month ago

zen1001 commented 1 month ago

By default, the ggml plugin uses only 4 cores, even though I have 28 cores and plain llama.cpp can use all of them. As far as I can see, the plugin was updated:

> [WASI-NN] ggml backend: Bump llama.cpp to b2963. Support llama.cpp options: threads: the thread number for inference.

But how do I use the `threads` option with rag-api-server?
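For context, the option named in that changelog is part of the metadata JSON the WASI-NN ggml plugin consumes; a minimal sketch of what the server would need to emit, assuming the `threads` key from the quoted changelog (the other keys are common ggml metadata fields shown only for illustration):

```rust
// Sketch only: the metadata JSON a LlamaEdge server passes to the
// WASI-NN ggml plugin. `threads` is the llama.cpp option named in the
// quoted changelog; `ctx-size` and `n-gpu-layers` are illustrative.
use serde_json::json;

fn main() {
    let metadata = json!({
        "ctx-size": 4096,
        "n-gpu-layers": 0,
        "threads": 28
    });
    println!("{}", metadata);
}
```

The question remains how to set this value from the rag-api-server command line, since no corresponding flag is exposed.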

zen1001 commented 1 month ago

I ran a local test; it looks like this needs to be changed in the llama-core crate first.
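A rough sketch of the change being suggested, assuming a hypothetical `--threads` CLI flag on rag-api-server that llama-core would then forward into the plugin metadata (flag name and plumbing are assumptions, not an actual patch):

```rust
// Hypothetical sketch: expose a `--threads` flag and hand the value to
// llama-core, which would include `"threads": <n>` in the ggml metadata.
use clap::Parser;

#[derive(Parser)]
struct Cli {
    /// Number of CPU threads for inference (llama.cpp `threads` option)
    #[arg(long, default_value_t = 4)]
    threads: u64,
}

fn main() {
    let cli = Cli::parse();
    // llama-core would need to accept this value when building the
    // WASI-NN metadata JSON for the ggml plugin.
    println!("requested threads: {}", cli.threads);
}
```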