Closed: lystrata closed this 9 months ago.
On an 8-core system, I see that onprem pegs 4 cores at 100% while the other 4 cores are relatively inactive. Is this an intentional design limit, or can the code be made to use more cores?
The default is to use half of your CPUs, and it is set by `llama-cpp-python`. You can change it by supplying the `n_threads` parameter to `LLM`:
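```python
import os
from onprem import LLM

desired_number_of_cores = os.cpu_count()  # e.g., all 8 cores on an 8-core machine
llm = LLM(n_threads=desired_number_of_cores)
```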
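To see why an 8-core machine ends up with 4 busy cores, here is a minimal sketch that reproduces the "half your CPUs" default next to an all-cores alternative. It assumes the default is computed as `max(cpu_count() // 2, 1)` over logical CPUs, which is how recent `llama-cpp-python` versions appear to do it; check your installed version. The helper names `default_n_threads` and `all_cores_n_threads` are hypothetical and not part of onprem or `llama-cpp-python`.

```python
import os


def default_n_threads() -> int:
    """Hypothetical helper: approximates the assumed llama-cpp-python default
    of half the logical CPUs, never less than 1."""
    return max((os.cpu_count() or 1) // 2, 1)


def all_cores_n_threads() -> int:
    """Hypothetical helper: every logical CPU the OS reports (at least 1)."""
    return max(os.cpu_count() or 1, 1)


if __name__ == "__main__":
    print(f"logical CPUs:          {os.cpu_count()}")
    print(f"default n_threads (~): {default_n_threads()}")
    print(f"all-cores n_threads:   {all_cores_n_threads()}")
    # Pass the chosen value to onprem, e.g.:
    # from onprem import LLM
    # llm = LLM(n_threads=all_cores_n_threads())
```

Note that `os.cpu_count()` reports logical CPUs, so on a machine with hyperthreading the "all cores" value may exceed the number of physical cores.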