Currently, we use llama.cpp as our CPU inference acceleration solution, so we need to know the CPU, memory, and disk usage of this service. llama.cpp already provides a metrics API; we can fetch the metrics and show them on a dashboard.
We could expose these metrics to a Grafana service. This is important because inference consumes CPU resources, so we need to monitor its cost.
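As a starting point, the idea above can be sketched as a small script that scrapes the server's Prometheus-format metrics endpoint and parses it into name/value pairs. This is a minimal sketch, assuming the llama.cpp server was started with its `--metrics` option so that it serves `/metrics`; the URL, port, and the sample metric name shown are placeholder assumptions, and a production setup would instead point Prometheus/Grafana at the endpoint directly.

```python
# Sketch: fetch and parse the llama.cpp server's Prometheus-format metrics.
# Assumptions: the server runs with `--metrics` enabled, and the host/port
# below (localhost:8080) are placeholders for the actual deployment.
from urllib.request import urlopen

METRICS_URL = "http://localhost:8080/metrics"  # placeholder endpoint


def parse_metrics(text: str) -> dict:
    """Parse the Prometheus text exposition format into {metric_name: value}.

    Handles only simple, un-labelled samples, which is enough for a quick
    dashboard check; Prometheus itself handles the full format.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank/HELP/TYPE lines
            continue
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # ignore lines that do not end in a numeric sample
    return metrics


def fetch_metrics(url: str = METRICS_URL) -> dict:
    """Fetch the raw metrics text from the server and parse it."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demo with a hypothetical sample payload (metric name assumed):
    sample = (
        "# HELP llamacpp:prompt_tokens_total Number of prompt tokens processed.\n"
        "# TYPE llamacpp:prompt_tokens_total counter\n"
        "llamacpp:prompt_tokens_total 1024\n"
    )
    print(parse_metrics(sample))
```

In practice we would not poll from a script at all: Prometheus would scrape the endpoint on a schedule and Grafana would chart the stored series, which also gives us history for tracking inference cost over time.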
Contact Details (optional)
No response
What feature are you requesting?
Currently, we use llama.cpp as our CPU inference acceleration solution, so we need to know the CPU, memory, and disk usage of this service. llama.cpp already provides a metrics API; we can fetch the metrics and show them on a dashboard.