Currently, we use llama.cpp as our CPU inference acceleration solution, so we need to know the CPU, memory, and disk usage of this service. llama.cpp already provides a metrics API; we can fetch the metrics and show them on a dashboard.
We could expose these metrics to a Grafana service. This is important because inference consumes CPU resources, so we need to monitor its cost.
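As a starting point, the idea above can be sketched as a small script that scrapes the server's Prometheus-format metrics endpoint and parses it into name/value pairs. This is a minimal sketch, assuming the llama.cpp server was started with its `--metrics` option so that it serves `/metrics`; the URL, port, and the sample metric name shown are placeholder assumptions, and a production setup would instead point Prometheus/Grafana at the endpoint directly.

```python
# Sketch: fetch and parse the llama.cpp server's Prometheus-format metrics.
# Assumptions: the server runs with `--metrics` enabled, and the host/port
# below (localhost:8080) are placeholders for the actual deployment.
from urllib.request import urlopen

METRICS_URL = "http://localhost:8080/metrics"  # placeholder endpoint


def parse_metrics(text: str) -> dict:
    """Parse the Prometheus text exposition format into {metric_name: value}.

    Handles only simple, un-labelled samples, which is enough for a quick
    dashboard check; Prometheus itself handles the full format.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank/HELP/TYPE lines
            continue
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # ignore lines that do not end in a numeric sample
    return metrics


def fetch_metrics(url: str = METRICS_URL) -> dict:
    """Fetch the raw metrics text from the server and parse it."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demo with a hypothetical sample payload (metric name assumed):
    sample = (
        "# HELP llamacpp:prompt_tokens_total Number of prompt tokens processed.\n"
        "# TYPE llamacpp:prompt_tokens_total counter\n"
        "llamacpp:prompt_tokens_total 1024\n"
    )
    print(parse_metrics(sample))
```

In practice we would not poll from a script at all: Prometheus would scrape the endpoint on a schedule and Grafana would chart the stored series, which also gives us history for tracking inference cost over time.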
Contact Details (optional)
No response
What feature are you requesting?
Currently, we use llama.cpp as our CPU inference acceleration solution, so we need to know the CPU, memory, and disk usage of this service. llama.cpp already provides a metrics API; we can fetch the metrics and show them on a dashboard.