carlosedp / cluster-monitoring

Cluster monitoring stack for clusters based on Prometheus Operator
MIT License
740 stars, 201 forks

Cluster CPU Use not being shown accurately #93

Closed: Ashtoruin closed this issue 1 month ago

Ashtoruin commented 3 years ago

https://github.com/carlosedp/cluster-monitoring/blob/4657fcd62c556cc61ac3a2565df5e308cd04a1d6/grafana-dashboards/kubernetes-cluster-dashboard.json#L250

I was running a single BOINC container. While I watched htop on that node, CPU usage always stayed below 50% and averaged around 30-40%, but the dashboard showed 80-100% usage. I don't really know of a better way to compute this, but the value doesn't seem accurate unless I'm missing something.

carlosedp commented 3 years ago

When you say usage, do you mean the node CPU usage or the pod usage in the dashboard?

Ashtoruin commented 3 years ago

The cluster CPU usage metric on the cluster monitoring dashboard shows 80-100% usage for a worker node whenever it has a pod running on it. In reality that worker node is using about 1 core total, with occasional spikes up to about 1.5 cores, so about 30-40% usage at most. (I verified this with htop after making sure the pod wasn't using all 4 cores.)

magikmw commented 3 years ago

@Ashtoruin A convention used by Linux is that each fully used core represents a value of 1.0, or 100%. So two fully utilized cores would show as 2.0, or 200%, and so on.

This would align with your case of using one full core out of 4, plus some change: the Kubernetes workload eats 100% of one core, and the system also has some work to do, which would bring the total up to about 150%, or (divided by 4) 30-40% of total capacity. The top/htop utilities use a similar per-core scale in their load average: 0.52, 0.58, 0.59 lines.

I'm speculating, but if you managed to run two BOINC containers on that one node, the Grafana dashboard should show usage of 200% from maxing out two cores, while the system would show 50-60% usage.
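To make the arithmetic concrete, here is a minimal Python sketch of the two scales being compared. It assumes the per-core convention described above (1.0 core = 100%) and a 4-core node as in this issue. The function name `cpu_usage_views` is hypothetical and is not taken from the dashboard or any library, and whether the dashboard panel actually reports on the per-core scale is still speculation.

```python
from typing import Tuple


def cpu_usage_views(cores_used: float, total_cores: int) -> Tuple[float, float]:
    """Return (per-core percent, percent of total node capacity).

    Hypothetical helper for illustration only; not the dashboard's query.
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    per_core_pct = cores_used * 100.0          # one fully used core == 100%
    capacity_pct = per_core_pct / total_cores  # normalized by the core count
    return per_core_pct, capacity_pct


if __name__ == "__main__":
    # 1.0 and 1.5 cores match the htop observation; 2.0 is the two-container case.
    for cores in (1.0, 1.5, 2.0):
        per_core, capacity = cpu_usage_views(cores, total_cores=4)
        print(f"{cores} cores: {per_core:.0f}% per-core scale, "
              f"{capacity:.1f}% of node capacity")
```

On a 4-core node, 1 to 1.5 busy cores come out to 100-150% on the per-core scale but only 25-37.5% of node capacity, which is roughly the gap between the dashboard reading and the htop reading reported here.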