Open naved001 opened 1 year ago
Oh, interesting. So it sounds like we should find a similar join that works for CPU/Memory?
I have settled on using the unless operator, which computes the complement of one vector against another (a set difference, rather than an intersection).
vector1 unless vector2 results in a vector consisting of the elements of vector1 for which there are no elements in vector2 with exactly matching label sets. All matching elements in both vectors are dropped.
'kube_pod_resource_request{unit="cores"} unless on(pod, namespace) kube_pod_status_unschedulable'
So, this collects the "cores" requests for pods that are not unschedulable (that is, schedulable pods). It matches on (pod, namespace) because the two vectors do not have all the same labels.
This works better than the old way: I no longer get the 422 error code. That error occurred because the old query sometimes resulted in a many-to-many match, which is not allowed.
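To make the matching behaviour concrete, here is a minimal Python sketch that emulates what unless on(pod, namespace) does to two instant vectors. It is only an illustration of the semantics, not how Prometheus implements it; the helper name unless_on and the sample pods and labels are made up for the example.

```python
def unless_on(left, right, on):
    """Emulate PromQL `left unless on(<on>) right` for instant vectors.

    Each vector is a list of (labels_dict, value) samples. This is a
    hypothetical helper for illustration only.
    """
    # Collect the match keys (restricted to the `on` labels) seen on the right.
    right_keys = {tuple(labels.get(name, "") for name in on) for labels, _ in right}
    # Keep only left-hand samples whose key has no counterpart on the right.
    return [
        (labels, value)
        for labels, value in left
        if tuple(labels.get(name, "") for name in on) not in right_keys
    ]


# Made-up sample data: two pods requesting CPU, one of them unschedulable.
requests_cores = [
    ({"pod": "web-1", "namespace": "demo", "unit": "cores"}, 2.0),
    ({"pod": "gpu-job", "namespace": "demo", "unit": "cores"}, 4.0),
]
unschedulable = [
    ({"pod": "gpu-job", "namespace": "demo", "uid": "abc"}, 1.0),
]

schedulable_requests = unless_on(requests_cores, unschedulable, on=("pod", "namespace"))
for labels, value in schedulable_requests:
    print(labels["namespace"], labels["pod"], value)
```

Only the labels listed in on are compared, so the extra uid label on the right-hand side does not prevent the match, and the request from the unschedulable pod is dropped.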
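For anyone debugging a similar 422, a rough way to see whether a many-to-many match is possible is to check both sides of the join for duplicate match keys. The sketch below assumes the failing query used an arithmetic or comparison operator with on(...), where the match key has to be unique on at least one side; set operators such as unless and and do not have that restriction. The helper name and the sample series are hypothetical.

```python
from collections import Counter


def duplicate_match_keys(vector, on):
    """Return the match keys that occur more than once in a vector.

    `vector` is a list of (labels_dict, value) samples and `on` is the
    tuple of label names used for matching. Hypothetical helper.
    """
    counts = Counter(
        tuple(labels.get(name, "") for name in on) for labels, _ in vector
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up series: the same pod appears twice with different extra labels.
left = [
    ({"pod": "web-1", "namespace": "demo", "resource": "cpu"}, 2.0),
    ({"pod": "web-1", "namespace": "demo", "resource": "memory"}, 1024.0),
]
right = [
    ({"pod": "web-1", "namespace": "demo", "phase": "Running"}, 1.0),
    ({"pod": "web-1", "namespace": "demo", "phase": "Pending"}, 0.0),
]

on = ("pod", "namespace")
left_dups = duplicate_match_keys(left, on)
right_dups = duplicate_match_keys(right, on)

# A key duplicated on both sides cannot be matched one-to-one or
# many-to-one, which is the many-to-many situation.
many_to_many = sorted(set(left_dups) & set(right_dups))
print(many_to_many)
```

If the resulting list is non-empty, narrowing the selectors or adding labels to on(...) until one side is unique is the usual way out.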
https://github.com/OCP-on-NERC/xdmod-openshift-scripts/blob/b534df90573263131a40e299289647562fe0f37b/openshift_metrics/openshift_prometheus_metrics.py#L24
This metric gathers the CPU requests of all pods, regardless of whether they are running.
So, if a pod could not be scheduled, we still end up counting its CPU requests.
I discovered this when I was trying to gather GPU usage data for the NERC OpenShift cluster: there was a pod that requested a GPU, but it was never scheduled because the cluster does not have an active GPU.
One possible solution is to get an intersection like this: https://github.com/naved001/xdmod-openshift-scripts/blob/d75e06698961a5b9f4db0ac4e86f4e11b30a41a8/openshift_metrics/openshift_prometheus_metrics.py#L26
It worked when I queried GPU metrics, but when I applied this intersection to CPU and memory I got a 422 error code from Prometheus and Thanos. :/