QuantumEnigmaa opened this issue 3 months ago
I didn't find an actual equivalent for the cluster_operator_cluster_create_transition and cluster_operator_cluster_update_transition metrics. The only CAPI ones that are similar are capi_cluster_created and capi_cluster_status_condition_last_transition_time, but they don't really match.
@giantswarm/team-turtles would you have any idea how those could be mapped to capi metrics?
As far as I know there's no metric in the capi controllers that holds the time spent to create or upgrade a cluster.
I guess cluster creation time could be computed as the time during which the cluster had the metric capi_cluster_status_phase{phase="Provisioning"} = 1.
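To make the idea concrete, here is a minimal sketch of that computation. It assumes the raw samples of capi_cluster_status_phase{phase="Provisioning"} for a single cluster have already been fetched as (timestamp, value) pairs; the function name and the sample data are hypothetical. Each gap between two consecutive scrapes is counted when the earlier scrape reported 1, so the result is only as precise as the scrape interval.

```python
from typing import List, Sequence, Tuple

# (unix timestamp in seconds, gauge value) for one cluster
Sample = Tuple[float, float]


def provisioning_seconds(samples: Sequence[Sample]) -> float:
    """Approximate how long a cluster reported phase="Provisioning" = 1.

    Hypothetical helper: counts each interval between consecutive scrapes
    whose earlier sample was 1, after sorting the samples by timestamp.
    """
    ordered: List[Sample] = sorted(samples)
    total = 0.0
    for (t0, v0), (t1, _) in zip(ordered, ordered[1:]):
        if v0 == 1:
            total += t1 - t0
    return total


# Hypothetical 30s scrapes: the cluster is Provisioning for the first four.
series = [(0, 1), (30, 1), (60, 1), (90, 1), (120, 0), (150, 0)]
print(provisioning_seconds(series))  # 120.0
```

Missing scrapes inside a Provisioning window are silently bridged by this approach, which is usually acceptable for a creation-time estimate but worth keeping in mind.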
For updates it's a bit dicey: we would need to check the status subresource of the CAPI resources (Cluster, KubeadmControlPlane, MachinePools) to be sure.
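As an illustration of what "checking the status subresource" could mean, here is a minimal heuristic sketch. It assumes an object is mid-update when status.observedGeneration lags metadata.generation, or when status.updatedReplicas is lower than status.replicas (fields reported by KubeadmControlPlane, for example). The function name, the heuristic itself and the trimmed example objects are assumptions; the field names should be checked against the CAPI API version in use, and fields an object does not report are simply ignored.

```python
from typing import Any, Mapping


def looks_like_rolling_update(obj: Mapping[str, Any]) -> bool:
    """Hypothetical heuristic on a CAPI object's metadata and status."""
    generation = obj.get("metadata", {}).get("generation")
    status = obj.get("status", {})

    observed = status.get("observedGeneration")
    if generation is not None and observed is not None and observed < generation:
        return True  # spec changed, controller has not caught up yet

    replicas = status.get("replicas")
    updated = status.get("updatedReplicas")
    if replicas is not None and updated is not None and updated < replicas:
        return True  # some machines still run the old spec

    return False


# Hypothetical objects, trimmed to the fields the heuristic reads.
cluster = {"kind": "Cluster", "metadata": {"generation": 2}, "status": {"observedGeneration": 2}}
kcp = {
    "kind": "KubeadmControlPlane",
    "metadata": {"generation": 5},
    "status": {"observedGeneration": 5, "replicas": 3, "updatedReplicas": 1},
}
print(any(looks_like_rolling_update(o) for o in (cluster, kcp)))  # True
```

An update duration could then be approximated as the time between this check first returning True and returning False again, though that still misses updates that only touch MachinePools without these fields.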
Not sure if @nprokopic or @njuettner have better ideas.
After meeting with @njuettner, we came up with this solution:
- cluster-api-events controller: cluster_api_events_cluster_update_transition and cluster_api_events_cluster_create_transition
- recording rules: aggregation:giantswarm:cluster_transition_create, aggregation:giantswarm:cluster_transition_update and aggregation:giantswarm:cluster_release_version

Please @njuettner, don't hesitate to correct me if I got something wrong :)
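The thread does not spell out how cluster_api_events_cluster_create_transition is computed. A minimal sketch, assuming it records the seconds between a cluster first entering the Provisioning phase and first reaching Provisioned, and is exposed as a gauge: the class, the cluster_id label and the gauge type are hypothetical, not the controller's actual implementation. A real controller would register the metric through a Prometheus client library rather than render the text exposition format by hand.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CreateTransitionTracker:
    """Hypothetical: seconds from first "Provisioning" to first "Provisioned"."""

    started: Dict[str, float] = field(default_factory=dict)
    durations: Dict[str, float] = field(default_factory=dict)

    def on_phase(self, cluster_id: str, phase: str, ts: float) -> None:
        # Only the first occurrence of each phase counts; later events are ignored.
        if phase == "Provisioning":
            self.started.setdefault(cluster_id, ts)
        elif phase == "Provisioned" and cluster_id in self.started:
            self.durations.setdefault(cluster_id, ts - self.started[cluster_id])

    def exposition(self) -> str:
        # Prometheus text exposition format, one gauge sample per cluster.
        name = "cluster_api_events_cluster_create_transition"
        lines = [f"# TYPE {name} gauge"]
        for cluster_id, seconds in sorted(self.durations.items()):
            lines.append(f'{name}{{cluster_id="{cluster_id}"}} {seconds}')
        return "\n".join(lines) + "\n"


tracker = CreateTransitionTracker()
tracker.on_phase("abc12", "Pending", 0)
tracker.on_phase("abc12", "Provisioning", 10)
tracker.on_phase("abc12", "Provisioned", 610)
print(tracker.exposition(), end="")
```

The update counterpart (cluster_api_events_cluster_update_transition) would need a different start and end signal, since CAPI clusters stay in the Provisioned phase during upgrades.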
Some dashboards accessible from Grafana Cloud, such as the Clusters or the Customers dashboards, are missing all data related to CAPI clusters.
After investigating a bit, I found that this is because the metrics used in the recording rules sent to Grafana Cloud (and thus used in those dashboards) are not present on CAPI clusters, which have their own equivalent metrics.
For example, on vintage clusters there is the cluster_service_cluster_info metric, while on CAPI clusters one needs to use the capi_cluster_info metric to get the same output. However, there is no way to get the CAPI cluster release version, since the capi_cluster_info metric has no release label. We thus need to update the Grafana Cloud recording rules in the prometheus-rules repo so that they cover both vintage and CAPI clusters.
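The gap can be illustrated by merging both info metrics into one dashboard-friendly shape. A minimal sketch, assuming the series were returned by some instant query as (metric name, labels) pairs; the label names (cluster_id and release_version on the vintage metric, name on the CAPI metric) and the example values are hypothetical and should be checked against the real series. The point it shows is the one from the issue: the release column stays empty for CAPI clusters.

```python
from typing import Dict, List, Optional, Tuple

# (metric name, label set) as returned by a hypothetical instant query.
Series = Tuple[str, Dict[str, str]]


def cluster_rows(series: List[Series]) -> List[Dict[str, Optional[str]]]:
    """Merge vintage and CAPI info series into one shape (hypothetical labels)."""
    rows: List[Dict[str, Optional[str]]] = []
    for metric, labels in series:
        if metric == "cluster_service_cluster_info":
            rows.append({"cluster": labels.get("cluster_id"), "kind": "vintage",
                         "release": labels.get("release_version")})
        elif metric == "capi_cluster_info":
            # capi_cluster_info has no release label, so the version is unknown.
            rows.append({"cluster": labels.get("name"), "kind": "capi", "release": None})
    return rows


rows = cluster_rows([
    ("cluster_service_cluster_info", {"cluster_id": "abc12", "release_version": "18.0.0"}),
    ("capi_cluster_info", {"name": "xyz34", "namespace": "org-acme"}),
])
print([r["release"] for r in rows])  # ['18.0.0', None]
```

Filling the CAPI release column would require a source of the release version outside capi_cluster_info.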