Closed orfeas-k closed 5 months ago
Thank you for reporting us your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5591.
This message was autogenerated
Regarding this dashboard, the spec says we should have a dashboard that presents the "Uptime in % during the past 5 minutes". This would look like this:
However, this visualization has the following limitation: it shows the last value that it received. So if the charm stops providing the metric completely (e.g. someone ran juju scale-application <app> 0), it will continue to show 100%, which is misleading. This is not fixed even when mapping Null, NaN or NoValue to 0. We hit the same limitation with katib-controller, where the controller stopped emitting its current trials metric when it had no trials.
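To make the limitation concrete, here is a minimal Python sketch of the difference between a "last value" reduction and a per-step view where missing samples count as 0. It does not reproduce Grafana's internals; the function names and the sample data are hypothetical and only illustrate the reasoning above.

```python
from typing import List, Optional


def last_value(samples: List[Optional[float]]) -> Optional[float]:
    """Return the most recent non-missing sample, like a 'last value' reduction."""
    for value in reversed(samples):
        if value is not None:
            return value
    return None


def fill_missing_as_zero(samples: List[Optional[float]]) -> List[float]:
    """Treat every missing sample as 0, keeping one value per scrape step."""
    return [0.0 if value is None else value for value in samples]


# Hypothetical scrapes: the charm reports 100% three times, then it is
# scaled to 0 units and stops emitting the metric (None = no sample).
samples = [100.0, 100.0, 100.0, None, None]

stale = last_value(samples)             # still the old value
filled = fill_missing_as_zero(samples)  # the outage is visible per step

print(stale)
print(filled)
```

In this sketch the reduction keeps returning the stale 100.0, while the per-step series ends in zeros, which is closer to what a time-based panel can show.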
In the case of katib-controller, we went with a time series visualization together with mapping noValue to 0. I tried this here and it works the same way. However, in this case we have more than two graph lines in the same panel, which means that one charm's line can be hidden under another. If two charms have an issue and start emitting 0 values (or no values) at exactly the same time, the graph will not show (without clicking on each application one by one) that more than one application is down.
For example, in the visualization, if another app went down at the same time as seldon-controller-manager (blue line), it wouldn't be visible. To summarize, we cannot see how many lines overlap at each value, and thus how many applications are up or down.
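The overlap problem can be sketched in a few lines of Python. The series below are made-up data (1 = up, 0 = down) and the helper name is hypothetical. On a shared line graph, the two failing apps would draw the same line at 0, whereas listing the apps per step shows that more than one is down.

```python
from typing import Dict, List


def down_apps_per_step(series: Dict[str, List[int]]) -> List[List[str]]:
    """For each time step, list the apps whose value is 0 (down).

    Assumes every app has the same number of samples.
    """
    steps = len(next(iter(series.values())))
    return [
        sorted(app for app, values in series.items() if values[i] == 0)
        for i in range(steps)
    ]


# Hypothetical data: two apps go down at exactly the same step.
series = {
    "seldon-controller-manager": [1, 1, 0, 0],
    "katib-controller": [1, 1, 0, 0],
    "kubeflow-dashboard": [1, 1, 1, 1],
}

down = down_apps_per_step(series)
for step, apps in enumerate(down):
    print(step, apps)
```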
To resolve the above limitation, we went with a "state timeline" visualisation (the name sounds like a good fit, right?). Instead of showing an "Uptime in % during the past 5 minutes", we'll be showing each application's up metric at each point in time, interpreted as Up (1) or Down (0). This way, the user can see the state of each charm over time.
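As a rough illustration, a state timeline panel with Up/Down value mappings could be generated along these lines. This is a sketch under assumptions, not the dashboard that was actually shipped: the field names follow Grafana's panel JSON as I understand it and should be checked against your Grafana version, and the panel title, the bare `up` query and the `juju_application` legend label are assumptions of mine.

```python
import json


def build_state_timeline_panel(title: str, expr: str) -> dict:
    """Build a panel dict that maps 1 to "Up" and 0 to "Down"."""
    return {
        "type": "state-timeline",
        "title": title,
        "targets": [
            # Assumed label name; adjust to whatever labels your metrics carry.
            {"expr": expr, "legendFormat": "{{juju_application}}"}
        ],
        "fieldConfig": {
            "defaults": {
                "mappings": [
                    {
                        "type": "value",
                        "options": {
                            "0": {"text": "Down", "color": "red"},
                            "1": {"text": "Up", "color": "green"},
                        },
                    }
                ]
            }
        },
    }


# Hypothetical title and query, for illustration only.
panel = build_state_timeline_panel("Charm state", "up")
print(json.dumps(panel, indent=2))
```

Each application then gets its own row in the panel, so two charms going down at the same time remain visible as two separate rows.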
Context
Implement a Grafana dashboard that will show the charm state for all Kubeflow charms that provide metrics. This has been specced out here, with the difference that we will deploy it using the kubeflow-dashboard charm to simplify its deployment.
What needs to get done
Add the generic Grafana dashboard as part of the kubeflow-dashboard charm.
Definition of Done
There is the grafana dashboard.