canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

Investigate which CKF charms provide metrics by default #855

Closed orfeas-k closed 5 months ago

orfeas-k commented 5 months ago

Context

To better evaluate the work needed to integrate CKF charms with the Observability charms, we need to investigate which CKF charms provide metrics by default. The current state of charm alerts is documented in https://github.com/canonical/bundle-kubeflow/issues/837.

At the same time, we will also document the metrics that those charms provide.

What needs to get done

Document for every charm:

  1. If it provides metrics by default
  2. If yes, what are those metrics?
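
For a charm whose workload serves a Prometheus-style endpoint, one way to answer both questions is to scrape it once and list the metric families it declares. The sketch below is a minimal illustration, assuming the workload emits the Prometheus text exposition format with `# TYPE` lines. The `list_metric_families` helper and the `SAMPLE` payload are hypothetical and not taken from any CKF charm.

```python
def list_metric_families(exposition_text: str) -> dict[str, str]:
    """Return {metric_name: metric_type} from Prometheus text exposition output.

    Only '# TYPE <name> <type>' lines are considered, so metrics exposed
    without a TYPE declaration are not listed.
    """
    families = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        # A TYPE line looks like: "# TYPE http_requests_total counter"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            families[parts[2]] = parts[3]
    return families


# Hypothetical sample of what a /metrics endpoint might return.
SAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 12
"""

if __name__ == "__main__":
    for name, kind in sorted(list_metric_families(SAMPLE).items()):
        print(f"{name}: {kind}")
```

Against a live workload, the text would come from the workload's metrics endpoint instead of `SAMPLE`; the port and path differ per application, so they are left out here.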

Definition of Done

We know what every CKF charm exposes.

syncronize-issues-to-jira[bot] commented 5 months ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5487.

This message was autogenerated

orfeas-k commented 5 months ago

admission-webhook

Code doesn't implement any metrics, although there are some metrics-related references in its go.mod and go.sum files.

argo-controller

dex

envoy

istio-gateway

istio-pilot

jupyter-controller

jupyter-ui

Code doesn't implement any metrics.

katib-controller

katib-db-manager

Code doesn't implement any Prometheus metrics.

katib-ui

Code doesn't implement any metrics.

kfp-api

Pipeline steps don't expose metrics by default. Feature requests:

kfp-metadata-writer

Code doesn't implement anything related to metrics: https://github.com/kubeflow/pipelines/tree/master/backend/metadata_writer

kfp-persistence

Code doesn't implement any metrics. The only references to "metrics" concern the application's own metrics feature, which is used for exposing artifacts in the UI, not monitoring metrics.

kfp-profile-controller

Code doesn't implement any metrics.

kfp-schedwf

Code doesn't implement any Prometheus metrics.

kfp-ui

Code doesn't implement any Prometheus metrics.

kfp-viewer

Code doesn't implement any Prometheus metrics.

kfp-viz

Code doesn't implement any Prometheus metrics.

knative-eventing & knative-serving

knative-operator

kserve-controller

kubeflow-dashboard

kubeflow-profiles

kfam

profiles

kubeflow-roles

There isn't an upstream app for this charm.

kubeflow-volumes

Code doesn't implement any metrics.

metacontroller

minio

mlmd

Code doesn't implement any metrics.

oidc-gatekeeper

There are some references to Prometheus packages in its go.mod and go.sum files, but nothing is implemented in its code.

pvcviewer-operator

seldon-controller-manager

tensorboard-controller

tensorboard-web-app

Code doesn't implement any metrics.

training-operator

orfeas-k commented 5 months ago

Upstream apps that do not already expose metrics (wip)

  1. admission-webhook
  2. jupyter-ui (jupyter-web-app)
  3. katib-db-manager
  4. katib-ui
  5. kfp-metadata-writer
  6. kfp-persistence
  7. kfp-profile-controller
  8. kfp-schedwf
  9. kfp-ui
  10. kfp-viewer
  11. kfp-viz
  12. kubeflow-dashboard (explanation in previous comment)
  13. kubeflow-roles. This isn't an upstream app, but we'd still need an exporter if we'd like metrics from this charm.
  14. kubeflow-volumes (volumes-web-app)
  15. mlmd
  16. oidc-authservice
  17. tensorboards-web-app

kimwnasptd commented 5 months ago

Regarding the K8s controllers from kubeflow/kubeflow (notebooks, profiles, tensorboards): they will get some quite useful metrics by default thanks to the controller-runtime Go package that comes with Kubebuilder: https://book.kubebuilder.io/reference/metrics-reference

Those are perfect for capturing whether the controllers are working as expected, and it's great that this will be handled by default.

For this to happen, though, someone upstream will need to bump the controller-runtime package from 0.11 to 0.16.3.
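
Once a controller is built against a controller-runtime version that registers these metrics, a quick sanity check is to look for the expected metric families in its scrape output. The sketch below assumes a few family names from the Kubebuilder metrics reference, such as `controller_runtime_reconcile_total` and `workqueue_depth`. The `missing_default_metrics` helper is hypothetical, the set is a partial illustrative subset, and the families actually exposed depend on the controller-runtime version.

```python
# A few metric families listed in the Kubebuilder metrics reference.
# Partial, illustrative subset only; not the complete list.
EXPECTED_DEFAULTS = {
    "controller_runtime_reconcile_total",
    "controller_runtime_reconcile_errors_total",
    "controller_runtime_reconcile_time_seconds",
    "workqueue_depth",
}


def missing_default_metrics(exposed_names):
    """Return the expected family names absent from exposed_names, sorted."""
    return sorted(EXPECTED_DEFAULTS - set(exposed_names))


if __name__ == "__main__":
    # Hypothetical family names scraped from a controller's metrics endpoint.
    scraped = ["workqueue_depth", "controller_runtime_reconcile_total"]
    for name in missing_default_metrics(scraped):
        print(f"missing: {name}")
```

An empty result suggests the default metrics are present; a non-empty one lists the families to look into.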

orfeas-k commented 5 months ago

Related sheet