canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0

mysql-k8s charm does not work when connected to COS [hook failed: "metrics-endpoint-relation-created"] #504

Closed Barteus closed 1 month ago

Barteus commented 1 month ago

Steps to reproduce

  1. Deploy microk8s
  2. Deploy COS (on the same cluster for simplicity)
  3. juju deploy -m kubeflow --debug ./ckf/bundle.yaml \
       --overlay ./ckf/authentication-overlay.yaml \
       --overlay ./ckf/cos-integration.yaml \
       --overlay ./ckf/mlflow-integration.yaml \
       --trust

     Bundles available here: https://github.com/Barteus/demo-aws-mk8s-ckf-mlflow/tree/dbb19cf294b42ad63d2ed1442a046a7b077f062f/ckf

Expected behavior

Deployment of Kubeflow works and the mysql-k8s charm provides the

Actual behavior

$ juju status
Model     Controller      Cloud/Region    Version  SLA          Timestamp
kubeflow  aws-controller  mk8s/localhost  3.5.3    unsupported  09:28:19Z

SAAS                             Status  Store           URL
grafana-dashboards               active  aws-controller  admin/cos.grafana-dashboards
loki-logging                     active  aws-controller  admin/cos.loki-logging
prometheus-receive-remote-write  active  aws-controller  admin/cos.prometheus-receive-remote-write
prometheus-scrape                active  aws-controller  admin/cos.prometheus-scrape

App                      Version                Status   Scale  Charm                    Channel          Rev  Address         Exposed  Message
admission-webhook                               active       1  admission-webhook        1.9/stable       344  10.152.183.76   no       
argo-controller                                 active       1  argo-controller          3.4/stable       545  10.152.183.42   no       
dex-auth                                        active       1  dex-auth                 2.39/stable      548  10.152.183.158  no       
envoy                                           active       1  envoy                    2.2/stable       263  10.152.183.135  no       
grafana-agent-k8s        0.32.1                 active       1  grafana-agent-k8s        latest/stable     45  10.152.183.46   no       
istio-ingressgateway                            active       1  istio-gateway            1.22/stable     1218  10.152.183.252  no       
istio-pilot                                     active       1  istio-pilot              1.22/stable     1169  10.152.183.62   no       
jupyter-controller                              active       1  jupyter-controller       1.9/stable      1038  10.152.183.97   no       
jupyter-ui                                      active       1  jupyter-ui               1.9/stable       961  10.152.183.183  no       
katib-controller                                active       1  katib-controller         0.17/stable      750  10.152.183.26   no       
katib-db                                        error        1  mysql-k8s                8.0/stable       180  10.152.183.232  no       hook failed: "metrics-endpoint-relation-created"
katib-db-manager                                waiting      1  katib-db-manager         0.17/stable      713  10.152.183.95   no       installing agent
katib-ui                                        active       1  katib-ui                 0.17/stable      713  10.152.183.211  no       
kfp-api                                         waiting      1  kfp-api                  2.2/stable      1552  10.152.183.53   no       installing agent
kfp-db                                          error        1  mysql-k8s                8.0/stable       180  10.152.183.193  no       hook failed: "metrics-endpoint-relation-created"
kfp-metadata-writer                             active       1  kfp-metadata-writer      2.2/stable       617  10.152.183.153  no       
kfp-persistence                                 blocked      1  kfp-persistence          2.2/stable      1560  10.152.183.140  no       [relation:kfp-api] Expected data from exactly 1 related applications - got 0.
kfp-profile-controller                          active       1  kfp-profile-controller   2.2/stable      1518  10.152.183.165  no       
kfp-schedwf                                     active       1  kfp-schedwf              2.2/stable      1571  10.152.183.77   no       
kfp-ui                                          blocked      1  kfp-ui                   2.2/stable      1555  10.152.183.49   no       [relation:kfp-api] Expected data from exactly 1 related applications - got 0.
kfp-viewer                                      active       1  kfp-viewer               2.2/stable      1586  10.152.183.198  no       
kfp-viz                                         active       1  kfp-viz                  2.2/stable      1504  10.152.183.194  no       
knative-eventing                                active       1  knative-eventing         1.12/stable      459  10.152.183.139  no       
knative-operator                                active       1  knative-operator         1.12/stable      433  10.152.183.196  no       
knative-serving                                 active       1  knative-serving          1.12/stable      487  10.152.183.78   no       
kserve-controller                               active       1  kserve-controller        0.13/stable      626  10.152.183.247  no       
kubeflow-dashboard                              active       1  kubeflow-dashboard       1.9/stable       659  10.152.183.25   no       
kubeflow-profiles                               active       1  kubeflow-profiles        1.9/stable       419  10.152.183.131  no       
kubeflow-roles                                  active       1  kubeflow-roles           1.9/stable       240  10.152.183.152  no       
kubeflow-volumes                                active       1  kubeflow-volumes         1.9/stable       348  10.152.183.180  no       
metacontroller-operator                         active       1  metacontroller-operator  3.0/stable       311  10.152.183.186  no       
minio                    res:oci-image@5102166  active       1  minio                    ckf-1.9/stable   347  10.152.183.145  no       
mlflow-mysql                                    error        1  mysql-k8s                8.0/stable       180  10.152.183.113  no       hook failed: "metrics-endpoint-relation-created"
mlflow-server                                   waiting      1  mlflow-server            2.15/stable      638  10.152.183.173  no       installing agent
mlmd                                            active       1  mlmd                     ckf-1.9/stable   213  10.152.183.93   no       
oidc-gatekeeper                                 active       1  oidc-gatekeeper          ckf-1.9/stable   423  10.152.183.219  no       
pvcviewer-operator                              active       1  pvcviewer-operator       1.9/stable       157  10.152.183.231  no       
resource-dispatcher                             active       1  resource-dispatcher      2.0/stable       182  10.152.183.184  no       
tensorboard-controller                          active       1  tensorboard-controller   1.9/stable       355  10.152.183.66   no       
tensorboards-web-app                            active       1  tensorboards-web-app     1.9/stable       343  10.152.183.17   no       
training-operator                               active       1  training-operator        1.8/stable       503  10.152.183.56   no       

Unit                        Workload  Agent  Address       Ports          Message
admission-webhook/0*        active    idle   10.1.59.140                  
argo-controller/0*          active    idle   10.1.118.204                 
dex-auth/0*                 active    idle   10.1.57.203                  
envoy/0*                    active    idle   10.1.59.142                  
grafana-agent-k8s/0*        active    idle   10.1.118.206                 
istio-ingressgateway/0*     active    idle   10.1.59.141                  
istio-pilot/0*              active    idle   10.1.57.204                  
jupyter-controller/0*       active    idle   10.1.57.206                  
jupyter-ui/0*               active    idle   10.1.59.143                  
katib-controller/0*         active    idle   10.1.57.207                  
katib-db-manager/0*         waiting   idle   10.1.59.144                  Waiting for relational-db data
katib-db/0*                 error     idle   10.1.57.209                  hook failed: "metrics-endpoint-relation-created"
katib-ui/0*                 active    idle   10.1.59.145                  
kfp-api/0*                  waiting   idle   10.1.118.208                 Waiting for relational-db data
kfp-db/0*                   error     idle   10.1.118.210                 hook failed: "metrics-endpoint-relation-created"
kfp-metadata-writer/0*      active    idle   10.1.59.147                  
kfp-persistence/0*          blocked   idle   10.1.59.149                  [relation:kfp-api] Expected data from exactly 1 related applications - got 0.
kfp-profile-controller/0*   active    idle   10.1.57.210                  
kfp-schedwf/0*              active    idle   10.1.59.150                  
kfp-ui/0*                   blocked   idle   10.1.57.212                  [relation:kfp-api] Expected data from exactly 1 related applications - got 0.
kfp-viewer/0*               active    idle   10.1.59.151                  
kfp-viz/0*                  active    idle   10.1.118.211                 
knative-eventing/0*         active    idle   10.1.59.146                  
knative-operator/0*         active    idle   10.1.59.154                  
knative-serving/0*          active    idle   10.1.59.148                  
kserve-controller/0*        active    idle   10.1.57.216                  
kubeflow-dashboard/0*       active    idle   10.1.57.215                  
kubeflow-profiles/0*        active    idle   10.1.59.155                  
kubeflow-roles/0*           active    idle   10.1.118.212                 
kubeflow-volumes/0*         active    idle   10.1.59.153                  
metacontroller-operator/0*  active    idle   10.1.59.152                  
minio/0*                    active    idle   10.1.57.213   9000-9001/TCP  
mlflow-mysql/0*             error     idle   10.1.118.215                 hook failed: "metrics-endpoint-relation-created"
mlflow-server/0*            waiting   idle   10.1.59.161                  Waiting for relational-db relation data
mlmd/0*                     active    idle   10.1.59.160                  
oidc-gatekeeper/0*          active    idle   10.1.118.216                 
pvcviewer-operator/0*       active    idle   10.1.59.157                  
resource-dispatcher/0*      active    idle   10.1.57.217                  
tensorboard-controller/0*   active    idle   10.1.59.158                  
tensorboards-web-app/0*     active    idle   10.1.59.159                  
training-operator/0*        active    idle   10.1.118.214     

Versions

Operating system: Ubuntu 22.04.4 LTS

Juju CLI: 3.5.3

Juju agent: 3.5.3

Charm revision: 180

microk8s: 1.28

Log output

Juju debug log: log.txt

Additional context

syncronize-issues-to-jira[bot] commented 1 month ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5417.

This message was autogenerated

paulomach commented 1 month ago

@shayancanonical, for context, I think it's related to PR #483

shayancanonical commented 1 month ago

@paulomach agreed. My suspicion is that the metrics-endpoint-relation-created hook runs before the leader-elected hook, so the cluster-name is not yet available in the app peer databag. Since that hook errors out, the queued leader-elected hook never runs.
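
The suspected ordering problem can be illustrated with a small, self-contained simulation. This is only a sketch of the hypothesis above, not the charm's actual code or the eventual fix: the function `run_hooks`, the `guarded` flag, and the value written for `cluster-name` are all hypothetical. It models a hook queue where a failing hook blocks everything queued behind it, and shows how a handler that defers when `cluster-name` is missing would let leader-elected run first.

```python
from collections import deque

LEADER_ELECTED = "leader-elected"
METRICS_CREATED = "metrics-endpoint-relation-created"


def run_hooks(queue, guarded):
    """Simulate a unit's hook queue (hypothetical model, not Juju itself).

    Returns (executed_hooks, error_message_or_None).
    """
    databag = {}          # stands in for the app peer databag
    pending = deque(queue)
    deferred = []
    executed = []
    while pending:
        hook = pending.popleft()
        if hook == LEADER_ELECTED:
            # Assumption: the leader sets cluster-name in this hook.
            databag["cluster-name"] = "example-cluster"
        elif hook == METRICS_CREATED and "cluster-name" not in databag:
            if not guarded:
                # Unguarded handler errors out; hooks behind it never run.
                return executed, f'hook failed: "{hook}"'
            # Guarded handler defers instead of failing.
            deferred.append(hook)
            continue
        executed.append(hook)
        # Retry deferred hooks once another hook has completed.
        pending.extend(deferred)
        deferred.clear()
    return executed, None


order = [METRICS_CREATED, LEADER_ELECTED]
print(run_hooks(order, guarded=False))
print(run_hooks(order, guarded=True))
```

In the unguarded run the queue stops at the first hook, which matches the status seen in the report. In the guarded run leader-elected executes first and the deferred hook succeeds on retry.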

natalytvinova commented 1 month ago

Workaround I found: first deploy without the COS overlay:

juju deploy --overlay auth-overlay.yaml --overlay mlflow-overlay.yaml --overlay storage-overlay.yaml ./kubeflow.yaml --trust -m kubeflow

When everything settles, deploy again with the COS overlay included:

juju deploy --overlay auth-overlay.yaml --overlay cos-integration-overlay.yaml --overlay mlflow-overlay.yaml --overlay storage-overlay.yaml ./kubeflow.yaml --trust -m kubeflow

The units then come up fine.
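
A rough sketch of how this two-phase workaround could be scripted is shown below. The helper names (`deploy_command`, `unsettled_units`) are hypothetical, and the status layout (`applications` → `units` → `workload-status` / `juju-status`) is an assumption about what `juju status --format=json` returns, so check it against your own output before relying on it. The sketch only builds the command lines and inspects a status dictionary; it does not call juju itself.

```python
OVERLAYS = ["auth-overlay.yaml", "mlflow-overlay.yaml", "storage-overlay.yaml"]
COS_OVERLAY = "cos-integration-overlay.yaml"


def deploy_command(bundle, overlays, model):
    """Build the argv for `juju deploy` with the given overlays."""
    cmd = ["juju", "deploy"]
    for overlay in overlays:
        cmd += ["--overlay", overlay]
    cmd += [bundle, "--trust", "-m", model]
    return cmd


def unsettled_units(status):
    """Return units that are not yet active/idle.

    `status` is assumed to be the parsed output of
    `juju status --format=json` (layout is an assumption).
    """
    problems = {}
    for app in status.get("applications", {}).values():
        for unit_name, unit in app.get("units", {}).items():
            workload = unit.get("workload-status", {}).get("current")
            agent = unit.get("juju-status", {}).get("current")
            if workload != "active" or agent != "idle":
                problems[unit_name] = (workload, agent)
    return problems


# Phase 1: everything except the COS overlay.
phase1 = deploy_command("./kubeflow.yaml", OVERLAYS, "kubeflow")
# Phase 2: same bundle, COS overlay added, once unsettled_units() is empty.
phase2 = deploy_command("./kubeflow.yaml", sorted(OVERLAYS + [COS_OVERLAY]), "kubeflow")

# Hand-written sample status, for illustration only.
sample = {
    "applications": {
        "kfp-db": {"units": {"kfp-db/0": {
            "workload-status": {"current": "error"},
            "juju-status": {"current": "idle"}}}},
        "minio": {"units": {"minio/0": {
            "workload-status": {"current": "active"},
            "juju-status": {"current": "idle"}}}},
    }
}
print(" ".join(phase1))
print(" ".join(phase2))
print(unsettled_units(sample))
```

The commands could be passed to `subprocess.run` and the status check polled between the two phases; how long to wait, and whether other statuses should count as settled, is left to the operator.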