canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0

Mysql charm in unknown/idle state #457

Closed. dparv closed this issue 1 month ago.

dparv commented 2 months ago

Steps to reproduce

  1. juju deploy mysql-k8s --channel 8.0/stable

Expected behavior

charm in active/idle state

Actual behavior

charm in unknown/idle state

Versions

K8s: AKS 1.28.10

Juju CLI: 3.4.4

Juju agent: 3.4.4

Charm revision: 153

Log output

Juju debug log:

Kubectl debug log:

  Warning  Unhealthy               14m (x14 over 15m)  kubelet                  Readiness probe failed: HTTP probe failed with statuscode: 502

Full traces: https://pastebin.canonical.com/p/R9Bf347V6S/

Additional context

App        Version                  Status   Scale  Charm      Channel     Rev  Address      Exposed  Message
kfp-db     8.0.36-0ubuntu0.22.04.1  waiting  1      mysql-k8s  8.0/stable  153  10.0.169.45  no       waiting for units to settle down

Unit       Workload  Agent  Address      Ports  Message
kfp-db/0*  unknown   idle   10.244.2.29

github-actions[bot] commented 2 months ago

https://warthogs.atlassian.net/browse/DPE-4850

paulomach commented 2 months ago

@dparv, in the STR you are not using --trust. Can you confirm whether you're trusting the app?

dparv commented 2 months ago

Yes, all apps are deployed using --trust. This fails on random occasions; it's not always failing.
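
For reference, the deploys look roughly like this (application name kfp-db as in the status above; trust can also be granted after deploy with juju trust):

  juju deploy mysql-k8s kfp-db --channel 8.0/stable --trust
  juju trust kfp-db --scope=cluster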

paulomach commented 2 months ago

> Yes, all apps are deployed using --trust. This fails on random occasions; it's not always failing.

Alright, I'll try it on AKS then.

btw, did you ever see that anywhere other than on AKS?

dparv commented 2 months ago

I'm only testing the Kubeflow deployment extensively on AKS, so that's where I've hit it. I don't know if it's happening on other k8s clusters.

paulomach commented 2 months ago

The root cause is that the *-pebble-ready handler on mysql_provider fails with an httpx error (connection refused to the k8s API), failing the event. That prevents Juju from committing the databag keys written by the other handler(s) observing the same event, leaving a state mismatch between the charm and the workload and blocking any further self-maintenance.
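
Roughly what happens, as a minimal sketch (assumed names throughout: _query_k8s_api and the "endpoints" databag key stand in for the real lightkube/httpx call and provider data; this is not the charm's actual code):

    # Two observers handle the same pebble-ready event. If an httpx error escaped
    # from the second one, the whole Juju hook would fail and the databag keys
    # written by the first would never be committed; catching it and deferring
    # keeps the hook green so those writes still land.
    import httpx
    from ops.charm import CharmBase
    from ops.main import main

    class SketchCharm(CharmBase):
        def __init__(self, *args):
            super().__init__(*args)
            self.framework.observe(self.on.mysql_pebble_ready, self._configure_workload)
            self.framework.observe(self.on.mysql_pebble_ready, self._update_provider_data)

        def _configure_workload(self, event):
            # Databag writes are only committed by Juju if the whole hook succeeds.
            if not self.unit.is_leader():
                return
            for relation in self.model.relations["database"]:
                relation.data[self.app]["endpoints"] = "kfp-db-0.kfp-db-endpoints:3306"

        def _update_provider_data(self, event):
            try:
                self._query_k8s_api()  # hypothetical helper hitting the k8s API
            except httpx.ConnectError:
                # Transient "connection refused": deferring instead of raising lets
                # the hook complete, so the writes above are still committed.
                event.defer()

        def _query_k8s_api(self):
            raise NotImplementedError("placeholder for the real lightkube/httpx call")

    if __name__ == "__main__":
        main(SketchCharm)

Deferring (or retrying) the transient API error instead of letting the exception escape would let the hook complete, so Juju still commits the databag writes made by the other observer.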

taurus-forever commented 1 month ago

> The root cause is that the *-pebble-ready handler on mysql_provider fails with an httpx error (connection refused to the k8s API), failing the event. That prevents Juju from committing the databag keys written by the other handler(s) observing the same event, leaving a state mismatch between the charm and the workload and blocking any further self-maintenance.

The funny part is that I didn't notice this issue while preparing https://charmhub.io/mysql-k8s/docs/h-deploy-aks. Strange...

paulomach commented 1 month ago

> The root cause is that the *-pebble-ready handler on mysql_provider fails with an httpx error (connection refused to the k8s API), failing the event. That prevents Juju from committing the databag keys written by the other handler(s) observing the same event, leaving a state mismatch between the charm and the workload and blocking any further self-maintenance.
>
> The funny part is that I didn't notice this issue while preparing https://charmhub.io/mysql-k8s/docs/h-deploy-aks. Strange...

It's not something that shows up very often. @dparv was kind enough to give me access to an environment where it happened. Curling the k8s endpoint and manually re-triggering the hook after the error did not reproduce it, hence the conclusion that it's some transient issue.