canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0

charm went `offline` and has network connection errors #341

Open orfeas-k opened 9 months ago

orfeas-k commented 9 months ago

I deployed mysql-k8s from 8.0/edge to EKS on 21 November as part of the Charmed Kubeflow bundle, and the charm went into `Maintenance` with the message `offline` and stayed there for a while. Eventually it went back to `Active` on its own, but I took a look at the logs and saw a bunch of `"log-sender" manifold worker returned unexpected error` errors there.
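For reference, a minimal sketch of deploying just the mysql-k8s charm from that channel (in my case it came in via the Charmed Kubeflow bundle, so this is an approximation):

```shell
# Deploy mysql-k8s from the 8.0/edge channel into the current Juju model
juju deploy mysql-k8s --channel 8.0/edge --trust
```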

Before that happened, I had scaled the cluster down (during the night) and scaled it back up again.

Steps to reproduce

Unfortunately, I haven't found a way to reproduce this.

Expected behavior

The charm stays `Active` and remains able to respond to requests.

Actual behavior

I think that, as a result of the above, one of our charms fails to contact mysql-k8s in that cluster with the following error:

```
Ping to Katib db failed: dial tcp 10.100.51.25:3306: connect: connection refused
```
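For what it's worth, the refusal can be checked from inside the cluster with something like the following (the IP and port are taken from the error above; the pod name and image are arbitrary, and `mysqladmin ping` distinguishes a refused connection from an auth failure):

```shell
# Launch a throwaway MySQL client pod and ping the service endpoint
kubectl run mysql-client --rm -it --restart=Never --image=mysql:8.0 -- \
  mysqladmin ping -h 10.100.51.25 -P 3306
```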

Versions

Operating system: Ubuntu 22.04

Juju CLI: 3.1/stable

Juju agent: unknown

Charm revision: unknown; deployed from 8.0/edge on 21 November

EKS: 1.25

Log output

Logs are from after I scaled the cluster back up: `juju debug-log.txt`, `k8s logs.txt`
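Logs like these can be gathered with something along these lines (the `kubeflow` namespace, unit name, and `mysql` container name are assumptions based on this deployment):

```shell
# Replay the full Juju log for the model
juju debug-log --replay > "juju debug-log.txt"
# Capture the workload container logs for the first unit
kubectl logs mysql-k8s-0 -n kubeflow -c mysql > "k8s logs.txt"
```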

Additional context

A Charmed Kubeflow (CKF) user reported similar logs with revision 99 on Juju 2.9 and MicroK8s 1.24. They had also disabled and re-enabled their MicroK8s, and I believe these logs are from after re-enabling it: `db-logs.txt`

P.S. Feel free to rename this issue; I was not sure what the title should be.

github-actions[bot] commented 9 months ago

https://warthogs.atlassian.net/browse/DPE-3087

paulomach commented 9 months ago

Hi @orfeas-k , were you scaling back up from 0 units?

orfeas-k commented 9 months ago

Yes @paulomach, AFAICT. I scaled the EKS cluster down to 0 nodes and then back up to two.
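Roughly the following sequence (cluster and nodegroup names are placeholders; `--nodes-min` may need adjusting if 0 is outside the nodegroup's configured range):

```shell
# Scale the EKS nodegroup down to zero for the night...
eksctl scale nodegroup --cluster my-cluster --name my-nodegroup --nodes 0 --nodes-min 0
# ...and back up to two afterwards
eksctl scale nodegroup --cluster my-cluster --name my-nodegroup --nodes 2
```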

paulomach commented 9 months ago

@orfeas-k that's probably it. We do have known issues when scaling back up from zero nodes; a solution is under discussion but unfortunately will not come quickly.