canonical/mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0

MySQL server in read-only mode after pod is terminated and restarted #115

Closed. jadonn closed this issue 1 year ago.

jadonn commented 1 year ago

Bug Description

For a single-unit deployment of MySQL, the MySQL unit reports a green active status, but the MySQL server is in read-only mode and rejects writes.
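To confirm the read-only state directly, rather than relying on the charm's reported status, the server's global flags can be checked from inside the workload container. A minimal check, assuming the mysql container name and openstack namespace from this deployment and that the mysql client is available in the image (the root password comes from the charm and is not shown here):

microk8s.kubectl exec -it mysql-0 -n openstack -c mysql -- \
    mysql -u root -p -e 'SELECT @@global.read_only, @@global.super_read_only;'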

I deployed MySQL using the mysql-k8s-operator as part of a single-node OpenStack deployment on top of MicroK8s. Juju reported the deployment as completing successfully, and I was able to complete some initial configuration tasks that required writes to the MySQL database for the OpenStack cluster.

However, when I tried to interact with OpenStack roughly 30 minutes later, OpenStack was throwing an exception reporting that the database was in read-only mode. OpenStack was not functional because of this issue.

After some investigation, I found that the MySQL pod had been terminated with exit code 137 and recreated. However, I did not see any OOM kills on my host server.
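For anyone reproducing this, the last termination state can be read straight from the pod status with standard kubectl jsonpath, filtered to the mysql container (the full pod YAML with the same information is included below):

microk8s.kubectl get pod mysql-0 -n openstack \
    -o jsonpath='{.status.containerStatuses[?(@.name=="mysql")].lastState.terminated}'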

My OpenStack Canonical teammates asked me to share my experience here alongside their own similar experiences with MySQL entering read-only mode after a pod restart.

I have included logs and information from Juju and Kubernetes below. Unfortunately, I no longer have the error logs from OpenStack.

To Reproduce

Follow the steps listed in the guide at https://discourse.ubuntu.com/t/install-and-configure-microstack-sunbeam-single-node/32157.

I installed the MicroStack snap from the sunbeam/edge/jamespage channel instead of the sunbeam/beta channel recommended in that guide.
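For clarity, that substitution amounts to something like the following (the guide may specify additional flags such as --devmode, so follow it for the exact invocation):

sudo snap install microstack --channel=sunbeam/edge/jamespage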

Environment

I'm running Juju 3.0/stable.

Microk8s version 1.25.3; revision 4127; tracking 1.25-strict/stable

Microstack version xena; revision 303; tracking sunbeam/edge/jamespage

Processor: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
RAM: 8GB

Relevant log output

--- MySQL active status in Juju but MySQL server is read-only ---

juju status --model openstack
Model      Controller          Cloud/Region        Version  SLA          Timestamp
openstack  microk8s-localhost  microk8s/localhost  3.0.2    unsupported  19:47:48Z

App          Version                  Status   Scale  Charm            Channel     Rev  Address         Exposed  Message
glance                                active       1  glance-k8s       xena/beta    17  10.152.183.177  no
horizon                               active       1  horizon-k8s      xena/beta    16  10.152.183.153  no       http://10.20.20.2:80/openstack-horizon
keystone                              active       1  keystone-k8s     xena/beta    70  10.152.183.219  no
mysql        8.0.31-0ubuntu0.22.04.1  active       1  mysql-k8s        edge         22  10.152.183.140  no
neutron                               waiting      1  neutron-k8s      xena/beta    19  10.152.183.151  no       installing agent
nova                                  active       1  nova-k8s         xena/beta    13  10.152.183.19   no
ovn-central                           active       1  ovn-central-k8s  21.09/beta   20  10.152.183.121  no
ovn-relay                             active       1  ovn-relay-k8s    21.09/beta   13  10.20.20.0      no
placement                             active       1  placement-k8s    xena/beta    11  10.152.183.128  no
rabbitmq     3.9.24                   active       1  rabbitmq-k8s     3.9/beta      6  10.20.20.1      no
traefik                               active       1  traefik-k8s      1.0/stable   93  10.20.20.2      no
vault                                 active       1  icey-vault-k8s   stable        9  10.152.183.96   no

Unit            Workload  Agent  Address      Ports  Message
glance/0*       active    idle   10.1.50.163
horizon/0*      active    idle   10.1.50.156
keystone/0*     active    idle   10.1.50.162
mysql/0*        active    idle   10.1.50.160
neutron/0*      blocked   idle   10.1.50.164         (container:neutron-server) healthcheck failed: online
nova/0*         active    idle   10.1.50.159
ovn-central/0*  active    idle   10.1.50.167
ovn-relay/0*    active    idle   10.1.50.165
placement/0*    active    idle   10.1.50.161
rabbitmq/0*     active    idle   10.1.50.172
traefik/0*      active    idle   10.1.50.158
vault/0*        active    idle   10.1.50.174

--- End of Juju status ---

--- Kubernetes Pod information for MySQL Pod ---
microk8s.kubectl get pod mysql-0 -n openstack -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 4fa49c22a0dceec9e18ea44c989d0386ddda7e1457c48a95549a889c4771776e
    cni.projectcalico.org/podIP: 10.1.50.160/32
    cni.projectcalico.org/podIPs: 10.1.50.160/32
    controller.juju.is/id: a1812785-5040-4e34-8710-2fa98a6605f4
    juju.is/version: 3.0.2
    model.juju.is/id: 58cbbdd8-cf45-4e50-8173-24626303377f
    unit.juju.is/id: mysql/0
  creationTimestamp: "2022-11-15T16:58:08Z"
  generateName: mysql-
  labels:
    app.kubernetes.io/name: mysql
    controller-revision-hash: mysql-5485bf654d
    statefulset.kubernetes.io/pod-name: mysql-0
  name: mysql-0
  namespace: openstack
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: mysql
    uid: 905ac0d6-4754-42a7-8a22-4e2166bc40bc
  resourceVersion: "16982"
  uid: 52d6b4c4-7995-4933-9003-c12206823f88
spec:
  automountServiceAccountToken: true
  containers:
  - args:
    - run
    - --http
    - :38812
    - --verbose
    command:
    - /charm/bin/pebble
    env:
    - name: JUJU_CONTAINER_NAMES
      value: mysql
    - name: HTTP_PROBE_PORT
      value: "3856"
    image: jujusolutions/charm-base:ubuntu-20.04
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1/health?level=alive
        port: 38812
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    name: charm
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1/health?level=ready
        port: 38812
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      runAsGroup: 0
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/pebble/default
      name: charm-data
      subPath: containeragent/pebble
    - mountPath: /charm/bin
      name: charm-data
      readOnly: true
      subPath: charm/bin
    - mountPath: /var/lib/juju
      name: charm-data
      subPath: var/lib/juju
    - mountPath: /charm/containers
      name: charm-data
      subPath: charm/containers
    - mountPath: /var/lib/juju/storage/database/0
      name: mysql-database-e8bdf524
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jzxq8
      readOnly: true
    workingDir: /var/lib/juju
  - args:
    - run
    - --create-dirs
    - --hold
    - --http
    - :38813
    - --verbose
    command:
    - /charm/bin/pebble
    env:
    - name: JUJU_CONTAINER_NAME
      value: mysql
    - name: PEBBLE_SOCKET
      value: /charm/container/pebble.socket
    image: registry.jujucharms.com/charm/62ptdfbrjpw3n9tcnswjpart30jauc6wc5wbi/mysql-image@sha256:296093ea20f96a0d5570ca8edb651242ba1ad6deece4f65055e0d4a33fec2476
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1/health?level=alive
        port: 38813
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    name: mysql
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1/health?level=ready
        port: 38813
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      runAsGroup: 0
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /charm/bin/pebble
      name: charm-data
      readOnly: true
      subPath: charm/bin/pebble
    - mountPath: /charm/container
      name: charm-data
      subPath: charm/containers/mysql
    - mountPath: /var/lib/mysql
      name: mysql-database-e8bdf524
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jzxq8
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: mysql-0
  imagePullSecrets:
  - name: mysql-mysql-secret
  initContainers:
  - args:
    - init
    - --containeragent-pebble-dir
    - /containeragent/pebble
    - --charm-modified-version
    - "0"
    - --data-dir
    - /var/lib/juju
    - --bin-dir
    - /charm/bin
    command:
    - /opt/containeragent
    env:
    - name: JUJU_CONTAINER_NAMES
      value: mysql
    - name: JUJU_K8S_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: JUJU_K8S_POD_UUID
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.uid
    envFrom:
    - secretRef:
        name: mysql-application-config
    image: jujusolutions/jujud-operator:3.0.2
    imagePullPolicy: IfNotPresent
    name: charm-init
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/juju
      name: charm-data
      subPath: var/lib/juju
    - mountPath: /containeragent/pebble
      name: charm-data
      subPath: containeragent/pebble
    - mountPath: /charm/bin
      name: charm-data
      subPath: charm/bin
    - mountPath: /charm/containers
      name: charm-data
      subPath: charm/containers
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jzxq8
      readOnly: true
    workingDir: /var/lib/juju
  nodeName: thinkpad-server
  nodeSelector:
    kubernetes.io/arch: amd64
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: mysql
  serviceAccountName: mysql
  subdomain: mysql-endpoints
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: mysql-database-e8bdf524
    persistentVolumeClaim:
      claimName: mysql-database-e8bdf524-mysql-0
  - emptyDir: {}
    name: charm-data
  - name: kube-api-access-jzxq8
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-11-15T18:09:41Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-11-15T18:11:58Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-11-15T18:11:58Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-11-15T16:58:14Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://d70683d7f4120cf541172a8d6e33ff1d6de7fbce3f8818e88c18275f171eb0fa
    image: docker.io/jujusolutions/charm-base:ubuntu-20.04
    imageID: docker.io/jujusolutions/charm-base@sha256:5ccefd1a92d63baa961680c22a47e01213c99e9c06280c732a1910a5c126f2d2
    lastState:
      terminated:
        containerID: containerd://bac051694bcd64698aad1089605c0d01ad22b860de7081f2b7ccdc7c59e11625
        exitCode: 0
        finishedAt: "2022-11-15T18:10:58Z"
        reason: Completed
        startedAt: "2022-11-15T18:10:28Z"
    name: charm
    ready: true
    restartCount: 3
    started: true
    state:
      running:
        startedAt: "2022-11-15T18:11:27Z"
  - containerID: containerd://e239fb1e905266628e99488a5979a8b6f0a8e660425d27f927317dbc13a96a01
    image: sha256:c2bb49b391d329a07c496ef4d735a62367dbabf34dc531117d9f3152cdfe748d
    imageID: registry.jujucharms.com/charm/62ptdfbrjpw3n9tcnswjpart30jauc6wc5wbi/mysql-image@sha256:296093ea20f96a0d5570ca8edb651242ba1ad6deece4f65055e0d4a33fec2476
    lastState:
      terminated:
        containerID: containerd://ce4ac97cd3cbb3f6b1e3bf2864355995a59b2fcf3ae250c55e0209c9ec8128c0
        exitCode: 137
        finishedAt: "2022-11-15T18:09:16Z"
        reason: Error
        startedAt: "2022-11-15T17:05:47Z"
    name: mysql
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2022-11-15T18:09:41Z"
  hostIP: 192.168.1.19
  initContainerStatuses:
  - containerID: containerd://f7c798559d2ffeab80c69adb86723035227b30317edfba5fbf068f8f5d49ed69
    image: docker.io/jujusolutions/jujud-operator:3.0.2
    imageID: docker.io/jujusolutions/jujud-operator@sha256:bd700725ede36bd8958ee5c6be5666ad1898ef740fa7b5a4ccbc910bf2bba080
    lastState: {}
    name: charm-init
    ready: true
    restartCount: 1
    state:
      terminated:
        containerID: containerd://f7c798559d2ffeab80c69adb86723035227b30317edfba5fbf068f8f5d49ed69
        exitCode: 0
        finishedAt: "2022-11-15T18:09:40Z"
        reason: Completed
        startedAt: "2022-11-15T18:09:39Z"
  phase: Running
  podIP: 10.1.50.160
  podIPs:
  - ip: 10.1.50.160
  qosClass: BestEffort
  startTime: "2022-11-15T16:58:14Z"

--- End Kubernetes Pod information ---

--- Juju debug logs replay error messages ---
verbose: 2022-11-15T18:16:26Z: Loading startup files...
verbose: 2022-11-15T18:16:26Z: Loading plugins...
verbose: 2022-11-15T18:16:26Z: Connecting to MySQL at: clusteradmin@mysql-0.mysql-endpoints
verbose: 2022-11-15T18:16:26Z: Shell.connect: tid=101: CONNECTED: mysql-0.mysql-endpoints
verbose: 2022-11-15T18:16:26Z: Connecting to MySQL at: mysql://clusteradmin@mysql-0.mysql-endpoints:3306?connect-timeout=5000
verbose: 2022-11-15T18:16:26Z: Dba.get_cluster: tid=102: CONNECTED: mysql-0.mysql-endpoints:3306
Traceback (most recent call last):
  File "<string>", line 2, in <module>
mysqlsh.Error: Shell Error (51314): Dba.get_cluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active)

unit-mysql-0: 18:16:27 ERROR unit.mysql/0.juju-log Failed to get cluster status for cluster_495e7c6e48a2935912c58ec405a5110b
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_k8s_helpers.py", line 523, in _run_mysqlsh_script
    stdout, _ = process.wait_output()
  File "/var/lib/juju/agents/unit-mysql-0/charm/venv/ops/pebble.py", line 1307, in wait_output
    raise ExecError(self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['/usr/bin/mysqlsh', '--no-wizard', '--python', '--verbose=1', '-f', '/tmp/script.py', ';', 'rm', '/tmp/script.py'], stdout='', stderr='Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nverbose: 2022-11-15T18:16:26Z: Loading startup files...\nverbose: 2022-11-15T18:16:26Z: Loading plugins...\nverbose: 2022-11-15T18:16:26Z: Connecting to MySQL at: clusteradmin@mysql-0.mysql-endpoints\nverbose: 2022-11-15T18:16:26Z: Shell.connect: tid=103: CONNECTED: mysql-0.mysql-endpoints\nverbose: 2022-11-15T18:16:26Z: Connecting to MySQL at: mysql://clusteradmin@mysql-0.mysql-endpoints:3306?connect-timeout=5000\nverbose: 2022-11-15T18:16:26Z: Dba.get_cluster: tid=104: CONNECTED: mysql-0.mysql-endpoints:3306\nTraceback (most recent call last):\n  File "<string>", line 2, in <module>\nmysqlsh.Error: Shell Error (51314): Dba.get_cluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active)\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 703, in get_cluster_status
    output = self._run_mysqlsh_script("\n".join(status_commands))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_k8s_helpers.py", line 526, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory

--- End Juju debug log replay ---
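As an aside, Shell Error 51314 above ("GR is not active") is what MySQL Shell reports when an InnoDB Cluster member restarts and Group Replication has not been brought back up, which would also leave the server read-only. A generic AdminAPI recovery for that state, sketched here as a manual workaround rather than anything the charm performs itself (it prompts for the clusteradmin password), would be:

microk8s.kubectl exec -it mysql-0 -n openstack -c mysql -- \
    mysqlsh clusteradmin@mysql-0.mysql-endpoints --python \
    --execute 'dba.reboot_cluster_from_complete_outage()'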

Additional context

Similar to the following existing issues: https://github.com/canonical/mysql-k8s-operator/issues/107 and https://github.com/canonical/mysql-k8s-operator/issues/96

carlcsaposs-canonical commented 1 year ago

Possible duplicate of #96; this appears to be fixed by #123 and #128. Please re-open if the same issue occurs on the mysql-k8s Charmhub edge channel.