cybozu-go / moco

MySQL operator on Kubernetes using GTID-based semi-synchronous replication.
https://cybozu-go.github.io/moco/
Apache License 2.0

mysqld container not ready: failed to show slave status: sql: no rows in result set #433

Closed. mplewis closed this issue 2 years ago.

mplewis commented 2 years ago

Describe the bug

I am having trouble starting a MySQLCluster (v1beta2) with 1 replica. The mysqld container does not become ready.

The error in the agent container reads "failed to get replica status: failed to show slave status: sql: no rows in result set".

Environments

To Reproduce

Deploy the following moco.v1beta2.MySQLCluster to a DigitalOcean Managed Kubernetes cluster:

{
  metadata: { name: "mycluster" },
  spec: {
    replicas: 1,
    backupPolicyName: "daily",
    podTemplate: {
      spec: {
        containers: [
          {
            name: "mysqld",
            image: "quay.io/cybozu/mysql:8.0.28",
            resources: {
              requests: { cpu: "100m", memory: "128Mi" },
              limits: { cpu: "1", memory: "1Gi" },
            },
          },
        ],
        securityContext: {
          fsGroup: 10000,
          fsGroupChangePolicy: "OnRootMismatch",
        },
      },
    },
    volumeClaimTemplates: [
      {
        metadata: { name: "mysql-data" },
        spec: {
          accessModes: ["ReadWriteOnce"],
          resources: { requests: { storage: "10Gi" } },
        },
      },
    ],
  },
}
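For reference, the same spec rendered as a plain YAML manifest that can be applied with kubectl. The apiVersion and kind lines are inferred from the v1beta2 API referenced above and were not part of the original object literal; all other values are copied from it:

apiVersion: moco.cybozu.com/v1beta2   # inferred from "MySQLCluster (v1beta2)" above
kind: MySQLCluster
metadata:
  name: mycluster
spec:
  replicas: 1
  backupPolicyName: daily
  podTemplate:
    spec:
      containers:
        - name: mysqld
          image: quay.io/cybozu/mysql:8.0.28
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: "1"
              memory: 1Gi
      securityContext:
        fsGroup: 10000
        fsGroupChangePolicy: OnRootMismatch
  volumeClaimTemplates:
    - metadata:
        name: mysql-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi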

The mysqld container will fail to become ready. The agent container will print the following errors in its logs:

{"level":"error","ts":1658519430.212341,"logger":"init","caller":"cmd/root.go:259","msg":"connecting mysqld failed","error":"dial unix /run/mysqld.sock: connect: no such file or directory"}
{"level":"info","ts":1658519433.4968832,"logger":"cert-reloader","caller":"cert/cert.go:74","msg":"certificate reloaded"}
{"level":"info","ts":1658519433.4977221,"logger":"cron","caller":"v3@v3.0.1/cron.go:240","msg":"start"}
{"level":"info","ts":1658519433.4979293,"logger":"cron","caller":"v3@v3.0.1/cron.go:246","msg":"schedule","now":1658519433.4978883,"entry":1,"next":1658519700}
2022-07-22T19:50:41.120333Z moco-ghost-kesdev-0 moco-agent info: "well: access" http_host="10.244.4.87:9081" http_method="GET" http_status_code=200 http_user_agent="kube-probe/1.21" protocol="HTTP/1.1" remote_ipaddr="10.244.4.57" request_id="184ebfb1-175e-1244-b21a-d3711557b955" request_size=0 response_size=0 response_time=0.000312582 type="access" url="/healthz"
2022-07-22T19:50:51.115730Z moco-ghost-kesdev-0 moco-agent info: "well: access" http_host="10.244.4.87:9081" http_method="GET" http_status_code=200 http_user_agent="kube-probe/1.21" protocol="HTTP/1.1" remote_ipaddr="10.244.4.57" request_id="1b4ebfb1-175e-1244-b21a-d3711557b955" request_size=0 response_size=0 response_time=0.000363289 type="access" url="/healthz"
{"level":"error","ts":1658519451.126881,"logger":"agent","caller":"server/mysqld_health.go:56","msg":"failed to get replica status","error":"failed to show slave status: sql: no rows in result set"}
2022-07-22T19:50:51.127135Z moco-ghost-kesdev-0 moco-agent error: "well: access" http_host="10.244.4.87:9081" http_method="GET" http_status_code=500 http_user_agent="kube-probe/1.21" protocol="HTTP/1.1" remote_ipaddr="10.244.4.57" request_id="1a4ebfb1-175e-1244-b21a-d3711557b955" request_size=0 response_size=86 response_time=0.007131104 type="access" url="/readyz"
// ...errors repeat...

Expected behavior

The mysqld container eventually passes its readiness check.

Additional context

I think this happens in main/pkg/dbop/status.go inside getReplicaStatus. Perhaps the Go error value is a different instance even though its message is identical, so the no-rows case is not recognized and the error is returned unintentionally.

masa213f commented 2 years ago

@mplewis Thank you for the report!

Is there a BackupPolicy resource named daily? If no BackupPolicy resource exists in the namespace, please remove the backupPolicyName field from your MySQLCluster manifest.

My guess is that the moco-controller is outputting the following error:

failed to get backup policy default/daily: BackupPolicy.moco.cybozu.com \"daily\" not found

If so, the manager process has not started and the cluster cannot work.
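For completeness, if you want to keep backups instead of removing the field, a BackupPolicy named daily must exist in the same namespace as the MySQLCluster. A rough sketch, modeled on the example in the MOCO backup documentation, is below; the schedule, service account, bucket, and endpoint values are placeholders to adapt to your environment, not values taken from this issue:

apiVersion: moco.cybozu.com/v1beta2
kind: BackupPolicy
metadata:
  name: daily                          # must match spec.backupPolicyName in the MySQLCluster
                                       # and live in the same namespace as the cluster
spec:
  schedule: "@daily"                   # placeholder cron schedule
  jobConfig:
    serviceAccountName: backup-owner   # placeholder; needs write access to the bucket
    bucketConfig:
      bucketName: moco-backup                  # placeholder S3-compatible bucket
      endpointURL: https://example-s3-endpoint # placeholder object-storage endpoint
    workVolume:
      emptyDir: {}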

mplewis commented 2 years ago

Sorry for the long delay in responding. I tried your suggestion and found that my cluster was misconfigured: the referenced BackupPolicy had gone missing. Removing the backupPolicyName field fixed this.

Thank you for the help!