cybozu-go/moco

MySQL operator on Kubernetes using GTID-based semi-synchronous replication.
https://cybozu-go.github.io/moco/
Apache License 2.0

Error: leader election lost #680

Closed. lanss315425 closed this issue 3 months ago.

lanss315425 commented 3 months ago

I encountered this issue during testing. Here are the steps I followed:

curl -fsLO https://github.com/cert-manager/cert-manager/releases/download/v1.14.5/cert-manager.yaml
kubectl apply -f cert-manager.yaml

curl -fsLO https://github.com/cybozu-go/moco/releases/download/v0.20.2/moco.yaml
kubectl apply -f moco.yaml
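
To confirm both installations came up before continuing, a quick check along these lines should work (cert-manager deploys into the cert-manager namespace and MOCO into moco-system):

kubectl -n cert-manager get pods
kubectl -n moco-system get pods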

After running for about one day, I found that the moco-controller pod had restarted. Here is the log from before the restart:

{"level":"info","ts":"2024-05-24T06:51:05Z","msg":"Starting workers","controller":"mysqlcluster","controllerGroup":"moco.cybozu.com","controllerKind":"MySQLCluster","worker count":8}
{"level":"info","ts":"2024-05-24T07:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T08:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T09:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T10:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T11:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T12:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T13:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T14:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T15:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T16:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T17:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T18:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T19:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
{"level":"info","ts":"2024-05-24T20:51:04Z","logger":"agent-client","msg":"certificate reloaded"}
E0524 21:19:58.830509       1 leaderelection.go:327] error retrieving resource lock moco-system/moco: Get "https://10.233.0.1:443/apis/coordination.k8s.io/v1/namespaces/moco-system/leases/moco": context deadline exceeded
I0524 21:20:58.630679       1 leaderelection.go:280] failed to renew lease moco-system/moco: timed out waiting for the condition
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"shutting down server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"mysqlcluster","controllerGroup":"moco.cybozu.com","controllerKind":"MySQLCluster"}
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-05-24T21:20:58Z","logger":"controller-runtime.webhook","msg":"Shutting down webhook server with timeout of 1 minute"}
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"All workers finished","controller":"mysqlcluster","controllerGroup":"moco.cybozu.com","controllerKind":"MySQLCluster"}
{"level":"info","ts":"2024-05-24T21:20:58Z","msg":"All workers finished","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"error","ts":"2024-05-24T21:20:58Z","logger":"setup","msg":"problem running manager","error":"leader election lost","stacktrace":"github.com/cybozu-go/moco/cmd/moco-controller/cmd.subMain\n\t/work/cmd/moco-controller/cmd/run.go:150\ngithub.com/cybozu-go/moco/cmd/moco-controller/cmd.glob..func1\n\t/work/cmd/moco-controller/cmd/root.go:82\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\ngithub.com/cybozu-go/moco/cmd/moco-controller/cmd.Execute\n\t/work/cmd/moco-controller/cmd/root.go:89\nmain.main\n\t/work/cmd/moco-controller/main.go:6\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
leader election lost
Error: leader election lost

Kubernetes version: 1.26.13

ymmt2005 commented 3 months ago

This log does not indicate a bug. It shows that the moco-controller process failed to renew its leader-election lease, most likely because it temporarily lost its connection to kube-apiserver. Please check the status of your control plane around the time this happened.
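
If it helps, a few generic checks can narrow down whether kube-apiserver or the Lease object was unhealthy around that time (the lease name moco-system/moco is taken from the error log above; adjust to your setup):

# apiserver health endpoints
kubectl get --raw='/readyz?verbose'

# the leader-election Lease the controller failed to renew
kubectl -n moco-system get lease moco -o yaml

# apiserver static pods (on kubeadm-based clusters)
kubectl -n kube-system get pods -l component=kube-apiserver

# cluster events around the time of the restart
kubectl get events -A --sort-by=.lastTimestamp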

lanss315425 commented 3 months ago

Thanks.