benbjohnson / litestream

Streaming replication for SQLite.
https://litestream.io
Apache License 2.0

`plndr-cp-lock` failure with kube-vip pod when backing up K3s SQLite database #572

Open yebo29 opened 8 months ago

yebo29 commented 8 months ago

I'm trying to use Litestream to back up the database of my 3-node k3s cluster running on Raspberry Pi 4s. The cluster has a single primary node and two worker nodes. I've written Ansible code to deploy the Litestream systemd service to the primary node and back up to Backblaze. I can confirm that the service runs, connects to my bucket, and starts replicating to it. However, I notice that certain pods start restarting when it runs. I've tried different `sync-interval` values with no success. The most detrimental of these restarts is the kube-vip pod, which dies with the following errors:

E0228 20:52:15.433056       1 leaderelection.go:369] Failed to update lock: Put "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock?timeout=10s": context deadline exceeded
I0228 20:52:15.434181       1 leaderelection.go:285] failed to renew lease kube-system/plndr-cp-lock: timed out waiting for the condition
error: http2: client connection lost

During the pod restart, I lose access to the VIP and the API. I don't see any errors in the Litestream logs when I run `journalctl -xf -u litestream`, just replication logs. However, while writing this I realized that the log level defaults to INFO, so I may temporarily change that and observe. In the meantime, I'd like to know whether anyone else in the community has seen this and whether there are any solutions I can try.
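One way to check whether the lease failures line up with Litestream's replication activity is to pull the timestamps out of the kube-vip log lines and compare them with the Litestream journal entries. The following is only a minimal sketch, assuming the lines use the standard klog header (severity letter, `mmdd`, time with microseconds, thread id, `file:line]`). The function names `parse_klog` and `lease_failures` are hypothetical, and because klog headers carry no year, the year has to be supplied by the caller.

```python
import re
from datetime import datetime

# Standard klog header: severity letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line]
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def parse_klog(line, year):
    """Return (timestamp, severity, source, message), or None if the line is not klog."""
    m = KLOG_RE.match(line)
    if m is None:
        return None
    # klog omits the year, so the caller provides it (an assumption of this sketch).
    ts = datetime.strptime(
        f"{year}-{m['month']}-{m['day']} {m['time']}", "%Y-%m-%d %H:%M:%S.%f"
    )
    return ts, m["sev"], m["src"], m["msg"]


def lease_failures(lines, year):
    """Collect (timestamp, message) for every line emitted by leaderelection.go."""
    failures = []
    for line in lines:
        parsed = parse_klog(line, year)
        if parsed and parsed[2].startswith("leaderelection.go"):
            failures.append((parsed[0], parsed[3]))
    return failures
```

The input could be, for example, the output of `kubectl logs` for the kube-vip pod; lines that do not carry a klog header, such as the final `error: http2: client connection lost`, are skipped.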

I'm running the latest tag of kube-vip, v0.7.1.

Below is my current (redacted) config:

access-key-id: [redacted]
secret-access-key: [redacted]

dbs:
  - path: /var/lib/rancher/k3s/server/db/state.db
    replicas:
      - type: s3
        bucket: mybucket
        endpoint: s3.us-xxxx-xxx.backblazeb2.com
        path: litestream
        force-path-style: true
        sync-interval: 30s