LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Initial sync of large LVM volumes #424

Closed: maxpain closed this issue 1 month ago

maxpain commented 1 month ago

Hello. When I create a 1 TB Persistent Volume backed by DRBD + LVM (thick), it takes about 2 hours to sync it to the other replica. Is this expected behavior? I thought that since this is a new, empty volume, it would not need to be synced, right?

My configuration:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nvme-lvm-replicated-async
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: linstor.csi.linbit.com
parameters:
  linstor.csi.linbit.com/storagePool: nvme-lvm
  linstor.csi.linbit.com/autoPlace: "2"
  linstor.csi.linbit.com/layerList: "drbd storage"
  linstor.csi.linbit.com/allowRemoteVolumeAccess: "false"
  property.linstor.csi.linbit.com/DrbdOptions/Net/protocol: "A"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: talos-loader-override
spec:
  podTemplate:
    spec:
      hostNetwork: true
      initContainers:
        - name: drbd-shutdown-guard
          $patch: delete
        - name: drbd-module-loader
          $patch: delete
      volumes:
        - name: run-systemd-system
          $patch: delete
        - name: run-drbd-shutdown-guard
          $patch: delete
        - name: systemd-bus-socket
          $patch: delete
        - name: lib-modules
          $patch: delete
        - name: usr-src
          $patch: delete
        - name: etc-lvm-backup
          hostPath:
            path: /var/etc/lvm/backup
            type: DirectoryOrCreate
        - name: etc-lvm-archive
          hostPath:
            path: /var/etc/lvm/archive
            type: DirectoryOrCreate

  storagePools:
    - name: nvme-lvm
      lvmPool: {}
      source:
        hostDevices:
          - /dev/nvme0n1
          - /dev/nvme1n1
      properties:
        - name: StorDriver/LvcreateOptions
          value: "-i 2"
---
apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  properties:
    - name: DrbdOptions/Disk/disk-flushes
      value: "no"
    - name: DrbdOptions/Disk/md-flushes
      value: "no"
    - name: DrbdOptions/Net/max-buffers
      value: "80000"
    - name: DrbdOptions/Net/rcvbuf-size
      value: "10485760"
    - name: DrbdOptions/Net/sndbuf-size
      value: "10485760"
    - name: DrbdOptions/PeerDevice/c-fill-target
      value: "1024"
    - name: DrbdOptions/PeerDevice/c-max-rate
      value: "0"
    - name: DrbdOptions/PeerDevice/c-min-rate
      value: "1000000"
    - name: DrbdOptions/PeerDevice/resync-rate
      value: "5000000"
    - name: DrbdOptions/PeerDevice/c-plan-ahead
      value: "0"
    - name: DrbdOptions/auto-quorum
      value: "suspend-io"
    - name: DrbdOptions/Resource/on-no-data-accessible
      value: "suspend-io"
    - name: DrbdOptions/Resource/on-suspended-primary-outdated
      value: "force-secondary"
    - name: DrbdOptions/Net/rr-conflict
      value: "retry-connect"
rck commented 1 month ago

It is expected. With thick LVM you cannot assume that the copies are identical or read back as 0s; you get whatever old data happens to be on the device, which is why an initial sync is necessary. LVM thin and both variants of ZFS give you 0s, so there the initial sync can be skipped.
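
For reference, a thin-provisioned pool sidesteps the full initial sync, since fresh thin LVs read back as zeros. Below is a minimal sketch, assuming the piraeus-operator v2 lvmThinPool storage pool type; the pool name is illustrative, and the host devices are simply copied from the configuration above for illustration (a device can only back one pool, so they would have to be freed or replaced):

apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: nvme-thin-pool-example
spec:
  storagePools:
    # Thin pool: newly created volumes read back as zeros, so the initial
    # full sync of empty space is not needed.
    - name: nvme-lvm-thin
      lvmThinPool: {}
      source:
        hostDevices:
          # For illustration only; these are already claimed by the thick pool above.
          - /dev/nvme0n1
          - /dev/nvme1n1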

rp- commented 1 month ago

But the default resync settings are very conservative, so this may help speed things up: https://kb.linbit.com/tuning-drbds-resync-controller
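
As a rough illustration of the kind of tuning that article covers, the resync controller can also be left in dynamic mode (non-zero c-plan-ahead) instead of pinning a static resync-rate as in the configuration above. The values below are placeholders, not recommendations, and would need to be sized for the actual network and disks as described in the article:

apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  properties:
    # A non-zero c-plan-ahead enables the dynamic resync controller;
    # c-fill-target and c-max-rate then bound how aggressive the resync gets.
    # Placeholder values only; see the KB article for sizing guidance.
    - name: DrbdOptions/PeerDevice/c-plan-ahead
      value: "20"
    - name: DrbdOptions/PeerDevice/c-fill-target
      value: "2048"
    - name: DrbdOptions/PeerDevice/c-max-rate
      value: "1048576"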

maxpain commented 1 month ago

@rck Is there a way to wipe (fill with 0s) the block devices on both sides instead of sending this garbage over the wire? That should be much faster.

rck commented 1 month ago

As usual, "it depends". If the source has local 0s, this is rather efficient: not every single 0 is sent; DRBD detects runs of zeros and only sends the information that there are that many 0s. And then there is trim/discard and so on.

With DRBD alone one could zero out the local devices and intentionally skip the initial sync; one might even have a use case where it is fine to have different blocks in "uninteresting" areas. Unfortunately I don't know whether LINSTOR has any property to skip the initial sync on thick LVM, or any strategy to locally wipe a thick LVM device and then skip it. @rp- is there anything in that area?

rp- commented 1 month ago

No, there is currently no way to skip the initial LVM-thick sync.