LINBIT / drbd

LINBIT DRBD kernel module
https://docs.linbit.com/docs/users-guide-9.0/
GNU General Public License v2.0

Drbd 9.2.6: Does protocol A or B still work? #78

Closed: klsx0 closed this issue 11 months ago

klsx0 commented 1 year ago

Hello,

I use DRBD 9.2.6 with drbdadm 9.26.0 to replicate one volume to a remote host. In my case I am limited by a network with a bandwidth of 220 Mb/s.

Using DRBD with protocol C is too restrictive for me, as it reduces the speed of the replicated disk and of the PostgreSQL database running on it. So I wanted to test performance with protocol B or A instead.

The results are exactly the same across the three protocols. To test disk performance, I used the command from the documentation:

sudo dd if=/dev/zero of=/dev/drbd0 bs=4M count=100 oflag=direct
debian@dev-machine-1:~$ sudo dd if=/dev/zero of=/dev/drbd0 bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB, 400 MiB) copied, 16.9722 s, 24.7 MB/s
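
Note that my actual run above omitted oflag=direct. With direct I/O, and conv=fsync so that dd also waits for a final flush before reporting, the figure should reflect replicated writes rather than page-cache write-back:

sudo dd if=/dev/zero of=/dev/drbd0 bs=4M count=100 oflag=direct conv=fsync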

/etc/drbd.d/r1.res

resource r1 {
  volume 0 {
    device minor 0;
    disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi2;
    meta-disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1;
  }
  on dev-machine-1  {
    node-id 0;
  }
  on dev-machine-2 {
    node-id 1;
  }
  connection {
    host "dev-machine-1" address 0.0.0.0:7789;
    host "dev-machine-2" address 192.168.1.151:7789;
    net {
      protocol A;
    }
  }
}
debian@dev-machine-1:~$ drbdsetup show r1 --show-defaults
resource "r1" {
    options {
        cpu-mask                ""; # default
        on-no-data-accessible   io-error; # default
        auto-promote            yes; # default
        peer-ack-window         4096s; # bytes, default
        peer-ack-delay          100; # milliseconds, default
        twopc-timeout           300; # 1/10 seconds, default
        twopc-retry-timeout     1; # 1/10 seconds, default
        auto-promote-timeout    20; # 1/10 seconds, default
        max-io-depth            8000; # default
        quorum                  off; # default
        on-no-quorum            suspend-io; # default
        quorum-minimum-redundancy       off; # default
        on-suspended-primary-outdated   disconnect; # default
    }
    _this_host {
        node-id                 0;
        volume 0 {
            device                      minor 0;
            disk                        "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi2";
            meta-disk                   "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1";
            disk {
                size                    0s; # bytes, default
                on-io-error             detach; # default
                disk-barrier            no; # default
                disk-flushes            yes; # default
                disk-drain              yes; # default
                md-flushes              yes; # default
                resync-after            -1; # default
                al-extents              1237; # default
                al-updates              yes; # default
                discard-zeroes-if-aligned       yes; # default
                disable-write-same      no; # default
                disk-timeout            0; # 1/10 seconds, default
                read-balancing          prefer-local; # default
                rs-discard-granularity  0; # bytes, default
            }
        }
    }
    connection {
        _peer_node_id 1;
        path {
            _this_host ipv4 0.0.0.0:7789;
            _remote_host ipv4 192.168.1.151:7789;
        }
        net {
            transport           ""; # default
            load-balance-paths  no; # default
            protocol            A;
            timeout             60; # 1/10 seconds, default
            max-epoch-size      2048; # default
            connect-int         10; # seconds, default
            ping-int            10; # seconds, default
            sndbuf-size         0; # bytes, default
            rcvbuf-size         0; # bytes, default
            ko-count            7; # default
            allow-two-primaries no; # default
            cram-hmac-alg       ""; # default
            after-sb-0pri       disconnect; # default
            after-sb-1pri       disconnect; # default
            after-sb-2pri       disconnect; # default
            always-asbp         no; # default
            rr-conflict         disconnect; # default
            ping-timeout        5; # 1/10 seconds, default
            data-integrity-alg  ""; # default
            tcp-cork            yes; # default
            on-congestion       block; # default
            congestion-fill     0s; # bytes, default
            congestion-extents  1237; # default
            csums-alg           ""; # default
            csums-after-crash-only      no; # default
            verify-alg          ""; # default
            use-rle             yes; # default
            socket-check-timeout        0; # default
            fencing             dont-care; # default
            max-buffers         2048; # default
            allow-remote-read   yes; # default
            tls                 no; # default
            tls-keyring         (null); # default
            tls-privkey         (null); # default
            tls-certificate     (null); # default
            rdma-ctrl-rcvbuf-size       0; # default
            rdma-ctrl-sndbuf-size       0; # default
            _name               "dev-machine-2";
        }
        volume 0 {
            disk {
                resync-rate             250k; # bytes/second, default
                c-plan-ahead            20; # 1/10 seconds, default
                c-delay-target          10; # 1/10 seconds, default
                c-fill-target           100s; # bytes, default
                c-max-rate              102400k; # bytes/second, default
                c-min-rate              250k; # bytes/second, default
                bitmap                  yes; # default
            }
        }
    }
}

Of course, I tried moving the

net {
    protocol A;
}

block to other places in the configuration, but with no noticeable effect.
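
Concretely, besides the connection-level net block in my r1.res above, I also tried a resource-level placement, which as far as I understand is inherited by all connections; drbdadm adjust r1 applies the change to a running resource and drbdsetup show confirms which protocol is active:

resource r1 {
  net {
    protocol A;    # resource-level net section, inherited by every connection
  }
  # volume and host sections unchanged
}

sudo drbdadm adjust r1
sudo drbdsetup show r1 | grep protocol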

The /etc/drbd.d/global_common.conf file is left at its defaults:

# DRBD is the result of over a decade of development by LINBIT.
# In case you need professional services for DRBD or have
# feature requests visit http://www.linbit.com

global {
        usage-count yes;

        # Decide what kind of udev symlinks you want for "implicit" volumes
        # (those without explicit volume <vnr> {} block, implied vnr=0):
        # /dev/drbd/by-resource/<resource>/<vnr>   (explicit volumes)
        # /dev/drbd/by-resource/<resource>         (default for implicit)
        udev-always-use-vnr; # treat implicit the same as explicit volumes

        # minor-count dialog-refresh disable-ip-verification
        # cmd-timeout-short 5; cmd-timeout-medium 121; cmd-timeout-long 600;
}

common {
        handlers {
                # These are EXAMPLE handlers only.
                # They may have severe implications,
                # like hard resetting the node under certain circumstances.
                # Be careful when choosing your poison.

                # IMPORTANT: most of the following scripts symlink to "notify.sh" which tries to send mail via "mail".
                # If you intend to use this notify.sh script make sure that "mail" is installed.
                #
                # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
                # quorum-lost "/usr/lib/drbd/notify-quorum-lost.sh root";
                # disconnected /bin/true;
        }

        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }

        options {
                # cpu-mask on-no-data-accessible

                # RECOMMENDED for three or more storage nodes with DRBD 9:
                # quorum majority;
                # on-no-quorum suspend-io | io-error;
        }

        disk {
                # size on-io-error fencing disk-barrier disk-flushes
                # disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
        }

        net {
                # protocol timeout max-epoch-size max-buffers
                # connect-int ping-int sndbuf-size rcvbuf-size ko-count
                # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
                # after-sb-1pri after-sb-2pri always-asbp rr-conflict
                # ping-timeout data-integrity-alg tcp-cork on-congestion
                # congestion-fill congestion-extents csums-alg verify-alg
                # use-rle
        }
}

I also tried changing the version: 9.2.6 -> 9.2.5 -> 9.1.17, but without success.

Here is my system information (Debian 11):

Linux dev-machine-1 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux

All DRBD versions were compiled from the tar.gz sources.
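
For what it's worth, the version of the kernel module that is actually loaded can be checked independently of the userspace tools, for example:

cat /proc/drbd        # version of the currently loaded drbd module
drbdadm --version     # versions of the userspace tools and the module they see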

Thank you very much for your time.

Sincerely,
Kylian

dvance commented 1 year ago

I believe dd is single-threaded, so depending on how fast your storage is, you might be CPU-bound. That's unlikely given the speeds you're seeing, but it's something I wanted to mention.

What is the speed of the storage without DRBD? What is the speed of the network between the hosts?
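
If you have not measured those directly yet, something like the following gives the raw numbers. iperf3 is just one tool that works for this, and the dd write test is destructive, so only point it at a scratch device, never at the DRBD backing disk while it holds data:

iperf3 -s                        # on dev-machine-2: raw TCP throughput server
iperf3 -c 192.168.1.151          # on dev-machine-1: measure throughput to the peer

sudo dd if=/dev/zero of=/dev/sdX bs=4M count=100 oflag=direct   # /dev/sdX = scratch disk only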

My best guess here is that you're simply filling up the send/receive buffers on the network. While protocol A is asynchronous, once the TCP buffers are full things run at speeds similar to synchronous replication: we need to wait for the peer to ack a packet before we can clear it from our send buffer and queue another write. Because of this, even though replication is asynchronous, we still don't expect performance much beyond what the network is capable of when replicating a large volume of writes. We created DRBD Proxy to work around this by providing gigantic buffers/cache, but even then, once those buffers fill, things drop back to network speeds.
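
If you want to experiment with increasing those buffers, within DRBD they are the sndbuf-size and rcvbuf-size net options (0, the default, lets the kernel auto-tune them; the 10M below is purely an illustrative value, not a recommendation):

connection {
    # ...
    net {
        protocol    A;
        sndbuf-size 10M;    # TCP send buffer on the sending node
        rcvbuf-size 10M;    # TCP receive buffer on the peer
    }
}

Run drbdadm adjust <resource> afterwards to apply the change.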

Regarding protocol B: it was developed more for academic purposes than anything else. It doesn't have a real-world use case, and we don't suggest using it.

klsx0 commented 1 year ago

Hello,

The speed of my storage without DRBD is about 220 MB/s. My network is approximately 25 MB/s, and with DRBD on it's about 23 MB/s, so we are nearing the network limit.

Regarding DRBD Proxy, all of the links on this page redirect to the linstor-server repo. Do we need a special license for this product, or is it now a linstor-server component?

I will try increasing the TCP buffers with protocol A as you suggested.

Thank you for your response.

dvance commented 1 year ago

The "star us on GitHub" link you followed appears on many pages of the site and is just a generic request for people to star the project on GitHub for some internet points. The only other really pertinent link there is the Disaster Recovery link, as that is the common use case for DRBD Proxy.

You will not find DRBD Proxy on GitHub, as it is a licensed product. You must contact sales for an evaluation license to try it out.

Please try experimenting with larger TCP buffers, but know that they will only absorb short bursts of writes. With disks capable of 220 MiB/s and a network of 25 MiB/s, a stream of constant writes will eventually fill the buffer, no matter its size. Perhaps your application's write workload is only "bursty", though.
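
To put rough numbers on that, assuming purely for illustration a 1 GiB buffer: writes can arrive at up to 220 MiB/s but drain at only about 25 MiB/s, so the buffer fills at roughly 195 MiB/s:

echo "scale=1; 1024 / (220 - 25)" | bc    # ~5.2 seconds of sustained writes until a 1 GiB buffer is full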

klsx0 commented 1 year ago

I tried DRBD with increased TCP buffer sizes and protocol A and got 32 MB/s, but no better. In my case I have a lot of I/O from a database, and this really slows down the system.

Thank you, I will contact the sales team for a trial license of DRBD Proxy.