LINBIT / drbd

LINBIT DRBD kernel module
https://docs.linbit.com/docs/users-guide-9.0/
GNU General Public License v2.0

Troubles while using DRBD 9.2.4 #64

Closed by SAkagiI 1 year ago

SAkagiI commented 1 year ago

I ran into three problems while using DRBD 9.2.4. They are probably DRBD bugs, but please understand that they could also be bugs in the OS (AlmaLinux) or in other software (BIND or Squid).

System Environment

1. Replacing device files

State

The device names /dev/sda and /dev/sdb were swapped, and mounting the DRBD device through Pacemaker then failed.

ex. Normal

/dev/sda
|- /dev/sda1
|- /dev/sda2
  |- /dev/mapper/almalinux-root
  |- /dev/mapper/almalinux-swap
  |- /dev/mapper/almalinux-home
/dev/sdb
|- /dev/sdb1
  |- /dev/drbd0
/dev/sr0

ex. Abnormal

/dev/sda
|- /dev/sda1
/dev/sdb
|- /dev/sdb1
|- /dev/sdb2
  |- /dev/mapper/almalinux-root
  |- /dev/mapper/almalinux-swap
  |- /dev/mapper/almalinux-home
/dev/sr0

Temporary approach

This may be caused by the udev-always-use-vnr option (a specification rather than a bug?) being enabled by default in global_common.conf, so I commented it out.
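
For reference, a minimal sketch of that workaround, assuming the stock /etc/drbd.d/global_common.conf layout shipped with DRBD 9 (only the relevant lines are shown):

# /etc/drbd.d/global_common.conf (excerpt)
global {
    usage-count yes;
    # udev-always-use-vnr;   # workaround described above: option commented out
}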

2. Not recognizing the service user

State

The DRBD mount point was /mnt and I ran chown named:named /mnt. The owner and group of /mnt changed to numeric IDs, the service user was not recognized, and the named service could no longer read or write files in /mnt.

ex. Normal

drwxrwxr-x 2 named named … mnt

ex. Abnormal

drwxrwxr-x 2 25 25 … mnt

Temporary approach

Changed to use chmod instead of chown. The root user was recognized without any problem, so specifying a default Linux user (nobody, etc.) in chown might also be fine.
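
As a sketch, the change described above might have looked like this (the exact mode is an assumption; the issue only says that chmod replaced chown):

# previously: chown named:named /mnt
chmod 0777 /mnt    # permissions no longer depend on the named uid/gid being resolvable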

3. Loss of all data in mount point

State

The DRBD mount point was /mnt, and while I was running the squid -k rotate command several times against /mnt, "Terminated" was displayed and all files in /mnt were deleted.

Temporary approach

Back up the files in /mnt.

dvance commented 1 year ago

1. Replacing device files

This is not an issue with DRBD; it is the kernel that enumerates the block devices. If you want to avoid this in the future, you can use udev symlinks such as /dev/disk/by-uuid/ within your DRBD configuration, or simply use LVM.
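
As a sketch of that suggestion, a resource file can reference a stable udev symlink as the backing disk. The resource name, hostnames, addresses, and by-id paths below are placeholders, not taken from this issue:

# /etc/drbd.d/r0.res (sketch)
resource r0 {
    device    /dev/drbd0;
    meta-disk internal;

    on alma1 {
        # stable udev symlink instead of /dev/sdb1, which the kernel may renumber
        disk    /dev/disk/by-id/wwn-0x5000c500aaaaaaaa-part1;
        address 10.0.0.11:7789;
    }
    on alma2 {
        disk    /dev/disk/by-id/wwn-0x5000c500bbbbbbbb-part1;
        address 10.0.0.12:7789;
    }
}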

2. Not recognizing the service user

Again, this is not an issue with DRBD. I suspect the named user does not have the same uid and gid on all nodes. You need to make sure the uid and gid of users match across the cluster nodes for things to work properly.
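
A quick way to check this, as a sketch (the numeric id 25 comes from the listing above; whether it is the intended value for named is an assumption):

# run on every node and compare the output
id named
getent passwd named

# if the ids differ, align them on the out-of-sync node, then fix ownership of existing files
# usermod -u 25 named
# groupmod -g 25 named
# find /mnt -xdev -gid <old-gid> -exec chgrp named {} +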

3. Loss of all data in mount point

We need more information here. The data was likely not deleted, or if it was, something other than DRBD deleted it. My best guess is that DRBD was in a degraded state, or that you were running your commands on the primary node when a failover occurred.
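
A sketch of how to capture more of that information the next time it happens (the resource name r0 is an assumption):

# DRBD's own view of the resource on the node where squid runs
drbdadm status r0
# one-shot Pacemaker status, to see whether a failover moved the resource
crm_mon -1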

kermat commented 1 year ago

Also, worth mentioning that we have a community Slack channel as well as a DRBD user mailing list where configuration and use-case specific questions like this can be asked.