LINBIT / drbd

LINBIT DRBD kernel module
https://docs.linbit.com/docs/users-guide-9.0/
GNU General Public License v2.0

Cannot write files into highly available NFS storage created with DRBD and Pacemaker ("Permission denied" error returned) #63

Status: Closed (ghevge closed this issue 1 year ago)

ghevge commented 1 year ago

I am trying to set up highly available NFS storage with DRBD and Pacemaker (my first time doing this) on two Fedora 38 VMs.

My main guides for this were these two documents: doc1, doc2

I've managed to start the Pacemaker cluster and to mount the NFS shared folder on my hosts, but when I try to write anything into that folder, I get a "Permission denied" error.

Changing the mount point permissions to 666 or 777 doesn't help.

Any idea what could be wrong?

My DRBD configuration looks like this:

#> sudo vi /etc/drbd.d/global_common.conf
global {
  usage-count yes;
}
common {
  disk {
    no-disk-flushes;
    no-disk-barrier;
    c-fill-target 24M;
    c-max-rate 720M;
    c-plan-ahead 15;
    c-min-rate 4M;
  }
  net {
    protocol C;
    max-buffers 36k;
    sndbuf-size 1024k;
    rcvbuf-size 2048k;
  }
}

#> sudo vi /etc/drbd.d/ha_nfs.res

resource ha_nfs {
  device "/dev/drbd1003";
  disk "/dev/nfs/share";
  meta-disk internal;
  on server1.test {
    address 192.168.1.116:7789;
  }
  on server2.test {
    address 192.168.1.167:7789;
  }
}

The Pacemaker configuration looks like this:

crm> configure edit
node 1: server1.test
node 2: server2.test
primitive p_drbd_attr ocf:linbit:drbd-attr
primitive p_drbd_ha_nfs ocf:linbit:drbd \
        params drbd_resource=ha_nfs \
        op monitor timeout=20s interval=21s role=Slave start-delay=12s \
        op monitor timeout=20s interval=20s role=Master start-delay=8s
primitive p_expfs_nfsshare_exports_HA exportfs \
        params clientspec="192.168.1.0/24" directory="/nfsshare/exports/HA" fsid=1003 unlock_on_stop=1 options="rw,mountpoint" \
        op monitor interval=15s timeout=40s start-delay=15s \
        op_params OCF_CHECK_LEVEL=0 \
        op start interval=0s timeout=40s \
        op stop interval=0s timeout=120s
primitive p_fs_nfsshare_exports_HA Filesystem \
        params device="/dev/drbd1003" directory="/nfsshare/exports/HA" fstype=ext4 run_fsck=no \
        op monitor interval=15s timeout=40s start-delay=15s \
        op_params OCF_CHECK_LEVEL=0 \
        op start interval=0s timeout=60s \
        op stop interval=0s timeout=60s
primitive p_nfsserver nfsserver
primitive p_pb_block portblock \
        params action=block ip=192.168.1.101 portno=2049 protocol=tcp
primitive p_pb_unblock portblock \
        params action=unblock ip=192.168.1.101 portno=2049 tickle_dir="/srv/drbd-nfs/nfstest/.tickle" reset_local_on_unblock_stop=1 protocol=tcp \
        op monitor interval=10s timeout=20s start-delay=15s
primitive p_virtip IPaddr2 \
        params ip=192.168.1.101 cidr_netmask=32 \
        op monitor interval=1s timeout=40s start-delay=0s \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s
ms ms_drbd_ha_nfs p_drbd_ha_nfs \
        meta master-max=1 master-node-max=1 clone-node-max=1 clone-max=2 notify=true
clone c_drbd_attr p_drbd_attr
colocation co_ha_nfs inf: p_pb_block p_virtip ms_drbd_ha_nfs:Master p_fs_nfsshare_exports_HA p_expfs_nfsshare_exports_HA p_nfsserver p_pb_unblock
property cib-bootstrap-options: \
        have-watchdog=false \
        cluster-infrastructure=corosync \
        cluster-name=nfsCluster \
        stonith-enabled=false \
        no-quorum-policy=ignore

pcs status output:

[bebe@server2 share]$ sudo pcs status
[sudo] password for bebe:
Cluster name: nfsCluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: server1.test (version 2.1.6-4.fc38-6fdc9deea29) - partition with quorum
  * Last updated: Thu Jul 13 08:50:34 2023 on server2.test
  * Last change:  Thu Jul 13 08:27:46 2023 by hacluster via crmd on server1.test
  * 2 nodes configured
  * 10 resource instances configured

Node List:
  * Online: [ server1.test server2.test ]

Full List of Resources:
  * p_virtip    (ocf::heartbeat:IPaddr2):        Started server2.test
  * p_expfs_nfsshare_exports_HA (ocf::heartbeat:exportfs):       Started server2.test
  * p_fs_nfsshare_exports_HA    (ocf::heartbeat:Filesystem):     Started server2.test
  * p_nfsserver (ocf::heartbeat:nfsserver):      Started server2.test
  * p_pb_block  (ocf::heartbeat:portblock):      Started server2.test
  * p_pb_unblock        (ocf::heartbeat:portblock):      Started server2.test
  * Clone Set: ms_drbd_ha_nfs [p_drbd_ha_nfs] (promotable):
    * Masters: [ server2.test ]
    * Slaves: [ server1.test ]
  * Clone Set: c_drbd_attr [p_drbd_attr]:
    * Started: [ server1.test server2.test ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

DRBD status output:

[bebe@server2 share]$ sudo drbdadm status ha_nfs
ha_nfs role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate
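Going by the `drbdadm status` output above, replication itself looks healthy: the local node is Primary with an UpToDate disk, and the peer connection is Established with an UpToDate peer disk. As an illustration only, the Python sketch below parses that plain-text layout and performs the same check. It is not a LINBIT tool; the function names are hypothetical, and it assumes the single-volume, single-peer format shown in this issue.

```python
# Output copied verbatim from this issue.
SAMPLE = """\
ha_nfs role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate
"""


def parse_drbd_status(text):
    """Parse the plain-text layout of `drbdadm status` for one resource.

    Returns (resource_name, local_fields, peers), where peers maps each
    connection name to its key:value fields. Simplified sketch: it assumes
    every field is a key:value token and handles only the layout of SAMPLE.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    name, *tokens = lines[0].split()
    local = dict(tok.split(":", 1) for tok in tokens)  # e.g. role:Primary
    peers = {}
    section = local
    for line in lines[1:]:
        indent = len(line) - len(line.lstrip())
        tokens = line.split()
        if ":" not in tokens[0]:
            # Connection line such as "peer role:Secondary" starts a peer section.
            section = peers.setdefault(tokens[0], {})
            tokens = tokens[1:]
        elif indent <= 2:
            # A two-space-indented key:value line such as "disk:UpToDate" is local.
            section = local
        section.update(tok.split(":", 1) for tok in tokens)
    return name, local, peers


def replication_healthy(local, peers):
    """True if the local disk and every connected peer disk are UpToDate."""
    return (
        local.get("disk") == "UpToDate"
        and bool(peers)
        and all(
            p.get("replication") == "Established" and p.get("peer-disk") == "UpToDate"
            for p in peers.values()
        )
    )


if __name__ == "__main__":
    name, local, peers = parse_drbd_status(SAMPLE)
    print(name, local["role"], replication_healthy(local, peers))  # ha_nfs Primary True
```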

kermat commented 1 year ago

Hello @ghevge.

If you were able to create a filesystem on the DRBD device (which, based on the output you've shared, you were), then this isn't an issue with reading from or writing to DRBD; it's an issue with filesystem/NFS permissions.
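One way filesystem/NFS permissions can produce exactly this symptom is sketched below as a simplified Python model of how an NFS client's identity meets the exported directory's mode bits. This illustrates the maintainer's point under stated assumptions and is not a diagnosis confirmed in this thread. It assumes the Linux NFS server's default `root_squash` behavior (the `options="rw,mountpoint"` given to `p_expfs_nfsshare_exports_HA` does not include `no_root_squash`), an anonymous uid/gid of 65534, and a root-owned directory with mode 0755, which is typical for a freshly created ext4 filesystem. The helper names are hypothetical, and the model ignores ACLs, SELinux, and supplementary groups. It also shows why mode 666 cannot work on a directory: creating a file requires both the write and the execute (search) bit.

```python
import stat

# Assumed anonymous ("nobody") ids used by the NFS server for squashed requests.
ANON_UID = ANON_GID = 65534


def squash(uid, gid, root_squash=True):
    """Map a client identity to the identity the NFS server acts as.

    With root_squash (assumed here, as the Linux default), requests from
    uid 0 are treated as coming from the anonymous user.
    """
    if root_squash and uid == 0:
        return ANON_UID, ANON_GID
    return uid, gid


def can_create_in(dir_mode, dir_uid, dir_gid, uid, gid):
    """Simplified POSIX check: creating a file in a directory needs
    write and search (execute) permission on that directory."""
    if uid == 0:
        return True  # unsquashed root bypasses mode bits (simplification)
    if uid == dir_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif gid == dir_gid:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & need) == need


if __name__ == "__main__":
    uid, gid = squash(0, 0)  # client root is mapped to nobody
    print(can_create_in(0o755, 0, 0, uid, gid))  # False: "others" cannot write
    print(can_create_in(0o777, 0, 0, uid, gid))  # True
    print(can_create_in(0o666, 0, 0, uid, gid))  # False: no search (x) bit
```

In this model, the mode bits that count are those of the directory the server actually exports, i.e. the root of the ext4 filesystem on `/dev/drbd1003` once `p_fs_nfsshare_exports_HA` has mounted it on the active node. A `chmod` applied to the mount point while nothing is mounted there is hidden by that mount.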

For things like this, I'd suggest joining the LINBIT Slack community, where other DRBD (and Pacemaker) users may be able to help you troubleshoot your setup further.

Cheers