LINBIT / linstor-server

High-performance software-defined block storage for containers, cloud and virtualisation. Fully integrated with Docker, Kubernetes, OpenStack, Proxmox, etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Linstor exception when managing drive with replica on overfilled node #156

Open · kvaps opened this issue 4 years ago

kvaps commented 4 years ago

Hi, to reproduce this issue I created 5 VMs and deployed LINSTOR on them:

add-apt-repository ppa:linbit/linbit-drbd9-stack -y

# controller
apt install -y linstor-controller linstor-satellite drbd-dkms linstor-client

# satellites
apt install -y linstor-satellite drbd-dkms
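
For anyone reproducing this: after installation the controller and satellite services also have to be running (a minimal sketch, assuming the PPA ships the usual systemd units):

# controller node
systemctl enable --now linstor-controller linstor-satellite
# satellite nodes
systemctl enable --now linstor-satellite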

The versions are:

# cat /proc/drbd 
version: 9.0.23-1 (api:2/proto:86-116)
GIT-hash: d16bfab7a4033024fed2d99d3b179aa6bb6eb300 build by root@linstor-dev-2, 2020-06-16 13:36:32
Transports (api:16): tcp (9.0.23-1)

# linstor c v
linstor controller 1.7.1; GIT-hash: 6760637d6fae7a5862103ced4ea0ab0a758861f9

# cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
...

# uname -a
Linux linstor-dev-1 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I created 5 nodes and 5G thin LVM storage pools on them:

linstor n c linstor-dev-1
linstor n c linstor-dev-2
linstor n c linstor-dev-3
linstor n c linstor-dev-4
linstor n c linstor-dev-5

linstor ps cdp  LVMTHIN linstor-dev-1 /dev/vdb --pool-name thindata --storage-pool thindata
linstor ps cdp  LVMTHIN linstor-dev-2 /dev/vdb --pool-name thindata --storage-pool thindata
linstor ps cdp  LVMTHIN linstor-dev-3 /dev/vdb --pool-name thindata --storage-pool thindata
linstor ps cdp  LVMTHIN linstor-dev-4 /dev/vdb --pool-name thindata --storage-pool thindata
linstor ps cdp  LVMTHIN linstor-dev-5 /dev/vdb --pool-name thindata --storage-pool thindata
# linstor sp l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node          ┊ Driver   ┊ PoolName                  ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ thindata             ┊ linstor-dev-1 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊     4.88 GiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
┊ thindata             ┊ linstor-dev-2 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊     4.88 GiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
┊ thindata             ┊ linstor-dev-3 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊     4.98 GiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
┊ thindata             ┊ linstor-dev-4 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊     4.98 GiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
┊ thindata             ┊ linstor-dev-5 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊     4.98 GiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
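
The five node and storage-pool creations above can equally be scripted; a sketch using the long-form subcommands (n c = node create, ps cdp = physical-storage create-device-pool):

for i in 1 2 3 4 5; do
    linstor node create linstor-dev-$i
    linstor physical-storage create-device-pool LVMTHIN linstor-dev-$i \
        /dev/vdb --pool-name thindata --storage-pool thindata
done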

Then I created a new resource group and a volume group:

linstor rg c test --storage-pool thindata --place-count 2
linstor vg c test

And two drives:

linstor rd c drive1 --resource-group test
linstor rd c drive2 --resource-group test
linstor vd c drive1 10G
linstor vd c drive2 2G
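
As an aside, with the resource group in place, each definition and volume could also be created in one step; a sketch, assuming the client's spawn-resources subcommand and its --definitions-only flag (which should skip the group's auto-placement, so the manual placement below still applies):

linstor resource-group spawn-resources --definitions-only test drive1 10G
linstor resource-group spawn-resources --definitions-only test drive2 2G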

Then I placed them like this:

linstor r c linstor-dev-1 drive1 -s thindata
linstor r c linstor-dev-1 drive2 -s thindata
linstor r c linstor-dev-2 drive2 -s thindata
linstor r c linstor-dev-3 drive2 -s DfltDisklessStorPool

On linstor-dev-3 I started slowly writing some data to it:

root@linstor-dev-3:~# mkfs.ext4 /dev/drbd1001 
root@linstor-dev-3:~# mount /dev/drbd1001 /mnt/
root@linstor-dev-3:~# touch /mnt/test
root@linstor-dev-3:~# while [ $(du -bs /mnt/test | cut -d$'\t' -f1) -lt 1073741824 ]; do echo $((i++)) >> /mnt/test; done
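
For clarity, the one-liner above is equivalent to the following commented loop:

# keep appending a counter line to /mnt/test until its apparent size
# (du -bs, in bytes) reaches 1 GiB; each echo is a tiny write,
# so the backing thin pool fills up slowly
while [ $(du -bs /mnt/test | cut -d$'\t' -f1) -lt 1073741824 ]; do
    echo $((i++)) >> /mnt/test
done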

On linstor-dev-1 I started filling the big drive with zeroes:

root@linstor-dev-1:~# dd if=/dev/zero of=/dev/drbd1000 bs=16k status=progress

After dd had written all 10 GiB, it expectedly threw an error:

dd: error writing '/dev/drbd1000': No space left on device
655474+0 records in
655473+0 records out
10739277824 bytes (11 GB, 10 GiB) copied, 243.776 s, 44.1 MB/s

because the 5G pool was completely overfilled:

# linstor sp l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node          ┊ Driver   ┊ PoolName                  ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ thindata             ┊ linstor-dev-1 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊        0 KiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
┊ thindata             ┊ linstor-dev-2 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊     4.88 GiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
┊ thindata             ┊ linstor-dev-3 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊     4.98 GiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
┊ thindata             ┊ linstor-dev-4 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊     4.98 GiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
┊ thindata             ┊ linstor-dev-5 ┊ LVM_THIN ┊ linstor_thindata/thindata ┊     4.98 GiB ┊      4.98 GiB ┊ True         ┊ Ok    ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

But the slow writer kept working; only the first replica of its drive became Diskless:

# linstor v l
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node          ┊ Resource ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊    State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor-dev-1 ┊ drive1   ┊ thindata             ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊   4.88 GiB ┊ Unused ┊ Diskless ┊
┊ linstor-dev-1 ┊ drive2   ┊ thindata             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 108.55 MiB ┊ Unused ┊ UpToDate ┊
┊ linstor-dev-2 ┊ drive2   ┊ thindata             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 108.55 MiB ┊ Unused ┊ UpToDate ┊
┊ linstor-dev-3 ┊ drive2   ┊ DfltDisklessStorPool ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊            ┊ InUse  ┊ Diskless ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

I let it work for a while, then finished it, unmounted, and checked drbdadm status:

root@linstor-dev-3:/# umount /mnt
root@linstor-dev-3:/# drbdadm status 
drive2 role:Secondary
  disk:Diskless
  linstor-dev-1 role:Secondary
    peer-disk:Diskless
  linstor-dev-2 role:Secondary
    peer-disk:UpToDate

Then I tried to remove the diskless replica:

# linstor r d linstor-dev-3 drive2
INFO:
    The given resource will not be deleted but will be taken over as a linstor managed tiebreaker resource.
SUCCESS:
    Resource 'drive2' updated on node 'linstor-dev-3'
SUCCESS:
    Resource 'drive2' updated on node 'linstor-dev-2'
SUCCESS:
    Resource 'drive2' updated on node 'linstor-dev-1'
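
As the INFO line shows, the delete request was converted into a tiebreaker takeover rather than an actual removal. To delete the diskless replica outright, the auto-tiebreaker can be disabled first, as is done later in this thread (c sp is shorthand for controller set-property); a sketch:

linstor controller set-property DrbdOptions/auto-add-quorum-tiebreaker False
linstor r d linstor-dev-3 drive2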

And attach it on another node, but something went wrong:

root@linstor-dev-1:~# linstor r c linstor-dev-4 drive2 -s DfltDisklessStorPool
WARNING:
Description:
    Resource will be automatically flagged as drbd diskless
Cause:
    Used storage pool 'DfltDisklessStorPool' is diskless, but resource was not flagged drbd diskless
SUCCESS:
    Successfully set property key(s): StorPoolName
INFO:
    Tie breaker marked for deletion
SUCCESS:
Description:
    New resource 'drive2' on node 'linstor-dev-4' registered.
Details:
    Resource 'drive2' on node 'linstor-dev-4' UUID is: b8629c65-5797-4b1f-abad-1c7574dde3fe
SUCCESS:
Description:
    Volume with number '0' on resource 'drive2' on node 'linstor-dev-4' successfully registered
Details:
    Volume UUID is: 6d3291fd-31ba-418d-858b-5fb86ef2909b
SUCCESS:
    Created resource 'drive2' on 'linstor-dev-4'
ERROR:
Description:
    (Node: 'linstor-dev-3') Shutdown of the DRBD resource 'drive2 failed
Cause:
    The external command for stopping the DRBD resource failed
Correction:
    - Check whether the required software is installed
    - Check whether the application's search path includes the location
      of the external software
    - Check whether the application has execute permission for the external command
Show reports:
    linstor error-reports show 5EE8CBBA-D519A-000000
ERROR:
    (Node: 'linstor-dev-2') Failed to adjust DRBD resource drive2
Show reports:
    linstor error-reports show 5EE8CBEB-DC80A-000000
SUCCESS:
    Added peer(s) 'linstor-dev-4' to resource 'drive2' on 'linstor-dev-1'
# linstor r l 
╭─────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node          ┊ Port ┊ Usage  ┊ Conns                     ┊        State ┊
╞═════════════════════════════════════════════════════════════════════════════════════════╡
┊ drive1       ┊ linstor-dev-1 ┊ 7001 ┊ Unused ┊ Ok                        ┊     Diskless ┊
┊ drive2       ┊ linstor-dev-1 ┊ 7002 ┊ Unused ┊ Ok                        ┊ Inconsistent ┊
┊ drive2       ┊ linstor-dev-2 ┊ 7002 ┊ Unused ┊ Ok                        ┊     UpToDate ┊
┊ drive2       ┊ linstor-dev-4 ┊ 7002 ┊ Unused ┊ Connecting(linstor-dev-2) ┊     Diskless ┊
╰─────────────────────────────────────────────────────────────────────────────────────────╯

root@linstor-dev-3:/# drbdadm status 
drive2 role:Secondary
  disk:Diskless
  linstor-dev-2 role:Secondary
    peer-disk:UpToDate

root@linstor-dev-4:~# drbdadm status
drive2 role:Secondary
  disk:Diskless quorum:no
  linstor-dev-1 role:Secondary
    peer-disk:Diskless
  linstor-dev-2 connection:Connecting

root@linstor-dev-2:~# drbdadm status
drive2 role:Secondary
  disk:UpToDate
  linstor-dev-1 role:Secondary
    peer-disk:Diskless
  linstor-dev-3 role:Secondary
    peer-disk:Diskless

5EE8CBBA-D519A-000000.log 5EE8CBEB-DC80A-000000.log drive2-on-linstor-dev-2.txt drive2-on-linstor-dev-4.txt
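
The referenced error reports can also be listed and read directly on the controller; a sketch:

linstor error-reports list
linstor error-reports show 5EE8CBBA-D519A-000000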

After that I also tried to mount the small drive on the remaining diskful node:

root@linstor-dev-2:~# mount /dev/drbd1001 /mnt/

but the command got stuck. Even

root@linstor-dev-2:~# drbdadm primary drive2

got stuck.

ghernadi commented 4 years ago

Can you attach the dmesg output of all 4 satellites for the given timeframe?
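
For example, on each satellite (a minimal sketch; dmesg -T prints human-readable timestamps, matching the per-node log names attached below):

dmesg -T > "$(hostname)-dmesg.log"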

kvaps commented 4 years ago

Sure, I'm going to repeat this today.

kvaps commented 4 years ago

Okay, I just tried to repeat the test:

Wed Jun 17 09:09:47 UTC 2020 - replicas are created in linstor
Wed Jun 17 09:10:53 UTC 2020 - small drive mounted on linstor-dev-3, starting slow writer
Wed Jun 17 09:11:49 UTC 2020 - starting dd on big drive on linstor-dev-1
Wed Jun 17 09:14:14 UTC 2020 - slow writer paused writing
Wed Jun 17 09:14:47 UTC 2020 - dd finished with error, slow writer continued working
Wed Jun 17 09:16:47 UTC 2020 - I finished slow writer
Wed Jun 17 09:18:45 UTC 2020 - umount small drive from linstor-dev-3
Wed Jun 17 09:19:41 UTC 2020 - delete diskless replica of small drive from linstor-dev-3
Wed Jun 17 09:20:09 UTC 2020 - create diskless replica for small drive on linstor-dev-4
Wed Jun 17 09:21:34 UTC 2020 - try mount small drive on linstor-dev-4
Wed Jun 17 09:22:58 UTC 2020 - try mount small drive on linstor-dev-2

linstor-dev-1-dmesg.log linstor-dev-2-dmesg.log linstor-dev-3-dmesg.log linstor-dev-4-dmesg.log

This time there was a small difference: when the storage pool on linstor-dev-1 was overfilled, the slow writer stopped working for drive2 until the fast writer (dd on the big drive) finished its work and returned the error. After that, both drives became Diskless on linstor-dev-1 and the slow writer continued working:

╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node          ┊ Resource ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor-dev-1 ┊ drive1   ┊ thindata             ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊  4.89 GiB ┊ Unused ┊ Diskless ┊
┊ linstor-dev-1 ┊ drive2   ┊ thindata             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.50 MiB ┊ Unused ┊ Diskless ┊
┊ linstor-dev-2 ┊ drive2   ┊ thindata             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.70 MiB ┊ Unused ┊ UpToDate ┊
┊ linstor-dev-3 ┊ drive2   ┊ DfltDisklessStorPool ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊           ┊ InUse  ┊ Diskless ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

After that I stopped it as well, unmounted the drive, and then successfully removed the diskless replica:

# linstor r d linstor-dev-3 drive2
INFO:
    The given resource will not be deleted but will be taken over as a linstor managed tiebreaker resource.
SUCCESS:
    Resource 'drive2' updated on node 'linstor-dev-3'
SUCCESS:
    Resource 'drive2' updated on node 'linstor-dev-2'
SUCCESS:
    Resource 'drive2' updated on node 'linstor-dev-1'

But creating a new replica on linstor-dev-4 still failed:

# linstor r c linstor-dev-4 drive2 -s DfltDisklessStorPool
WARNING:
Description:
    Resource will be automatically flagged as drbd diskless
Cause:
    Used storage pool 'DfltDisklessStorPool' is diskless, but resource was not flagged drbd diskless
SUCCESS:
    Successfully set property key(s): StorPoolName
INFO:
    Tie breaker marked for deletion
SUCCESS:
Description:
    New resource 'drive2' on node 'linstor-dev-4' registered.
Details:
    Resource 'drive2' on node 'linstor-dev-4' UUID is: cfaa3820-654b-42bb-bf7b-8e0340c60b84
SUCCESS:
Description:
    Volume with number '0' on resource 'drive2' on node 'linstor-dev-4' successfully registered
Details:
    Volume UUID is: c930e77e-18a3-4fda-97a5-b73c4ffdaff6
SUCCESS:
    Created resource 'drive2' on 'linstor-dev-4'
ERROR:
Description:
    (Node: 'linstor-dev-3') Shutdown of the DRBD resource 'drive2 failed
Cause:
    The external command for stopping the DRBD resource failed
Correction:
    - Check whether the required software is installed
    - Check whether the application's search path includes the location
      of the external software
    - Check whether the application has execute permission for the external command
Show reports:
    linstor error-reports show 5EE9DCF0-D519A-000000
ERROR:
    (Node: 'linstor-dev-2') Failed to adjust DRBD resource drive2
Show reports:
    linstor error-reports show 5EE9DCF2-DC80A-000000
SUCCESS:
    Added peer(s) 'linstor-dev-4' to resource 'drive2' on 'linstor-dev-1'

5EE9DCF0-D519A-000000.log 5EE9DCF2-DC80A-000000.log

# linstor v l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node          ┊ Resource ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊        State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor-dev-1 ┊ drive1   ┊ thindata             ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊  4.89 GiB ┊ Unused ┊     Diskless ┊
┊ linstor-dev-1 ┊ drive2   ┊ thindata             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.50 MiB ┊ Unused ┊ Inconsistent ┊
┊ linstor-dev-2 ┊ drive2   ┊ thindata             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.91 MiB ┊ Unused ┊     UpToDate ┊
┊ linstor-dev-4 ┊ drive2   ┊ DfltDisklessStorPool ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊           ┊ Unused ┊     Diskless ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Mounting it on linstor-dev-4 also failed:

root@linstor-dev-4:~# mount /dev/drbd1001 /mnt/
mount: /mnt: mount(2) system call failed: No data available.

root@linstor-dev-4:~# drbdadm status
drive2 role:Secondary
  disk:Diskless quorum:no
  linstor-dev-1 role:Secondary
    peer-disk:Diskless
  linstor-dev-2 connection:Connecting

but on linstor-dev-2 it got stuck:

root@linstor-dev-2:~# mount /dev/drbd1001 /mnt/
<stuck>
root@linstor-dev-2:~# drbdadm status 
drive2 role:Secondary
  disk:UpToDate
  linstor-dev-1 role:Secondary
    peer-disk:Diskless
  linstor-dev-3 role:Secondary
    peer-disk:Diskless

After that I left for lunch; when I came back, I found that it had been mounted in read-only mode:

root@linstor-dev-2:~# mount /dev/drbd1001 /mnt/
mount: /mnt: WARNING: device write-protected, mounted read-only.

but drbdadm status shows that it is Secondary:

root@linstor-dev-2:~# drbdadm status 
drive2 role:Secondary
  disk:UpToDate
  linstor-dev-1 role:Secondary
    peer-disk:Diskless
  linstor-dev-3 role:Secondary
    peer-disk:Diskless

The most interesting part is that I tried to remount it in read-write mode:

root@linstor-dev-2:~# mount -o remount,rw /mnt

and it succeeded, so I can write something to it, but it is still shown as Secondary:

root@linstor-dev-2:~# echo asdasd > /mnt/fffff
root@linstor-dev-2:~# drbdadm status 
drive2 role:Secondary
  disk:UpToDate
  linstor-dev-1 role:Secondary
    peer-disk:Diskless
  linstor-dev-3 role:Secondary
    peer-disk:Diskless
root@linstor-dev-2:~# cat /proc/mounts | grep /mnt
/dev/drbd1001 /mnt ext4 rw,relatime 0 0

Thus right now I have a DRBD device mounted as Secondary and I can write to it:

root@linstor-dev-1:~# linstor r l
╭─────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node          ┊ Port ┊ Usage  ┊ Conns                     ┊    State ┊
╞═════════════════════════════════════════════════════════════════════════════════════╡
┊ drive1       ┊ linstor-dev-1 ┊ 7000 ┊ Unused ┊ Ok                        ┊ Diskless ┊
┊ drive2       ┊ linstor-dev-1 ┊ 7001 ┊ Unused ┊ Ok                        ┊ Diskless ┊
┊ drive2       ┊ linstor-dev-2 ┊ 7001 ┊ Unused ┊ Ok                        ┊ UpToDate ┊
┊ drive2       ┊ linstor-dev-4 ┊ 7001 ┊ Unused ┊ Connecting(linstor-dev-2) ┊ Diskless ┊
╰─────────────────────────────────────────────────────────────────────────────────────╯

linstor-dev-2-drive2.res.txt

kvaps commented 4 years ago

I also tried to remove the big drive, then mount the small one on linstor-dev-2, but it was again mounted read-only. After that I remounted it read-write and created a new file on it, then unmounted it and disconnected on linstor-dev-1:

drbdadm disconnect drive2
drbdsetup resource-options --quorum=off drive2
drbdadm primary drive2 --force

After that I mounted it and checked the data; both files were still there, so I unmounted it and ran adjust for it. Finally it went into an Inconsistent state on linstor-dev-2.
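
Spelled out, that check-and-readjust sequence on linstor-dev-1 was roughly the following (a sketch, assuming the ext4 filesystem created earlier):

mount /dev/drbd1001 /mnt/
ls /mnt/                 # both test files should be present
umount /mnt
drbdadm adjust drive2    # re-apply the LINSTOR-generated configuration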

Then I tried to stop the mounted Secondary on linstor-dev-2:

# drbdadm down drive2 
drive2: State change failed: (-10) State change was refused by peer node
additional info from kernel:
Declined by peer linstor-dev-4 (id: 3), see the kernel log there
Command 'drbdsetup down drive2' terminated with exit code 11

dmesg:

[ 8834.391680] drbd drive2: Preparing cluster-wide state change 1166730599 (1->3 496/16)
[ 8836.404644] drbd drive2: Declined by peer linstor-dev-4 (id: 3), see the kernel log there
[ 8836.405551] drbd drive2: Aborting cluster-wide state change 1166730599 (2012ms) rv = -10
# drbdadm disconnect drive2 --force
drive2: Failure: (162) Invalid configuration request
additional info from kernel:
unknown connection
Command 'drbdsetup disconnect drive2 0 --force' terminated with exit code 10

kvaps commented 4 years ago

OK, I just found another bug. I removed all the resources and rebooted all the nodes, then created them again, but this time only on two nodes:

(11:48:02) linstor-dev-1 # linstor r c linstor-dev-1 drive1 -s thindata
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
Description:
    New resource 'drive1' on node 'linstor-dev-1' registered.
Details:
    Resource 'drive1' on node 'linstor-dev-1' UUID is: 9b03e524-718b-4cb1-a01d-bda581728f06
SUCCESS:
Description:
    Volume with number '0' on resource 'drive1' on node 'linstor-dev-1' successfully registered
Details:
    Volume UUID is: fb372fe1-cc14-42f2-8a4c-ef84df3d65b8
SUCCESS:
    Created resource 'drive1' on 'linstor-dev-1'
SUCCESS:
Description:
    Resource 'drive1' on 'linstor-dev-1' ready
Details:
    Node(s): 'linstor-dev-1', Resource: 'drive1'

(11:48:06) linstor-dev-1 # linstor r c linstor-dev-1 drive2 -s thindata
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
Description:
    New resource 'drive2' on node 'linstor-dev-1' registered.
Details:
    Resource 'drive2' on node 'linstor-dev-1' UUID is: df3089f8-dfa1-4627-a370-5e88b58f6998
SUCCESS:
Description:
    Volume with number '0' on resource 'drive2' on node 'linstor-dev-1' successfully registered
Details:
    Volume UUID is: 6f6ea7d2-3d67-4840-9b9c-f8e449370a0e
SUCCESS:
    Created resource 'drive2' on 'linstor-dev-1'
SUCCESS:
Description:
    Resource 'drive2' on 'linstor-dev-1' ready
Details:
    Node(s): 'linstor-dev-1', Resource: 'drive2'

(11:48:07) linstor-dev-1 # linstor r c linstor-dev-2 drive2 -s thindata
SUCCESS:
    Successfully set property key(s): StorPoolName
INFO:
    Tie breaker resource 'drive2' created on linstor-dev-3
INFO:
    Resource-definition property 'DrbdOptions/Resource/quorum' updated from 'off' to 'majority' by auto-quorum
INFO:
    Resource-definition property 'DrbdOptions/Resource/on-no-quorum' updated from 'off' to 'io-error' by auto-quorum
SUCCESS:
Description:
    New resource 'drive2' on node 'linstor-dev-2' registered.
Details:
    Resource 'drive2' on node 'linstor-dev-2' UUID is: e9262a6a-6ca7-4655-8f2e-2973d9797d1b
SUCCESS:
Description:
    Volume with number '0' on resource 'drive2' on node 'linstor-dev-2' successfully registered
Details:
    Volume UUID is: 64aa1c8e-804c-4c62-8187-23d9b58631ab
SUCCESS:
    Added peer(s) 'linstor-dev-2' to resource 'drive2' on 'linstor-dev-3'
SUCCESS:
    Added peer(s) 'linstor-dev-2' to resource 'drive2' on 'linstor-dev-1'
SUCCESS:
    Created resource 'drive2' on 'linstor-dev-2'
SUCCESS:
Description:
    Resource 'drive2' on 'linstor-dev-2' ready
Details:
    Node(s): 'linstor-dev-2', Resource: 'drive2'
SUCCESS:
    Created resource 'drive2' on 'linstor-dev-3'
SUCCESS:
    Added peer(s) 'linstor-dev-3' to resource 'drive2' on 'linstor-dev-1'
SUCCESS:
    Added peer(s) 'linstor-dev-3' to resource 'drive2' on 'linstor-dev-2'
SUCCESS:
Description:
    Resource 'drive2' on 'linstor-dev-3' ready
Details:
    Node(s): 'linstor-dev-2', Resource: 'drive2'

Then I tried to make a filesystem on linstor-dev-2:

(11:48:52) linstor-dev-2 # mkfs.ext4 /dev/drbd1001
mke2fs 1.45.5 (07-Jan-2020)
^C
(11:50:24) linstor-dev-2 # mkfs.ext4 /dev/drbd1001
mke2fs 1.45.5 (07-Jan-2020)
/dev/drbd1001: Read-only file system while setting up superblock

linstor-dev-1-dmesg.log linstor-dev-2-dmesg.log linstor-dev-3-dmesg.log

Same error?

(11:50:19) linstor-dev-1 # linstor r l -a
╭───────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node          ┊ Port ┊ Usage  ┊ Conns ┊      State ┊
╞═══════════════════════════════════════════════════════════════════╡
┊ drive1       ┊ linstor-dev-1 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊
┊ drive2       ┊ linstor-dev-1 ┊ 7001 ┊ Unused ┊ Ok    ┊   UpToDate ┊
┊ drive2       ┊ linstor-dev-2 ┊ 7001 ┊ Unused ┊ Ok    ┊   UpToDate ┊
┊ drive2       ┊ linstor-dev-3 ┊ 7001 ┊ Unused ┊ Ok    ┊ TieBreaker ┊
╰───────────────────────────────────────────────────────────────────╯
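
Note that the TieBreaker resource shows up here only because of the -a flag; as far as I know, a plain listing hides tiebreaker resources:

linstor r l        # tiebreaker hidden by default
linstor r l -a     # --all also lists the TieBreaker on linstor-dev-3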

Also, removing the tiebreaker failed:

(11:59:08) linstor-dev-1 # linstor r d linstor-dev-3 drive2
INFO:
    Disabling auto-tiebreaker on resource-definition 'drive2' as tiebreaker resource was manually deleted
INFO:
    Resource-definition property 'DrbdOptions/Resource/quorum' was removed as there are not enough resources for quorum
INFO:
    Resource-definition property 'DrbdOptions/Resource/on-no-quorum' was removed as there are not enough resources for quorum
SUCCESS:
Description:
    Node: linstor-dev-3, Resource: drive2 marked for deletion.
Details:
    Node: linstor-dev-3, Resource: drive2 UUID is: 399876b3-6be7-4ac1-b022-7a9a96b7c38a
SUCCESS:
    Notified 'linstor-dev-1' that 'drive2' is being deleted on Node(s): [linstor-dev-3]
ERROR:
Description:
    (Node: 'linstor-dev-3') Shutdown of the DRBD resource 'drive2 failed
Cause:
    The external command for stopping the DRBD resource failed
Correction:
    - Check whether the required software is installed
    - Check whether the application's search path includes the location
      of the external software
    - Check whether the application has execute permission for the external command
Show reports:
    linstor error-reports show 5EEA01FE-D519A-000000
ERROR:
    (Node: 'linstor-dev-2') Failed to adjust DRBD resource drive2
Show reports:
    linstor error-reports show 5EEA01FA-DC80A-000000

5EEA01FE-D519A-000000.log 5EEA01FA-DC80A-000000.log

linstor-dev-1-dmesg.log linstor-dev-2-dmesg.log linstor-dev-3-dmesg.log


Also, the removal procedure was somehow weird:

(12:04:23) linstor-dev-1 # linstor rd d drive2
SUCCESS:
Description:
    Resource definition 'drive2' marked for deletion.
Details:
    Resource definition 'drive2' UUID is: ceb7c124-daf0-4ab7-9f9d-668997e2369e
SUCCESS:
    Notified 'linstor-dev-1' that diskless resources of 'drive2' are being deleted
ERROR:
Description:
    (Node: 'linstor-dev-3') Shutdown of the DRBD resource 'drive2 failed
Cause:
    The external command for stopping the DRBD resource failed
Correction:
    - Check whether the required software is installed
    - Check whether the application's search path includes the location
      of the external software
    - Check whether the application has execute permission for the external command
Show reports:
    linstor error-reports show 5EEA01FE-D519A-000005
ERROR:
    (Node: 'linstor-dev-2') Failed to adjust DRBD resource drive2
Show reports:
    linstor error-reports show 5EEA01FA-DC80A-000005

(12:04:26) linstor-dev-1 # drbdadm status
drive2 role:Secondary
  disk:UpToDate
  linstor-dev-2 role:Secondary
    peer-disk:UpToDate

5EEA01FE-D519A-000005.log 5EEA01FA-DC80A-000005.log

Then I rebooted all five nodes and tried again:

(12:05:00) linstor-dev-1 # drbdadm status
# No currently configured DRBD found.
(12:05:04) linstor-dev-1 # linstor rd l
╭────────────────────────────────────────────────╮
┊ ResourceName ┊ Port ┊ ResourceGroup ┊ State    ┊
╞════════════════════════════════════════════════╡
┊ drive2       ┊ 7001 ┊ test          ┊ DELETING ┊
╰────────────────────────────────────────────────╯
(12:05:08) linstor-dev-1 # linstor rd d drive2
SUCCESS:
Description:
    Resource definition 'drive2' marked for deletion.
Details:
    Resource definition 'drive2' UUID is: ceb7c124-daf0-4ab7-9f9d-668997e2369e
WARNING:
Description:
    No active connection to satellite 'linstor-dev-3'
Details:
    The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
SUCCESS:
    Notified 'linstor-dev-2' that diskless resources of 'drive2' are being deleted
SUCCESS:
    Notified 'linstor-dev-1' that diskless resources of 'drive2' are being deleted

(12:05:12) linstor-dev-1 # linstor r l
╭─────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node          ┊ Port ┊ Usage  ┊ Conns ┊    State ┊
╞═════════════════════════════════════════════════════════════════╡
┊ drive2       ┊ linstor-dev-1 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊
┊ drive2       ┊ linstor-dev-2 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊
╰─────────────────────────────────────────────────────────────────╯

linstor-dev-1-dmesg.log linstor-dev-2-dmesg.log linstor-dev-3-dmesg.log

But after a short period, drive2 was successfully deleted.

kvaps commented 4 years ago

The last bug can be easily reproduced. I have a clean LINSTOR with just the thin LVM pools created; then I do:

linstor rd c drive3 --resource-group test
linstor rd c drive4 --resource-group test
linstor vd c drive3 10G
linstor vd c drive4 2G

linstor r c linstor-dev-5 drive3 -s thindata # (12:28:28)
linstor r c linstor-dev-5 drive4 -s thindata # (12:28:29) 
linstor r c linstor-dev-4 drive4 -s thindata # (12:28:30) 

Afterwards, if you try to use the drive on linstor-dev-4, it will be unusable:

(12:29:25) linstor-dev-4 # mkfs.ext4 /dev/drbd1001 
mke2fs 1.45.5 (07-Jan-2020)
/dev/drbd1001: Read-only file system while setting up superblock

(12:33:43) linstor-dev-4 # drbdadm status
drive4 role:Secondary
  disk:UpToDate
  linstor-dev-1 role:Secondary
    peer-disk:Diskless
  linstor-dev-5 role:Secondary
    peer-disk:UpToDate

linstor-dev-4-dmesg.log linstor-dev-5-dmesg.log linstor-dev-1-dmesg.log

Should I report it to drbd-user@lists.linbit.com?

kvaps commented 4 years ago

OK, another bug:

(12:41:06) linstor-dev-1 # linstor c sp DrbdOptions/auto-add-quorum-tiebreaker False
(12:41:29) linstor-dev-1 # linstor rd c drive3 --resource-group test
(12:41:42) linstor-dev-1 # linstor rd c drive4 --resource-group test
(12:41:42) linstor-dev-1 # linstor vd c drive3 10G
(12:41:43) linstor-dev-1 # linstor vd c drive4 2G
(12:41:48) linstor-dev-1 # linstor r c linstor-dev-5 drive3 -s thindata
(12:41:49) linstor-dev-1 # linstor r c linstor-dev-5 drive4 -s thindata
(12:41:50) linstor-dev-1 # linstor r c linstor-dev-4 drive4 -s thindata

(12:43:30) linstor-dev-4 # mkfs.ext4 /dev/drbd1001 
(12:47:05) linstor-dev-4 # mount /dev/drbd1001 /mnt/

(12:47:34) linstor-dev-4 # touch /mnt/test
(12:47:35) linstor-dev-4 # while [ $(du -bs /mnt/test | cut -d$'\t' -f1) -lt 1073741824 ]; do echo $((i++)) >> /mnt/test; done

(12:48:06) linstor-dev-5 # dd if=/dev/zero of=/dev/drbd1000 bs=16k status=progress

# Wed Jun 17 12:50:20 UTC 2020 - io stopped for drive4
# Wed Jun 17 12:51:15 UTC 2020 - dd returned error, io continued for drive4

(12:51:15) linstor-dev-1 # linstor v l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node          ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor-dev-5 ┊ drive3   ┊ thindata    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊  4.89 GiB ┊ Unused ┊ Diskless ┊
┊ linstor-dev-4 ┊ drive4   ┊ thindata    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.70 MiB ┊ InUse  ┊ UpToDate ┊
┊ linstor-dev-5 ┊ drive4   ┊ thindata    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.50 MiB ┊ Unused ┊ Diskless ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯

# Wed Jun 17 12:52:58 UTC 2020 - finished test for drive4

(12:53:25) linstor-dev-1 # linstor rd d drive3

(12:54:19) linstor-dev-4 # touch /mnt/test2
(12:54:22) linstor-dev-4 # while [ $(du -bs /mnt/test2 | cut -d$'\t' -f1) -lt 1073741824 ]; do echo $((i++)) >> /mnt/test2; done

(12:53:57) linstor-dev-1 # linstor v l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node          ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor-dev-4 ┊ drive4   ┊ thindata    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.91 MiB ┊ InUse  ┊ UpToDate ┊
┊ linstor-dev-5 ┊ drive4   ┊ thindata    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.50 MiB ┊ Unused ┊ Diskless ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯

(12:54:01) linstor-dev-1 # linstor v l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node          ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor-dev-4 ┊ drive4   ┊ thindata    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.91 MiB ┊ InUse  ┊ UpToDate ┊
┊ linstor-dev-5 ┊ drive4   ┊ thindata    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  1.67 GiB ┊ Unused ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯

(12:54:45) linstor-dev-5 # drbdadm adjust drive4
Marked additional 2055 MB as out-of-sync based on AL.

(12:55:00) linstor-dev-5 # drbdadm status
drive4 role:Secondary
  disk:Inconsistent
  linstor-dev-4 role:Primary
    replication:SyncTarget peer-disk:UpToDate done:29.25

# Wed Jun 17 12:55:45 UTC 2020 - second test stopped

(12:57:16) linstor-dev-4 # umount /mnt

(12:57:21) linstor-dev-4 # fsck.ext4 /dev/drbd1001
e2fsck 1.45.5 (07-Jan-2020)
/dev/drbd1001: clean, 13/131376 files, 26353/525190 blocks

(13:03:16) linstor-dev-1 # linstor v l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node          ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor-dev-4 ┊ drive4   ┊ thindata    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 98.91 MiB ┊ Unused ┊ UpToDate ┊
┊ linstor-dev-5 ┊ drive4   ┊ thindata    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  1.67 GiB ┊ Unused ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Do you see that drive4 on linstor-dev-5 contains 1.67 GiB, while on linstor-dev-4 it contains just 98.91 MiB, yet both are UpToDate?
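
One way to compare the underlying thin-volume allocation behind those numbers is to ask LVM directly on both nodes (a sketch, assuming the volume group name linstor_thindata from the pool listing above):

lvs -o lv_name,lv_size,data_percent linstor_thindata   # Data% is the actual thin allocation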

linstor-dev-4-dmesg.log linstor-dev-5-dmesg.log