LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

LINSTOR shouldn't allow deleting last active peer #130

Closed kvaps closed 2 years ago

kvaps commented 4 years ago

Steps to reproduce:

# create resource-definition and volume-definition
linstor rd c testz
linstor vd c testz 100G

# create data-resource
linstor r c pve1 testz -s pve

# create diskless-resource
linstor r c pve3 testz -s DfltDisklessPool

# start workload on diskless
ssh pve3 dd if=/dev/urandom of=/dev/drbd/by-res/testz/0

# create another data-resource and remove the first one
linstor r c pve2 testz -s pve
linstor r d pve2

Now the DRBD cluster enters a non-operable state:

# linstor r l -r testz
+----------------------------------------------------------------------------+
| ResourceName | Node | Port | Usage  | Conns            |             State |
|============================================================================|
| testz        | pve1 | 7440 |        | Connecting(pve2) |          DELETING |
| testz        | pve2 | 7440 | Unused | Ok               | SyncTarget(1.02%) |
| testz        | pve3 | 7440 | InUse  | Ok               |          Diskless |
+----------------------------------------------------------------------------+
root@pve1:~# drbdadm status testz
testz role:Secondary
  disk:UpToDate
  pve2 connection:Connecting
  pve3 role:Primary
    peer-disk:Diskless

root@pve2:~# drbdadm status testz
testz role:Secondary
  disk:Inconsistent
  pve3 role:Primary
    peer-disk:Diskless

root@pve3:~# drbdadm status testz
testz role:Primary
  disk:Diskless
  pve1 role:Secondary
    peer-disk:UpToDate
  pve2 role:Secondary
    peer-disk:Inconsistent

As we can see, pve3 is still being served by pve1 because pve2 is not fully synced yet; meanwhile pve2 knows nothing about pve1, even though it needs pve1 to finish syncing its data.
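The claim that pve2 has no knowledge of pve1 can be double-checked by listing the peers each node's `drbdadm status` output reports. A minimal parsing sketch, fed the captured pve2 output above so the assumed text shape matches this report (`peers_of` is a hypothetical helper, not a DRBD command):

```shell
# Extract peer node names from `drbdadm status <res>` output.
# Peer lines are indented by exactly two spaces and carry a
# "role:" or "connection:" field; deeper-indented lines belong to peers.
peers_of() {
    printf '%s\n' "$1" | awk '/^  [a-z0-9-]+ (role|connection):/ {print $1}'
}

pve2_status='testz role:Secondary
  disk:Inconsistent
  pve3 role:Primary
    peer-disk:Diskless'

peers_of "$pve2_status"    # prints only pve3 -- pve1 is not configured as a peer
```

On a live node one would call `peers_of "$(drbdadm status testz)"` instead of using a captured string.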

Recovery steps (or how to fix Inconsistent / Outdated resources):

Make a backup of /var/lib/linstor.d/testz.res:

cp /var/lib/linstor.d/testz.res /tmp/testz.res

Open /var/lib/linstor.d/testz.res on another node and copy the missing sections to pve2:

--- /tmp/testz.res
+++ /var/lib/linstor.d/testz.res
@@ -1,55 +1,77 @@
 # This file was generated by linstor(1.6.1), do not edit manually.

 resource "testz"
 {
     template-file "linstor_common.conf";

     options
     {
         quorum off;
     }

     net
     {
         cram-hmac-alg     sha1;
         shared-secret     "FMkOCXJpzC3DpvhEKAu5";
     }

+    on pve1
+    {
+        volume 0
+        {
+            disk        none;
+            disk
+            {
+                discard-zeroes-if-aligned yes;
+                rs-discard-granularity 8192;
+            }
+            meta-disk   internal;
+            device      minor 1432;
+        }
+        node-id    0;
+    }
+    
     on pve2
     {
         volume 0
         {
             disk        /dev/pve/testz_00000;
             disk
             {
                 discard-zeroes-if-aligned yes;
                 rs-discard-granularity 8192;
             }
             meta-disk   internal;
             device      minor 1432;
         }
         node-id    3;
     }

     on pve3
     {
         volume 0
         {
             disk        none;
             disk
             {
                 discard-zeroes-if-aligned yes;
                 rs-discard-granularity 8192;
             }
             meta-disk   internal;
             device      minor 1432;
         }
         node-id    2;
     }

+    connection
+    {
+        host pve1 address ipv4 10.29.36.159:7440;
+        host pve2 address ipv4 10.29.36.160:7440;
+    }
+    
     connection
     {
         host pve2 address ipv4 10.29.36.160:7440;
         host pve3 address ipv4 10.29.36.161:7440;
     }
 }

Then adjust it:

root@pve2:~# drbdadm adjust testz

Wait until replication has finished:

root@pve2:~# drbdadm status testz
testz role:Secondary
  disk:Inconsistent
  pve1 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:0.12
  pve3 role:Primary
    peer-disk:Diskless
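The wait can be automated by polling until the local disk line reports UpToDate. A hedged sketch (`sync_done` is a hypothetical helper; here it is fed the captured status text above so the check itself can be verified):

```shell
# True once the node's own disk reports UpToDate. The two-space indent
# distinguishes the local "disk:" line from per-peer "peer-disk:" lines,
# which are indented deeper.
sync_done() {    # $1 = output of `drbdadm status <res>`
    printf '%s\n' "$1" | grep -q '^  disk:UpToDate'
}

syncing='testz role:Secondary
  disk:Inconsistent
  pve1 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:0.12'

if sync_done "$syncing"; then echo done; else echo still-syncing; fi   # still-syncing
```

On the cluster this becomes: `until sync_done "$(drbdadm status testz)"; do sleep 5; done`.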

Restore the old config and adjust again:

root@pve2:~# mv /tmp/testz.res /var/lib/linstor.d/testz.res
root@pve2:~# drbdadm adjust testz
rp- commented 2 years ago

This has already been fixed.