LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Is it possible to skip DRBD devices when the blkid command is run on the satellite? #214

Closed tnaganawa closed 3 years ago

tnaganawa commented 3 years ago

Hi, I'm testing the PVC creation feature on a 3-node Kubernetes cluster with piraeus-operator.

After setting lvm.conf as described here, parallel PVC creation became much faster, and I can now create 10 PVCs in parallel :) https://github.com/piraeusdatastore/piraeus/pull/61

That said, when 20 PVCs are created in parallel, a number of volumes still end up Inconsistent:

root@piraeus-op-cs-controller-795c5f45cf-smr6k:/# linstor --controller 127.0.0.1 v list | grep -v UpToDate
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node                                             | Resource                                 | StoragePool          | VolNr | MinorNr | DeviceName    | Allocated | InUse  |        State |
|==========================================================================================================================================================================================|
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-0305ff13-9731-458a-b300-49199beb4746 | lvm-thick            |     0 |    1098 | /dev/drbd1098 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-0305ff13-9731-458a-b300-49199beb4746 | lvm-thick            |     0 |    1098 | /dev/drbd1098 |   204 MiB | Unused | Inconsistent |
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-2123de84-d5b7-456f-adb0-6a51bcc0acdf | lvm-thick            |     0 |    1101 | /dev/drbd1101 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-2123de84-d5b7-456f-adb0-6a51bcc0acdf | lvm-thick            |     0 |    1101 | /dev/drbd1101 |   204 MiB | Unused | Inconsistent |
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-278afe38-a0b3-4dc9-baf9-bc82d2c95b25 | lvm-thick            |     0 |    1102 | /dev/drbd1102 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-278afe38-a0b3-4dc9-baf9-bc82d2c95b25 | lvm-thick            |     0 |    1102 | /dev/drbd1102 |   204 MiB | Unused | Inconsistent |
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-51e83b42-6998-48ad-ba11-c12562524e80 | lvm-thick            |     0 |    1100 | /dev/drbd1100 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-51e83b42-6998-48ad-ba11-c12562524e80 | lvm-thick            |     0 |    1100 | /dev/drbd1100 |   204 MiB | Unused | Inconsistent |
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-54fa827d-7772-4339-acba-833e27c5c06c | lvm-thick            |     0 |    1103 | /dev/drbd1103 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-54fa827d-7772-4339-acba-833e27c5c06c | lvm-thick            |     0 |    1103 | /dev/drbd1103 |   204 MiB | Unused | Inconsistent |
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-5624b488-2e21-4e16-bd76-f4c911087767 | lvm-thick            |     0 |    1093 | /dev/drbd1093 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-5624b488-2e21-4e16-bd76-f4c911087767 | lvm-thick            |     0 |    1093 | /dev/drbd1093 |   204 MiB | Unused | Inconsistent |
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-5f803b53-aedf-4737-8374-7339014529ac | lvm-thick            |     0 |    1099 | /dev/drbd1099 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-5f803b53-aedf-4737-8374-7339014529ac | lvm-thick            |     0 |    1099 | /dev/drbd1099 |   204 MiB | Unused | Inconsistent |
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-6ce10f37-5d52-4b03-a814-f47ca23e470c | lvm-thick            |     0 |    1089 | /dev/drbd1089 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-6ce10f37-5d52-4b03-a814-f47ca23e470c | lvm-thick            |     0 |    1089 | /dev/drbd1089 |   204 MiB | Unused | Inconsistent |
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-7bb874c2-c199-43cd-932a-d922d6f87d61 | lvm-thick            |     0 |    1092 | /dev/drbd1092 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-7bb874c2-c199-43cd-932a-d922d6f87d61 | lvm-thick            |     0 |    1092 | /dev/drbd1092 |   204 MiB | Unused | Inconsistent |
| ip-172-31-36-188.ap-northeast-1.compute.internal | pvc-965ec0f0-9dcd-41ba-a87d-af7301ccf683 | lvm-thick            |     0 |    1096 | /dev/drbd1096 |   204 MiB | Unused | Inconsistent |
| ip-172-31-43-38.ap-northeast-1.compute.internal  | pvc-965ec0f0-9dcd-41ba-a87d-af7301ccf683 | lvm-thick            |     0 |    1096 | /dev/drbd1096 |   204 MiB | Unused | Inconsistent |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
root@piraeus-op-cs-controller-795c5f45cf-smr6k:/# 

While investigating this, I found that a 'blkid -o device' process is always running whenever 'linstor volume list' is slow. This command opens every /dev/drbdXXXX device directly, which can be slow when the DRBD state is 'Secondary'.

[root@ip-172-31-42-136 ~]# ps -ef | grep blkid
root       803 31093  1 14:58 ?        00:00:00 blkid -o device
root       825  8910  0 14:58 pts/0    00:00:00 grep --color=auto blkid
[root@ip-172-31-42-136 ~]# 
[root@ip-172-31-42-136 ~]# 
[root@ip-172-31-42-136 ~]# strace -p 803
strace: Process 803 attached
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=6060, ...}) = 0
lstat("/dev/drbd1101", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1101), ...}) = 0
stat("/dev/drbd1101", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1101), ...}) = 0
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=6060, ...}) = 0
lstat("/dev/drbd1101", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1101), ...}) = 0
access("/dev/drbd1101", F_OK)           = 0
stat("/dev/drbd1101", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1101), ...}) = 0
openat(AT_FDCWD, "/sys/dev/block/147:1101", O_RDONLY|O_CLOEXEC) = 4
openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
close(4)                                = 0
openat(AT_FDCWD, "/dev/drbd1101", O_RDONLY|O_CLOEXEC) = -1 EMEDIUMTYPE (Wrong medium type)
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=6060, ...}) = 0
lstat("/dev/drbd1102", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1102), ...}) = 0
stat("/dev/drbd1102", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1102), ...}) = 0
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=6060, ...}) = 0
lstat("/dev/drbd1102", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1102), ...}) = 0
access("/dev/drbd1102", F_OK)           = 0
stat("/dev/drbd1102", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1102), ...}) = 0
openat(AT_FDCWD, "/sys/dev/block/147:1102", O_RDONLY|O_CLOEXEC) = 4
openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
close(4)                                = 0
openat(AT_FDCWD, "/dev/drbd1102", O_RDONLY|O_CLOEXEC) = -1 EMEDIUMTYPE (Wrong medium type)
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=6060, ...}) = 0
lstat("/dev/drbd1100", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1100), ...}) = 0
stat("/dev/drbd1100", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1100), ...}) = 0
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=6060, ...}) = 0
lstat("/dev/drbd1100", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1100), ...}) = 0
access("/dev/drbd1100", F_OK)           = 0
stat("/dev/drbd1100", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1100), ...}) = 0
openat(AT_FDCWD, "/sys/dev/block/147:1100", O_RDONLY|O_CLOEXEC) = 4
openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
close(4)                                = 0
openat(AT_FDCWD, "/dev/drbd1100", O_RDONLY|O_CLOEXEC) = -1 EMEDIUMTYPE (Wrong medium type)
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=6060, ...}) = 0
lstat("/dev/drbd1103", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1103), ...}) = 0
stat("/dev/drbd1103", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1103), ...}) = 0
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=6060, ...}) = 0
lstat("/dev/drbd1103", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1103), ...}) = 0
access("/dev/drbd1103", F_OK)           = 0
stat("/dev/drbd1103", {st_mode=S_IFBLK|0660, st_rdev=makedev(147, 1103), ...}) = 0
openat(AT_FDCWD, "/sys/dev/block/147:1103", O_RDONLY|O_CLOEXEC) = 4
openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
close(4)                                = 0
openat(AT_FDCWD, "/dev/drbd1103", O_RDONLY|O_CLOEXEC^Cstrace: Process 803 detached
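
To summarize a trace like the one above, the failing opens can be pulled out with a small filter. This is only a sketch: the function name emediumtype_devices is made up for illustration, and it assumes the strace output has been saved to a file or is piped in on stdin in the same format as shown here.

```shell
#!/bin/sh
# Read strace output on stdin and print, once each, every /dev path whose
# openat() call failed with EMEDIUMTYPE (as seen for /dev/drbdNNNN above).
# emediumtype_devices is a hypothetical helper name, not part of any tool.
emediumtype_devices() {
    sed -n 's|^openat(AT_FDCWD, "\(/dev/[^"]*\)".*= -1 EMEDIUMTYPE.*$|\1|p' | sort -u
}

# Example (not run here): strace -o /tmp/blkid.trace blkid -o device
#                         emediumtype_devices < /tmp/blkid.trace
```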

To avoid this, would it be possible to skip the blkid call for devices whose major number is 147?

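(pseudo code)
old:
 blkid -o device

new:
 # KNAME is the kernel device name, which normally matches the node under /dev
 lsblk -r -n -o KNAME,MAJ:MIN | while IFS=' :' read -r device_name major minor
 do
   if [[ ${major} == 147 ]]
   then
     continue   # major 147 is DRBD: skip it
   fi
   blkid -o device /dev/${device_name}
 done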
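
The filtering step in the pseudo code above can be tried without touching any real device by feeding it sample lsblk output. The sketch below is only an illustration of the idea, not how the satellite does it: filter_non_drbd and DRBD_MAJOR are hypothetical names, and the input is assumed to be "NAME MAJ:MIN" lines such as those printed by lsblk -r -n -o KNAME,MAJ:MIN.

```shell
#!/bin/sh
# Block major number of DRBD devices, as seen in the strace output (makedev(147, ...)).
DRBD_MAJOR=147

# Read "NAME MAJ:MIN" lines on stdin and print the names of all devices
# whose major number is not DRBD_MAJOR. Hypothetical helper for illustration.
filter_non_drbd() {
    while IFS=' :' read -r name major minor; do
        [ -n "$name" ] || continue                  # ignore empty lines
        [ "$major" = "$DRBD_MAJOR" ] && continue    # skip DRBD devices
        printf '%s\n' "$name"
    done
}

# Example on a real node (not run here):
#   lsblk -r -n -o KNAME,MAJ:MIN | filter_non_drbd |
#     while read -r dev; do blkid -o device "/dev/$dev"; done
```

Keeping the filter separate from the blkid call means the major-number check can be verified on its own with made-up input.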

Best Regards, Tatsuya

tnaganawa commented 3 years ago

Hi, I noticed that the blkid check is disabled in the test code :) https://github.com/LINBIT/linstor-server/blob/master/satellite/src/test/java/com/linbit/linstor/layer/storage/utils/LsBlkUtilsTest.java#L51

So I tried disabling this logic in a test setup:

diff --git a/satellite/src/main/java/com/linbit/linstor/api/protobuf/ListPhysicalDevices.java b/satellite/src/main/java/com/linbit/linstor/api/protobuf/ListPhysicalDevices.java
index ece1603f8..86a918fb3 100644
--- a/satellite/src/main/java/com/linbit/linstor/api/protobuf/ListPhysicalDevices.java
+++ b/satellite/src/main/java/com/linbit/linstor/api/protobuf/ListPhysicalDevices.java
@@ -59,7 +59,7 @@ public class ListPhysicalDevices implements ApiCall
                 MsgReqPhysicalDevicesOuterClass.MsgReqPhysicalDevices.parseDelimitedFrom(msgDataIn);

             List<LsBlkEntry> entries = LsBlkUtils.lsblk(extCmdFactory.create());
-            String[] blkIdEntries = LsBlkUtils.blkid(extCmdFactory.create());
+            String[] blkIdEntries = new String[]{}; // skip blkid command

             if (msgReqPhysicalDevices.getFilter())
             {

With this patch, and using the following command to wait for the previous PVC to reach the 'Bound' state,

# for i in $(seq 1 50); do kubectl create -f /tmp/pvc$i.yaml; kubectl get --watch pvc demo-rwo-r$i -o 'go-template={{if eq .status.phase "Bound"}}{{.err}}{{end}}' | head -n 0; done

creating up to 50 PVCs in parallel worked fine :)
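
For anyone repeating this test, the create-then-wait loop can also be written as a small function. Note that this sketch swaps the --watch / head trick used above for kubectl wait with a jsonpath condition, which assumes kubectl 1.23 or newer. The function name create_pvcs_sequentially and the default timeout are made up for illustration; the manifest paths and claim names are the ones from the command above.

```shell
#!/bin/sh
# Create PVCs one at a time, waiting for each claim to reach phase "Bound"
# before creating the next one.
# Assumptions: manifests live at /tmp/pvc<N>.yaml and define claims named
# demo-rwo-r<N>; kubectl supports `wait --for=jsonpath=...` (1.23+).
create_pvcs_sequentially() {
    count=$1
    timeout=${2:-120s}    # hypothetical default, adjust as needed
    i=1
    while [ "$i" -le "$count" ]; do
        kubectl create -f "/tmp/pvc$i.yaml" || return 1
        kubectl wait --for=jsonpath='{.status.phase}'=Bound \
            "pvc/demo-rwo-r$i" --timeout="$timeout" || return 1
        i=$((i + 1))
    done
}

# Example (not run here): create_pvcs_sequentially 50
```

Unlike the one-liner, this stops at the first PVC that fails to be created or does not become Bound within the timeout.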

Best Regards,