LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/

Resource state stuck at "Negotiating" after toggle-disk command #238

Open · pavanfhw opened this issue 3 years ago

pavanfhw commented 3 years ago

I ran the following commands and the resources got stuck at Negotiating. The volumes seem to be working, because the pods are responding normally.

$ linstor resource list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node          ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-4c533ffc-4b7a-48e1-89ed-ab2d7e5d488d ┊ local-master  ┊ 7001 ┊ Unused ┊ Ok    ┊ Diskless ┊ 2021-06-11 19:16:42 ┊
┊ pvc-4c533ffc-4b7a-48e1-89ed-ab2d7e5d488d ┊ local-worker1 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2021-06-11 19:16:32 ┊
┊ pvc-4c533ffc-4b7a-48e1-89ed-ab2d7e5d488d ┊ local-worker2 ┊ 7001 ┊ InUse  ┊ Ok    ┊ Diskless ┊ 2021-06-11 20:06:41 ┊
┊ pvc-40962d7a-6610-4101-95ca-d25c3a28ec22 ┊ local-master  ┊ 7000 ┊ Unused ┊ Ok    ┊ Diskless ┊ 2021-06-11 19:16:18 ┊
┊ pvc-40962d7a-6610-4101-95ca-d25c3a28ec22 ┊ local-worker1 ┊ 7000 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2021-06-11 19:16:13 ┊
┊ pvc-40962d7a-6610-4101-95ca-d25c3a28ec22 ┊ local-worker2 ┊ 7000 ┊ InUse  ┊ Ok    ┊ Diskless ┊ 2021-06-11 20:06:47 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
$ linstor r td local-worker2 pvc-4c533ffc-4b7a-48e1-89ed-ab2d7e5d488d
SUCCESS:
    Addition of disk to resource 'pvc-4c533ffc-4b7a-48e1-89ed-ab2d7e5d488d' on node 'local-worker2' registered
SUCCESS:
    Prepared 'local-master' to expect disk on 'local-worker2'
SUCCESS:
    Prepared 'local-worker1' to expect disk on 'local-worker2'
INFO:
    Resource-definition property 'DrbdOptions/Resource/quorum' updated from 'off' to 'majority' by auto-quorum
INFO:
    Resource-definition property 'DrbdOptions/Resource/on-no-quorum' updated from 'off' to 'io-error' by auto-quorum
SUCCESS:
    Added disk on 'local-worker2'
$ linstor r td local-worker2 pvc-40962d7a-6610-4101-95ca-d25c3a28ec22
SUCCESS:
    Addition of disk to resource 'pvc-40962d7a-6610-4101-95ca-d25c3a28ec22' on node 'local-worker2' registered
SUCCESS:
    Prepared 'local-master' to expect disk on 'local-worker2'
SUCCESS:
    Prepared 'local-worker1' to expect disk on 'local-worker2'
INFO:
    Resource-definition property 'DrbdOptions/Resource/quorum' updated from 'off' to 'majority' by auto-quorum
INFO:
    Resource-definition property 'DrbdOptions/Resource/on-no-quorum' updated from 'off' to 'io-error' by auto-quorum
SUCCESS:
    Added disk on 'local-worker2'

$ linstor resource list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node          ┊ Port ┊ Usage  ┊ Conns ┊       State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-4c533ffc-4b7a-48e1-89ed-ab2d7e5d488d ┊ local-master  ┊ 7001 ┊ Unused ┊ Ok    ┊    Diskless ┊ 2021-06-11 19:16:42 ┊
┊ pvc-4c533ffc-4b7a-48e1-89ed-ab2d7e5d488d ┊ local-worker1 ┊ 7001 ┊ Unused ┊ Ok    ┊    UpToDate ┊ 2021-06-11 19:16:32 ┊
┊ pvc-4c533ffc-4b7a-48e1-89ed-ab2d7e5d488d ┊ local-worker2 ┊ 7001 ┊ InUse  ┊ Ok    ┊ Negotiating ┊ 2021-06-11 20:06:41 ┊
┊ pvc-40962d7a-6610-4101-95ca-d25c3a28ec22 ┊ local-master  ┊ 7000 ┊ Unused ┊ Ok    ┊    Diskless ┊ 2021-06-11 19:16:18 ┊
┊ pvc-40962d7a-6610-4101-95ca-d25c3a28ec22 ┊ local-worker1 ┊ 7000 ┊ Unused ┊ Ok    ┊    UpToDate ┊ 2021-06-11 19:16:13 ┊
┊ pvc-40962d7a-6610-4101-95ca-d25c3a28ec22 ┊ local-worker2 ┊ 7000 ┊ InUse  ┊ Ok    ┊ Negotiating ┊ 2021-06-11 20:06:47 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Additionally, these diskless replicas were created automatically by LINSTOR after a node failure. Is there a way to configure LINSTOR to create diskful replicas instead of diskless ones in this case? I think it is important to recreate an on-disk replica to replace the one that was lost with the failed node.

ghernadi commented 3 years ago

the resources got stuck at Negotiating.

Can you show us some dmesg from local-worker2, especially regarding DRBD and those two resources? Negotiating comes from DRBD; Linstor is only displaying the DRBD state here.
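
For example, something like this run on the node itself (assuming drbd-utils are installed there; the resource name is taken from your listing) would show both the kernel log and the DRBD-level state that Linstor is displaying:

$ dmesg | grep -i drbd
$ drbdsetup status pvc-4c533ffc-4b7a-48e1-89ed-ab2d7e5d488d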

Is there a way to configure linstor to create disk replicas instead of diskless in this case?

Can you show your resource-group list, or at least the configuration of the one resource-group you are using? Also some controller logs would be interesting, as by default the replica-count is 2, which means Linstor should have at least tried to create a second diskful replica when evicting.

What happened after the node failure? I assume you are using k8s - did k8s simply recreate the pod somewhere else? Was Linstor's auto-evict even involved here (i.e. was a node in EVICTED state at some point, so that you had to use the node restore command)?
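
In other words, roughly this is what I mean (node name taken from your listing; node restore only applies if the node really was auto-evicted):

$ linstor node list
$ linstor node restore local-worker2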

pavanfhw commented 3 years ago

Yes, I am using k8s. To give more context, these are my pods right now:

NAME                                         READY   STATUS             RESTARTS   AGE     IP                NODE            NOMINATED NODE   READINESS GATES
snapshot-controller-5f56d9b66b-ldmw6         1/1     Running            0          3d16h   10.42.1.213       local-master    <none>           <none>
piraeus-op-stork-6c8457647b-ntbd6            1/1     Running            0          3d16h   10.42.1.215       local-master    <none>           <none>
piraeus-op-operator-db88b55c5-hk4bf          1/1     Running            0          3d16h   10.42.0.164       local-worker1   <none>           <none>
piraeus-op-ns-node-bngjj                     2/2     Running            0          3d16h   192.168.122.136   local-worker1   <none>           <none>
piraeus-op-etcd-0                            1/1     Running            0          3d16h   10.42.1.216       local-master    <none>           <none>
piraeus-op-csi-controller-bc6f7c957-xxspx    6/6     Running            0          3d16h   10.42.1.214       local-master    <none>           <none>
piraeus-op-ns-node-559ql                     2/2     Running            0          3d16h   192.168.122.200   local-master    <none>           <none>
piraeus-op-operator-db88b55c5-htrk8          1/1     Running            0          3d16h   10.42.1.220       local-master    <none>           <none>
piraeus-op-etcd-1                            1/1     Running            0          3d16h   10.42.6.4         local-worker2   <none>           <none>
piraeus-op-etcd-2                            1/1     Running            0          3d16h   10.42.0.167       local-worker1   <none>           <none>
piraeus-op-cs-controller-5574668665-bnj74    1/1     Running            1          3d16h   10.42.1.218       local-master    <none>           <none>
piraeus-op-cs-controller-5574668665-7gwck    1/1     Running            5          3d16h   10.42.0.165       local-worker1   <none>           <none>
piraeus-op-csi-node-bbd9p                    3/3     Running            2          3d16h   10.42.6.2         local-worker2   <none>           <none>
piraeus-op-csi-node-z2gdg                    3/3     Running            7          3d16h   10.42.0.163       local-worker1   <none>           <none>
piraeus-op-csi-node-x9blh                    3/3     Running            7          3d16h   10.42.1.217       local-master    <none>           <none>
piraeus-op-ha-controller-8464557d7c-s8cr9    1/1     Running            5          3d16h   10.42.1.219       local-master    <none>           <none>
piraeus-op-ns-node-42rt5                     2/2     Running            0          3d16h   192.168.122.147   local-worker2   <none>           <none>
piraeus-op-ha-controller-8464557d7c-vm997    1/1     Running            8          3d16h   10.42.0.166       local-worker1   <none>           <none>

I'm using etcd with 3 replicas and 2 replicas for the other deployments, for high availability. When the node local-worker2 failed, only etcd could not be recreated elsewhere, because of k8s affinity. After the timeout I configured, the node (local-worker2) was auto-evicted and a diskless replica was created to replace the lost one. This is my resource-group:

# linstor resource-group list
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceGroup                           ┊ SelectFilter                    ┊ VlmNrs ┊ Description ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltRscGrp                              ┊ PlaceCount: 2                   ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ sc-d1e731da-8b55-52d4-a60d-94b7bffbcbd1 ┊ PlaceCount: 2                   ┊ 0      ┊             ┊
┊                                         ┊ StoragePool(s): lvm-pool        ┊        ┊             ┊
┊                                         ┊ LayerStack: ['DRBD', 'STORAGE'] ┊        ┊             ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Shouldn't this config make LINSTOR maintain 2 diskful replicas?
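
In case it helps, this is roughly how I would check and, if needed, adjust that placement count (the group name is from the listing above, and I am assuming modify --place-count is the right knob):

$ linstor resource-group list
$ linstor resource-group modify sc-d1e731da-8b55-52d4-a60d-94b7bffbcbd1 --place-count 2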

After the test I re-added the node to the cluster and it came back OK, but another diskless replica was created (see the first linstor resource list output in my previous comment). The problem then happened when running toggle-disk.

Can you show us some dmesg from local-worker2

What is the command for that? In the cs-controller pod, I assume?
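
Something like this, run against the satellite pod on that node? (Pod name is from my listing above; the container name linstor-satellite is a guess on my part.)

$ kubectl exec piraeus-op-ns-node-42rt5 -c linstor-satellite -- dmesg | grep -i drbd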

Also some controller logs would be interesting

There is nothing interesting, just these INFO logs repeating in a loop:

00:07:00.546 [MainWorkerPool-1] INFO  LINSTOR/Controller - SYSTEM - Satellite local-worker1 reports a capacity of 41934848 kiB, no errors
00:07:00.732 [MainWorkerPool-1] INFO  LINSTOR/Controller - SYSTEM - Satellite local-master reports a capacity of 20967424 kiB, no errors
00:07:01.771 [MainWorkerPool-1] INFO  LINSTOR/Controller - SYSTEM - Satellite local-worker2 reports a capacity of 41934848 kiB, no errors
00:07:01.868 [SpaceTrackingService] INFO  LINSTOR/Controller - SYSTEM - SpaceTracking: Aggregate capacity is 104837120 kiB
02:07:01.932 [SpaceTrackingService] INFO  LINSTOR/Controller - SYSTEM - SpaceTracking: Aggregate capacity is 104837120 kiB
04:07:01.935 [SpaceTrackingService] INFO  LINSTOR/Controller - SYSTEM - SpaceTracking: Aggregate capacity is 104837120 kiB
06:07:01.939 [SpaceTrackingService] INFO  LINSTOR/Controller - SYSTEM - SpaceTracking: Aggregate capacity is 104837120 kiB
08:07:01.943 [SpaceTrackingService] INFO  LINSTOR/Controller - SYSTEM - SpaceTracking: Aggregate capacity is 104837120 kiB
10:07:01.948 [SpaceTrackingService] INFO  LINSTOR/Controller - SYSTEM - SpaceTracking: Aggregate capacity is 104837120 kiB
12:07:01.950 [SpaceTrackingService] INFO  LINSTOR/Controller - SYSTEM - SpaceTracking: Aggregate capacity is 104837120 kiB