Closed kvaps closed 2 years ago
Can you please also attach error reports 5F842882-00000-015924
as well as 5F842882-00000-015925
I have not often seen an error report like the one you are showing here... quite strange..
However, just wondering - are you aware of these error reports?
18:35:54.935 [MainWorkerPool-1] ERROR LINSTOR/Controller - SYSTEM - Resource on node m11c11 has insufficient peer slots to add another peer [Report number 5F842882-00000-015917]
18:43:57.915 [MainWorkerPool-1] ERROR LINSTOR/Controller - SYSTEM - Resource on node m11c11 has insufficient peer slots to add another peer [Report number 5F842882-00000-015918]
If I am not mistaken, the default peer-count is 7... do you have more resources than that in one resource-definition?
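A quick way to check this (a sketch, using the resource name from this thread; each replica occupies one row of the table output):

```shell
# Count how many replicas exist for the resource definition; with the
# default of 7 peer slots, more than 8 resources in total would be a problem.
linstor r l -r one-vm-8871-disk-0 | grep -c 'one-vm-8871-disk-0'
```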
After a restart, linstor-controller started listing the resources again.
# linstor r l -r one-vm-8871-disk-0
╭────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8871-disk-0 ┊ m11c17 ┊ 55478 ┊ Unused ┊ StandAlone(m14c42) ┊ UpToDate ┊ 2020-11-10 14:38:08 ┊
┊ one-vm-8871-disk-0 ┊ m13c41 ┊ 55478 ┊ Unused ┊ Ok ┊ UpToDate ┊ ┊
┊ one-vm-8871-disk-0 ┊ m14c42 ┊ 55478 ┊ Unused ┊ StandAlone(m11c17) ┊ UpToDate ┊ ┊
┊ one-vm-8871-disk-0 ┊ m15c41 ┊ 55478 ┊ InUse ┊ Ok ┊ Diskless ┊ 2020-11-09 06:35:42 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
# linstor r d m11c17 one-vm-8871-disk-0
SUCCESS:
Description:
Node: m11c17, Resource: one-vm-8871-disk-0 marked for deletion.
Details:
Node: m11c17, Resource: one-vm-8871-disk-0 UUID is: f0a45ef8-f80c-478d-b2d7-da01db587d33
SUCCESS:
Notified 'm15c41' that 'one-vm-8871-disk-0' is being deleted on Node(s): [m11c17]
SUCCESS:
Notified 'm14c42' that 'one-vm-8871-disk-0' is being deleted on Node(s): [m11c17]
SUCCESS:
Deleted 'one-vm-8871-disk-0' on 'm11c17'
SUCCESS:
Notified 'm13c41' that 'one-vm-8871-disk-0' is being deleted on Node(s): [m11c17]
SUCCESS:
Description:
Node: m11c17, Resource: one-vm-8871-disk-0 deletion complete.
Details:
Node: m11c17, Resource: one-vm-8871-disk-0 UUID was: f0a45ef8-f80c-478d-b2d7-da01db587d33
# linstor r c m11c17 one-vm-8871-disk-0 -s thindata
SUCCESS:
Successfully set property key(s): StorPoolName
SUCCESS:
Description:
New resource 'one-vm-8871-disk-0' on node 'm11c17' registered.
Details:
Resource 'one-vm-8871-disk-0' on node 'm11c17' UUID is: 51971b9d-43eb-4c09-8153-9d7390146916
SUCCESS:
Description:
Volume with number '0' on resource 'one-vm-8871-disk-0' on node 'm11c17' successfully registered
Details:
Volume UUID is: 1723b233-db22-426f-8df6-9c8b1f699b52
SUCCESS:
Added peer(s) 'm11c17' to resource 'one-vm-8871-disk-0' on 'm15c41'
SUCCESS:
Added peer(s) 'm11c17' to resource 'one-vm-8871-disk-0' on 'm14c42'
SUCCESS:
Added peer(s) 'm11c17' to resource 'one-vm-8871-disk-0' on 'm13c41'
SUCCESS:
Created resource 'one-vm-8871-disk-0' on 'm11c17'
ERROR:
Resource did not became ready on node 'm11c17' within reasonable time, check Satellite for errors.
# linstor r l -r one-vm-8871-disk-0
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8871-disk-0 ┊ m11c17 ┊ 55478 ┊ Unused ┊ Connecting(m14c42) ┊ SyncTarget(2.14%) ┊ 2020-11-10 15:45:22 ┊
┊ one-vm-8871-disk-0 ┊ m13c41 ┊ 55478 ┊ Unused ┊ Ok ┊ UpToDate ┊ ┊
┊ one-vm-8871-disk-0 ┊ m14c42 ┊ 55478 ┊ Unused ┊ StandAlone(m11c17) ┊ UpToDate ┊ ┊
┊ one-vm-8871-disk-0 ┊ m15c41 ┊ 55478 ┊ InUse ┊ Ok ┊ Diskless ┊ 2020-11-09 06:35:42 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Can you please also attach error reports 5F842882-00000-015924 as well as 5F842882-00000-015925
sure: ErrorReport-5F842882-00000-015917.log ErrorReport-5F842882-00000-015918.log
uhm... okay... Not sure how to say it, as I really don't want to be mean... but the first one leads to 404, and although the second one works, I asked for ..24 and ..25, not ..17 and ..18 :)
Ahh sorry, I had a really nervous day today :)
ErrorReport-5F842882-00000-015924.log ErrorReport-5F842882-00000-015925.log
Thanks :)
So, after a brief research, here are my findings:
The weird NullPointerException
with no stacktrace:
When an Exception is thrown too often, the JVM is apparently allowed to omit reconstructing the stacktrace for better performance (since JDK 5).
That means, I'd like to ask you to check your logs and try to find one of the first ErrorReports with a corresponding NullPointerException. That ErrorReport should contain the full stacktrace that might help us identify the actual issue.
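Something along these lines might help locate such a report (a sketch; the log directory is an assumption, adjust it to wherever your controller stores its ErrorReport files):

```shell
# List the reports mentioning NullPointerException, lowest report number
# first; the earliest one should still contain the full stacktrace.
grep -l 'NullPointerException' /var/log/linstor-controller/ErrorReport-*.log \
  | sort | head -n 1
```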
Your attempt to re-create the resource after restarting the satellite:
# linstor r c m11c17 one-vm-8871-disk-0 -s thindata
...
ERROR:
Resource did not became ready on node 'm11c17' within reasonable time, check Satellite for errors.
... one-vm-8871-disk-0 ┊ m11c17 ┊ 55478 ┊ Unused ┊ Connecting(m14c42) ┊ SyncTarget(2.14%) ┊ 2020-11-10 15:45:22
I am not fully sure about what happened here, but as the `r l` output shows, the resource is already syncing. Maybe a networking issue? Some timeout, and Linstor just gave up waiting too early?
* The possibly unrelated ErrorReport 015918, regarding `Resource on node m11c11 has insufficient peer slots to add another peer`
Did you check if you have more than 7 resources? If you did not have more than 7, we might need to dig deeper into this issue...
* ErrorReport ..24 shows: `Dependency not found` - `The storage pool 'DfltStorPool' for resource 'one-vm-8871-disk-0' for volume number '0' is not deployed on node 'm11c17'.`
I guess you forgot to add a `-s $myStoragePool` to the `r c` command?
* ErrorReport ..25: I have seen this bug, and I believe it is fixed with the recently released 1.10.0. The only time I saw this bug was during a toggle-disk.
My very wild guess: there are ~5 seconds between ErrorReports ..24 and ..25. If my previous guess is correct that you just forgot to add `-s $myStoragePool` to your `r c` command, I would highly assume that you realized that mistake and fixed it quite quickly... maybe within a few seconds? Now if that new `r c` was replacing a tiebreaker, Linstor automatically performs a toggle-disk (converting the diskless tiebreaker into the requested diskful resource).
That toggle-disk could have triggered the bug I was talking about earlier.
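For reference, the same conversion can also be requested explicitly; a sketch using the node and pool names from this thread (check `linstor resource toggle-disk -h` for the exact flags of your client version):

```shell
# Convert a diskless resource (e.g. a tiebreaker) on the node into a
# diskful one backed by the given storage pool.
linstor resource toggle-disk m11c17 one-vm-8871-disk-0 --storage-pool thindata
```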
Small update regarding the missing stacktrace (maybe someone finds this interesting) - by accident I also just ran into an endless loop of NullPointerExceptions and saw that the first 6148 ErrorReports did contain the full stacktrace, whereas the next ones showed no stacktraces, just like in the original post of this issue.
I would not assume that this count is the same on every machine; many factors could influence the precise threshold...
Hi @ghernadi,
The weird NullPointerException with no stacktrace: When an Exception is thrown too often, the JVM is apparently allowed to omit reconstructing the stacktrace for better performance since JDK 5. That means, I'd like to ask you to check your logs and try to find one of the first ErrorReports with a corresponding NullPointerException. That ErrorReport should contain the full stacktrace that might help us identify the actual issue.
Well I can provide you all the controller reports collected for this session
I am not fully sure about what happened here, but as the r l shows the resource is already syncing. Maybe a networking issue? Some timeout and Linstor just gave up waiting too early?
The resource was syncing but didn't finish it correctly:
# linstor r l -r one-vm-8871-disk-0
╭────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8871-disk-0 ┊ m11c26 ┊ 55478 ┊ Unused ┊ Connecting(m14c42) ┊ UpToDate ┊ 2020-11-10 18:43:37 ┊
┊ one-vm-8871-disk-0 ┊ m13c41 ┊ 55478 ┊ Unused ┊ Ok ┊ UpToDate ┊ ┊
┊ one-vm-8871-disk-0 ┊ m14c42 ┊ 55478 ┊ Unused ┊ StandAlone(m11c26) ┊ UpToDate ┊ ┊
┊ one-vm-8871-disk-0 ┊ m15c41 ┊ 55478 ┊ InUse ┊ Ok ┊ Diskless ┊ 2020-11-09 06:35:42 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
# linstor v l -r one-vm-8871-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ m11c26 ┊ one-vm-8871-disk-0 ┊ thindata ┊ 0 ┊ 1629 ┊ /dev/drbd1629 ┊ 236.95 GiB ┊ Unused ┊ UpToDate ┊
┊ m13c41 ┊ one-vm-8871-disk-0 ┊ thindata ┊ 0 ┊ 1629 ┊ /dev/drbd1629 ┊ 250.05 GiB ┊ Unused ┊ UpToDate ┊
┊ m14c42 ┊ one-vm-8871-disk-0 ┊ thindata ┊ 0 ┊ 1629 ┊ /dev/drbd1629 ┊ 250.05 GiB ┊ Unused ┊ UpToDate ┊
┊ m15c41 ┊ one-vm-8871-disk-0 ┊ DfltDisklessStorPool ┊ 0 ┊ 1629 ┊ /dev/drbd1629 ┊ ┊ InUse ┊ Diskless ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
I'm going to investigate this; for now I have collected the logs from the nodes:
The maybe unrelated ErrorReport 015918, regarding Resource on node m11c11 has insufficient peer slots to add another peer Did you check if you have more than 7 resources? If you did not have more than 7, we might need to dig deeper into this issue...
I guess this resource never had 7 replicas. Not sure if this is related, but every week we run the following procedure for these resources:
ErrorReport ..24 shows: Dependency not found - The storage pool 'DfltStorPool' for resource 'one-vm-8871-disk-0' for volume number '0' is not deployed on node 'm11c17'. I guess you forgot to add a -s $myStoragePool to the r c command?
Yes I usually forget to add -s $myStoragePool
to CLI commands quite often
Now if that new r c was replacing a tiebreaker, Linstor automatically performs a toggle-disk (converting the diskless tiebreaker into the requested diskfull resource) That toggle-disk could have triggered the bug I was talking about earlier.
But we have tiebreaker turned off:
# linstor c lp | grep tie
┊ DrbdOptions/auto-add-quorum-tiebreaker ┊ false ┊
Hey,
controller error reports:
Thanks for that. It took a while, but I believe now that those NullPointerExceptions are also based on ErrorReport-5F842882-00000-015925 (which I believe is already fixed in 1.10.0)
correctly syncing DRBD:
If you are referring to the different "Allocated" sizes - you might want to check each LVM volume locally; Linstor should only forward what the corresponding `lvs` reports per satellite.
The difference in allocation size might just be a question of trimming the thin LV(s)?
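A sketch of both checks, assuming 'thindata' is also the name of the backing VG/thin pool and that the volume carries a filesystem (both are assumptions; the mount point is hypothetical):

```shell
# Compare what the local LVM reports on each satellite; Linstor only
# forwards these numbers.
lvs --units g -o lv_name,data_percent,lv_size thindata
# Trimming the filesystem releases unused blocks and lowers the reported
# thin allocation (skip this for raw/VM volumes without a filesystem).
fstrim -v /mnt/one-vm-8871-disk-0
```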
insufficient peer slots: I am not fully sure why you need to create resources from snapshots instead of backing up the snapshots themselves... Anyway, restoring a snapshot into a new resource also copies the node-id from the snapshot to the resource (as that has to match the value of the snapshotted internal metadata). That means no new node-ids are allocated when restoring snapshots into new resources, which cannot lead to the "insufficient peer slots" error. So there has to be another cause of these messages.
auto-tiebreaker: okay.. that means my assumption is wrong. Guess I have to think harder on how this could have been triggered :) Still - chances are that 1.10.0 fixes all of these issues.
Hi @ghernadi,
Thanks for that. It took a while, but I believe now that those NullPointerExceptions are also based on ErrorReport-5F842882-00000-015925 (which I believe is already fixed in 1.10.0)
OK, I'll prepare an upgrade in a while
If you are referring to the different "Allocated" sizes - you might want to check each LVM volume locally; Linstor should only forward what the corresponding lvs reports per satellite.
I always see less allocated data usage for the newly created replicas than for the old ones. I think this is because of `discard-zeroes-if-aligned yes`, which is enabled by default for thin LVM volumes, am I right?
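For reference, this is a DRBD disk-section option; a minimal drbd.conf-style fragment. My recollection is that recent DRBD 9 releases indeed default this to yes, so aligned discards are passed through and never-written blocks read back as zeroes on thin storage - please verify against the drbd.conf man page for your DRBD version:

```
resource one-vm-8871-disk-0 {
    disk {
        discard-zeroes-if-aligned yes;
    }
}
```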
I am not fully sure why you need to create resources from snapshots instead of backing up the snapshots themselves...
Because the snapshots contain the DRBD metadata, I need to create a resource from the snapshot to back up only the raw data from it. Recovery is also done by a full byte-copy on top of the DRBD device.
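A sketch of that workflow (the snapshot and target names are hypothetical, and the exact flag spellings vary between linstor client versions, so check `linstor snapshot resource restore -h` before relying on them):

```shell
# Restore the snapshot into a throwaway resource definition, then back up
# the raw data through the DRBD device rather than the backing LV (which
# would include the DRBD metadata).
linstor resource-definition create one-vm-8871-disk-0-bak
linstor snapshot resource restore \
    --fromresource one-vm-8871-disk-0 --fromsnapshot weekly-snap \
    one-vm-8871-disk-0-bak
```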
As for the StandAlone states, I fixed this by running `drbdadm disconnect one-vm-8871-disk-0 && drbdadm connect one-vm-8871-disk-0` one by one on all storage nodes:
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ m11c26 ┊ one-vm-8871-disk-0 ┊ thindata ┊ 0 ┊ 1629 ┊ /dev/drbd1629 ┊ 236.95 GiB ┊ Unused ┊ UpToDate ┊
┊ m13c41 ┊ one-vm-8871-disk-0 ┊ thindata ┊ 0 ┊ 1629 ┊ /dev/drbd1629 ┊ 250.05 GiB ┊ Unused ┊ UpToDate ┊
┊ m14c42 ┊ one-vm-8871-disk-0 ┊ thindata ┊ 0 ┊ 1629 ┊ /dev/drbd1629 ┊ 250.05 GiB ┊ Unused ┊ UpToDate ┊
┊ m15c41 ┊ one-vm-8871-disk-0 ┊ DfltDisklessStorPool ┊ 0 ┊ 1629 ┊ /dev/drbd1629 ┊ ┊ InUse ┊ Diskless ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
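The per-node fix described above could be scripted roughly like this (a sketch; node list taken from this thread, passwordless ssh access assumed):

```shell
# Cycle the DRBD connection for the resource on each storage node in turn,
# clearing the StandAlone state.
for node in m11c26 m13c41 m14c42; do
  ssh "$node" "drbdadm disconnect one-vm-8871-disk-0 && drbdadm connect one-vm-8871-disk-0"
done
```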
Linstor-controller log: controller.log