LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Failed to access DRBD super-block of volume #56

Closed kvaps closed 2 years ago

kvaps commented 5 years ago

I just got this error while restarting linstor-satellite:

13:53:38.428 [DeviceManager] ERROR LINSTOR/Satellite - Failed to access DRBD super-block of volume one-vm-140-disk-2/0 [Report number 5CE7F723-16A3B-000000]
ERROR REPORT 5CE7F723-16A3B-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            0.9.9
Build ID:                           64b60e05c91b50097963c60f88830ce504dd4fd7
Build time:                         2019-05-23T07:18:37+00:00
Error time:                         2019-05-24 13:53:38
Node:                               m1c4

============================================================

Reported error:
===============

Description:
    Failed to access DRBD super-block of volume one-vm-140-disk-2/0

Category:                           LinStorException
Class name:                         VolumeException
Class canonical name:               com.linbit.linstor.storage.layer.exceptions.VolumeException
Generated at:                       Method 'hasMetaData', Source file 'DrbdLayer.java', Line #595

Error message:                      Failed to access DRBD super-block of volume one-vm-140-disk-2/0

Error context:
    An error occurred while processing resource 'Node: 'm1c4', Rsc: 'one-vm-140-disk-2''

Call backtrace:

    Method                                   Native Class:Line number
    hasMetaData                              N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:595
    adjustDrbd                               N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:391
    process                                  N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:233
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:655
    processResourcesAndTheirSnapshots        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:280
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:133
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:250
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:868
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:609
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:526
    run                                      N      java.lang.Thread:834

Caused by:
==========

Category:                           Exception
Class name:                         NoSuchFileException
Class canonical name:               java.nio.file.NoSuchFileException
Generated at:                       Method 'translateToIOException', Source file 'UnixException.java', Line #92

Error message:                      /dev/data/one-vm-140-disk-2_00000

Call backtrace:

    Method                                   Native Class:Line number
    translateToIOException                   N      sun.nio.fs.UnixException:92
    rethrowAsIOException                     N      sun.nio.fs.UnixException:111
    rethrowAsIOException                     N      sun.nio.fs.UnixException:116
    newFileChannel                           N      sun.nio.fs.UnixFileSystemProvider:178
    open                                     N      java.nio.channels.FileChannel:292
    open                                     N      java.nio.channels.FileChannel:345
    readObject                               N      com.linbit.linstor.storage.layer.adapter.drbd.utils.MdSuperblockBuffer:74
    hasMetaData                              N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:590
    adjustDrbd                               N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:391
    process                                  N      com.linbit.linstor.storage.layer.adapter.drbd.DrbdLayer:233
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:655
    processResourcesAndTheirSnapshots        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:280
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:133
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:250
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:868
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:609
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:526
    run                                      N      java.lang.Thread:834

END OF ERROR REPORT.

The resources are marked as Unknown and are not brought up on the node.

This resource was previously created from the snapshot one-image-42 and has its own snapshot:

# linstor s l | grep one-vm-140-disk-2
| one-image-42      | one-vm-140-disk-2 | m1c4, m1c6 | 0: 3 GiB | Successful |
| one-vm-140-disk-2 | snapshot-0        | m1c4, m1c6 | 0: 3 GiB | Successful |
# linstor r l -r one-vm-140-disk-2
+----------------------------------------------------+
| ResourceName      | Node | Port | Usage  |   State |
|----------------------------------------------------|
| one-vm-140-disk-2 | m1c4 | 7076 | Unused | Unknown |
| one-vm-140-disk-2 | m1c6 | 7076 | Unused | Unknown |
+----------------------------------------------------+
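To spot affected resources quickly, the table output above can be filtered for rows whose State column reads Unknown. The following is only a minimal sketch: the function name `unknown_resources` is hypothetical, and it assumes the plain-text table layout shown above (the linstor client may offer other output formats that are better suited to scripting).

```python
def unknown_resources(table_text):
    """Return (resource, node) pairs whose State column is 'Unknown'.

    Assumes the ASCII table layout printed by `linstor r l` above.
    """
    header = None
    hits = []
    for line in table_text.splitlines():
        line = line.strip()
        # Skip border lines (+---+) and the header separator (|---|).
        if not line.startswith("|") or set(line) <= set("|-+"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data-like row holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("State") == "Unknown":
            hits.append((row["ResourceName"], row["Node"]))
    return hits


SAMPLE = """\
+----------------------------------------------------+
| ResourceName      | Node | Port | Usage  |   State |
|----------------------------------------------------|
| one-vm-140-disk-2 | m1c4 | 7076 | Unused | Unknown |
| one-vm-140-disk-2 | m1c6 | 7076 | Unused | Unknown |
+----------------------------------------------------+
"""

if __name__ == "__main__":
    for resource, node in unknown_resources(SAMPLE):
        print(f"{resource} on {node} is Unknown")
```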

So the chain looks like this:

one-image-42 <-- one-vm-140-disk-2 <-- snapshot-0

UPD: If I create /var/lib/linstor.d/one-vm-140-disk-2.res, describe the resource manually, and then run drbdadm up one-vm-140-disk-2, everything starts working fine, but the resource is still marked as Unknown in the controller.

ghernadi commented 5 years ago

Does the LV exist? lvs data | grep one-vm-140-disk-2_00000
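Since the error report's root cause is a NoSuchFileException for /dev/data/one-vm-140-disk-2_00000, the same question can also be asked of the device path itself. This is a rough sketch, not part of LINSTOR: the helper name `describe_backing_device` is hypothetical, and the path in the usage example is the one taken from the error report.

```python
import os
import stat


def describe_backing_device(path):
    """Classify a path as 'missing', 'block device', or 'not a block device'.

    os.stat follows symlinks, so a dangling /dev/<vg>/<lv> link is
    reported as missing, just like a path that does not exist at all.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    return "block device" if stat.S_ISBLK(mode) else "not a block device"


if __name__ == "__main__":
    # Path copied from the "Caused by" section of the error report.
    device = "/dev/data/one-vm-140-disk-2_00000"
    print(device, "->", describe_backing_device(device))
```

A result of "missing" on the satellite node would match the exception in the report. A result of "block device" would suggest the path was only absent at the moment the satellite tried to read the super-block.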

kvaps commented 5 years ago

Does the LV exist?

Yes, it did. As I said, I described the resource manually, then ran drbdadm up on both nodes, and it started working fine.

Could this be related to https://github.com/LINBIT/linstor-server/issues/58?

rp- commented 2 years ago

Closing this issue as well, since it is old and we have not had any other reports of a similar problem.