LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

LINSTOR tries to remove device from LVM without removing it from luks #350

Open kvaps opened 1 year ago

kvaps commented 1 year ago
root@b-hv-3:/# linstor r l -r pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node   ┊ Port ┊ Usage  ┊ Conns              ┊    State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5 ┊ b-hv-2 ┊ 7023 ┊ Unused ┊ Connecting(b-hv-3) ┊ UpToDate ┊ 2023-03-09 08:36:23 ┊
┊ pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5 ┊ b-hv-3 ┊ 7023 ┊        ┊                    ┊  Unknown ┊ 2023-04-11 15:47:30 ┊
┊ pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5 ┊ c-hv-5 ┊ 7023 ┊ InUse  ┊ Connecting(b-hv-3) ┊ UpToDate ┊ 2023-04-11 15:35:00 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@b-hv-3:/# linstor r d b-hv-3 pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5
SUCCESS:
Description:
    Node: b-hv-3, Resource: pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5 preparing for deletion.
Details:
    Node: b-hv-3, Resource: pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5 UUID is: 52b74ce6-b94a-43a1-bf38-38de3ccf5c37
SUCCESS:
    Preparing deletion of resource on 'c-hv-5'
SUCCESS:
    Preparing deletion of resource on 'b-hv-2'
ERROR:
Description:
    (Node: 'b-hv-3') Failed to delete lvm volume
Details:
    Command 'lvremove --config devices { filter=['a|/dev/nvme0n1p5|','a|/dev/nvme1n1p5|','r|.*|'] } -f vg/pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5_00000' returned with exitcode 5.

    Standard out:

    Error message:
      Logical volume vg/pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5_00000 is used by another device.

Show reports:
    linstor error-reports show 6437F21B-9841A-000072
ERROR:
Description:
    Deletion of resource 'pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5' on node 'b-hv-3' failed due to an unknown exception.
Details:
    Node: b-hv-3, Resource: pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5
Show reports:
    linstor error-reports show 6437F5F0-00000-000039
root@b-hv-3:/# cryptsetup remove Linstor-Crypt-pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5_00000
root@b-hv-3:/# linstor r d b-hv-3 pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5
SUCCESS:
Description:
    Node: b-hv-3, Resource: pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5 preparing for deletion.
Details:
    Node: b-hv-3, Resource: pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5 UUID is: 52b74ce6-b94a-43a1-bf38-38de3ccf5c37
SUCCESS:
    Preparing deletion of resource on 'b-hv-2'
SUCCESS:
    Preparing deletion of resource on 'c-hv-5'
ERROR:
Description:
    (Node: 'b-hv-3') LuksFormat failed
Details:
    Command 'cryptsetup -q luksFormat --pbkdf-memory 262144 /dev/vg/pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5_00000' returned with exitcode 4.

    Standard out:

    Error message:
    Device /dev/vg/pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5_00000 does not exist or access denied.

Show reports:
    linstor error-reports show 6437F21B-9841A-000097
ERROR:
Description:
    Deletion of resource 'pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5' on node 'b-hv-3' failed due to an unknown exception.
Details:
    Node: b-hv-3, Resource: pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5
Show reports:
    linstor error-reports show 6437F5F0-00000-000041
ERROR REPORT 6437F21B-9841A-000072

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.21.0
Build ID:                           b44bb8d41f264ac1089d9a0a1c540d3cc703d7e8
Build time:                         2023-04-04T10:11:03+00:00
Error time:                         2023-04-13 14:01:08
Node:                               b-hv-3

============================================================

Reported error:
===============

Description:
    Failed to delete lvm volume
Additional information:
    Command 'lvremove --config devices { filter=['a|/dev/nvme0n1p5|','a|/dev/nvme1n1p5|','r|.*|'] } -f vg/pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5_00000' returned with exitcode 5.

    Standard out:

    Error message:
      Logical volume vg/pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5_00000 is used by another device.

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'checkExitCode', Source file 'ExtCmdUtils.java', Line #69

Error message:                      Failed to delete lvm volume

Error context:
    An error occurred while processing resource 'Node: 'b-hv-3', Rsc: 'pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5''

Call backtrace:

    Method                                   Native Class:Line number
    checkExitCode                            N      com.linbit.extproc.ExtCmdUtils:69
    genericExecutor                          N      com.linbit.linstor.layer.storage.utils.Commands:101
    genericExecutor                          N      com.linbit.linstor.layer.storage.utils.Commands:61
    delete                                   N      com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:198
    lambda$deleteLvImpl$2                    N      com.linbit.linstor.layer.storage.lvm.LvmThinProvider:173
    execWithRetry                            N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:499
    deleteLvImpl                             N      com.linbit.linstor.layer.storage.lvm.LvmThinProvider:170
    deleteLvImpl                             N      com.linbit.linstor.layer.storage.lvm.LvmThinProvider:48
    deleteVolumes                            N      com.linbit.linstor.layer.storage.AbsStorageProvider:778
    process                                  N      com.linbit.linstor.layer.storage.AbsStorageProvider:403
    process                                  N      com.linbit.linstor.layer.storage.StorageLayer:313
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:922
    process                                  N      com.linbit.linstor.layer.luks.LuksLayer:273
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:922
    processChild                             N      com.linbit.linstor.layer.drbd.DrbdLayer:487
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:417
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:922
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:379
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:189
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:322
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1152
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:750
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:644
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.
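
The first attempt appears to fail because `lvremove` runs while the dm-crypt mapping `Linstor-Crypt-..._00000` is still open on top of the logical volume, which would explain why LVM reports the LV as used by another device. A minimal sketch of a top-down teardown order follows. It assumes the mapping name is always `Linstor-Crypt-` plus the LV name, as in the session above, and that any DRBD device above the mapping is already down. `teardown_commands` is a hypothetical helper, not part of LINSTOR; it only builds argument lists and runs nothing, and it omits the `--config devices { filter=... }` option that LINSTOR passes to `lvremove`.

```python
from typing import List

# Assumption: prefix taken from the `cryptsetup remove` call in the session above.
LUKS_PREFIX = "Linstor-Crypt-"


def teardown_commands(vg: str, lv: str) -> List[List[str]]:
    """Build the commands that remove an LV stack from the top layer down."""
    return [
        # 1. Close the dm-crypt mapping that sits on top of the LV.
        ["cryptsetup", "close", LUKS_PREFIX + lv],
        # 2. Only then remove the backing logical volume.
        ["lvremove", "-f", f"{vg}/{lv}"],
    ]
```

Calling `teardown_commands("vg", "pvc-5bb88aa7-a018-4428-87c3-afb05e2cb5d5_00000")` yields the `cryptsetup close` command first and the `lvremove` command second, which mirrors the manual workaround used in the session.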
kvaps commented 1 year ago

I tried removing the crypt device and the LV manually; now I can't remove the resource at all:

root@b-hv-3:/# linstor r d b-hv-3 pvc-1dd492b4-9a89-4581-b20f-178dce17200b
SUCCESS:
Description:
    Node: b-hv-3, Resource: pvc-1dd492b4-9a89-4581-b20f-178dce17200b preparing for deletion.
Details:
    Node: b-hv-3, Resource: pvc-1dd492b4-9a89-4581-b20f-178dce17200b UUID is: 55c99c65-2c7e-435b-ba08-f1a6b68c345f
SUCCESS:
    Preparing deletion of resource on 'madison-db-1'
ERROR:
    (Node: 'b-hv-3') An unknown exception occurred while processing the resource pvc-1dd492b4-9a89-4581-b20f-178dce17200b
Show reports:
    linstor error-reports show 64380C9B-9841A-000030
SUCCESS:
    Preparing deletion of resource on 'a-hv-1'
ERROR:
Description:
    Deletion of resource 'pvc-1dd492b4-9a89-4581-b20f-178dce17200b' on node 'b-hv-3' failed due to an unknown exception.
Details:
    Node: b-hv-3, Resource: pvc-1dd492b4-9a89-4581-b20f-178dce17200b
Show reports:
    linstor error-reports show 6437F5F0-00000-000060
root@b-hv-3:/# linstor error-reports show 64380C9B-9841A-000030
ERROR REPORT 64380C9B-9841A-000030

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.21.0
Build ID:                           b44bb8d41f264ac1089d9a0a1c540d3cc703d7e8
Build time:                         2023-04-04T10:11:03+00:00
Error time:                         2023-04-13 14:51:01
Node:                               b-hv-3

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'start', Source file 'ProcessBuilder.java', Line #1090

Error context:
    An error occurred while processing resource 'Node: 'b-hv-3', Rsc: 'pvc-1dd492b4-9a89-4581-b20f-178dce17200b''

Call backtrace:

    Method                                   Native Class:Line number
    start                                    N      java.lang.ProcessBuilder:1090
    start                                    N      java.lang.ProcessBuilder:1071
    exec                                     N      com.linbit.extproc.ExtCmd:128
    exec                                     N      com.linbit.extproc.ExtCmd:89
    hasLuksFormat                            N      com.linbit.linstor.layer.luks.CryptSetupCommands:236
    process                                  N      com.linbit.linstor.layer.luks.LuksLayer:291
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:922
    processChild                             N      com.linbit.linstor.layer.drbd.DrbdLayer:487
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:417
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:922
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:379
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:189
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:322
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1152
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:750
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:644
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.

root@b-hv-3:/# linstor error-reports show 6437F5F0-00000-000060
ERROR REPORT 6437F5F0-00000-000060

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.21.0
Build ID:                           b44bb8d41f264ac1089d9a0a1c540d3cc703d7e8
Build time:                         2023-04-04T10:11:03+00:00
Error time:                         2023-04-13 14:51:01
Node:                               linstor-controller-6464f8977c-dq7cc
Peer:                               RestClient(10.111.2.74; 'PythonLinstor/1.17.0 (API1.0.4): Client 1.17.0')

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         DelayedApiRcException
Class canonical name:               com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils.DelayedApiRcException
Generated at:                       Method 'lambda$mergeExtractingApiRcExceptions$4', Source file 'CtrlResponseUtils.java', Line #126

Error message:                      Exceptions have been converted to responses

Error context:
    Deletion of resource 'pvc-1dd492b4-9a89-4581-b20f-178dce17200b' on node 'b-hv-3' failed due to an unknown exception.

Asynchronous stage backtrace:
    (Node: 'b-hv-3') An unknown exception occurred while processing the resource pvc-1dd492b4-9a89-4581-b20f-178dce17200b

    Error has been observed at the following site(s):
        |_ checkpoint ? Prepare resource delete
        |_ checkpoint ? Activating resource if necessary before deletion
    Stack trace:

Call backtrace:

    Method                                   Native Class:Line number
    lambda$mergeExtractingApiRcExceptions$4  N      com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils:126

Suppressed exception 1 of 2:
===============
Category:                           RuntimeException
Class name:                         ApiRcException
Class canonical name:               com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at:                       Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #337

Error message:                      (Node: 'b-hv-3') An unknown exception occurred while processing the resource pvc-1dd492b4-9a89-4581-b20f-178dce17200b

Error context:
    Deletion of resource 'pvc-1dd492b4-9a89-4581-b20f-178dce17200b' on node 'b-hv-3' failed due to an unknown exception.

ApiRcException entries:
Nr: 1
  Message: (Node: 'b-hv-3') An unknown exception occurred while processing the resource pvc-1dd492b4-9a89-4581-b20f-178dce17200b

Call backtrace:

    Method                                   Native Class:Line number
    handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:337
    handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:284
    doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N      com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8357
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N      reactor.core.publisher.UnicastProcessor:286
    drain                                    N      reactor.core.publisher.UnicastProcessor:329
    onNext                                   N      reactor.core.publisher.UnicastProcessor:408
    next                                     N      reactor.core.publisher.FluxCreate$IgnoreSink:618
    drainLoop                                N      reactor.core.publisher.FluxCreate$SerializedSink:248
    next                                     N      reactor.core.publisher.FluxCreate$SerializedSink:168
    processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:388
    doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N      reactor.core.scheduler.WorkerTask:84
    call                                     N      reactor.core.scheduler.WorkerTask:37
    run                                      N      java.util.concurrent.FutureTask:264
    run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N      java.lang.Thread:829

Suppressed exception 2 of 2:
===============
Category:                           RuntimeException
Class name:                         OnAssemblyException
Class canonical name:               reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at:                       Method 'lambda$mergeExtractingApiRcExceptions$4', Source file 'CtrlResponseUtils.java', Line #126

Error message:
Error has been observed at the following site(s):
    |_ checkpoint ⇢ Prepare resource delete
    |_ checkpoint ⇢ Activating resource if necessary before deletion
Stack trace:

Error context:
    Deletion of resource 'pvc-1dd492b4-9a89-4581-b20f-178dce17200b' on node 'b-hv-3' failed due to an unknown exception.

Call backtrace:

    Method                                   Native Class:Line number
    lambda$mergeExtractingApiRcExceptions$4  N      com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils:126
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8357
    onComplete                               N      reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:207
    onComplete                               N      reactor.core.publisher.FluxMap$MapSubscriber:136
    checkTerminated                          N      reactor.core.publisher.FluxFlatMap$FlatMapMain:838
    drainLoop                                N      reactor.core.publisher.FluxFlatMap$FlatMapMain:600
    innerComplete                            N      reactor.core.publisher.FluxFlatMap$FlatMapMain:909
    onComplete                               N      reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
    onComplete                               N      reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2016
    onComplete                               N      reactor.core.publisher.FluxMap$MapSubscriber:136
    onComplete                               N      reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:191
    onComplete                               N      reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber:81
    onComplete                               N      reactor.core.publisher.FluxPeek$PeekSubscriber:252
    onComplete                               N      reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2016
    onComplete                               N      reactor.core.publisher.FluxMap$MapSubscriber:136
    onComplete                               N      reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
    complete                                 N      reactor.core.publisher.FluxCreate$BaseSink:438
    drain                                    N      reactor.core.publisher.FluxCreate$BufferAsyncSink:784
    complete                                 N      reactor.core.publisher.FluxCreate$BufferAsyncSink:732
    drainLoop                                N      reactor.core.publisher.FluxCreate$SerializedSink:239
    drain                                    N      reactor.core.publisher.FluxCreate$SerializedSink:205
    complete                                 N      reactor.core.publisher.FluxCreate$SerializedSink:196
    apiCallComplete                          N      com.linbit.linstor.netcom.TcpConnectorPeer:470
    handleComplete                           N      com.linbit.linstor.proto.CommonMessageProcessor:363
    handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:287
    doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N      com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8357
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N      reactor.core.publisher.UnicastProcessor:286
    drain                                    N      reactor.core.publisher.UnicastProcessor:329
    onNext                                   N      reactor.core.publisher.UnicastProcessor:408
    next                                     N      reactor.core.publisher.FluxCreate$IgnoreSink:618
    drainLoop                                N      reactor.core.publisher.FluxCreate$SerializedSink:248
    next                                     N      reactor.core.publisher.FluxCreate$SerializedSink:168
    processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:388
    doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N      reactor.core.scheduler.WorkerTask:84
    call                                     N      reactor.core.scheduler.WorkerTask:37
    run                                      N      java.util.concurrent.FutureTask:264
    run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.
ghernadi commented 1 year ago

Try restarting that satellite. My bet is that LINSTOR still thinks the LUKS volume (and maybe also the LVM volume) exists and tries to access it, which would not have happened if LINSTOR had been the one to delete those volumes properly.

I guess an "exists" check is missing in the LuksLayer; I will look into that. For now, restarting the satellite (a simple reconnect should also be enough) should trigger a re-scan of those volumes, which could fix your issue.
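
As an illustration of the kind of "exists" check meant here, below is a minimal Python sketch, not the LINSTOR code (which is Java). The satellite report above shows a `NullPointerException` raised from `ProcessBuilder.start` via `hasLuksFormat`; Java's `ProcessBuilder.start` throws that exception when an element of the command list is null, so one plausible cause, and only a guess, is that the backing device path was unset once the LV was gone. The function name `has_luks_format` and its injectable `path_exists` and `run` parameters are assumptions made for this example; `cryptsetup isLuks` is the real subcommand that exits with 0 when the device carries a LUKS header.

```python
import os
import subprocess
from typing import Optional


def has_luks_format(device: Optional[str],
                    path_exists=os.path.exists,
                    run=subprocess.run) -> bool:
    """Return True only if `device` exists and cryptsetup sees a LUKS header."""
    # Guard 1: no device path known at all (the case that would crash a
    # process launcher handed a null argument).
    if device is None:
        return False
    # Guard 2: the backing device was already removed, e.g. by hand.
    if not path_exists(device):
        return False
    # Only now call the external tool; exit code 0 means "is a LUKS device".
    result = run(["cryptsetup", "isLuks", device], check=False)
    return result.returncode == 0
```

With both guards in place, a volume whose backing device has already disappeared is reported as "no LUKS header" instead of raising an exception, so a deletion could proceed to the next layer.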

kvaps commented 1 year ago

In the end, I wasn't able to remove these resources even after restarting the satellites. I removed them from the DB as part of https://github.com/LINBIT/linstor-server/issues/348#issuecomment-1507646983