LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

LINSTOR sets wrong verify-alg #302

Open kvaps opened 2 years ago

kvaps commented 2 years ago

Reported by @duckhawk.

A new node was just added to an existing cluster, and creating resources on it fails:

| node0   | pvc-26a48223-0458-407f-bb70-ca01658cb6b2 | DfltDisklessStorPool |     0 |    1011 | None          |            |        |  Unknown |
| node1   | pvc-26a48223-0458-407f-bb70-ca01658cb6b2 | store                |     0 |    1011 | /dev/drbd1011 | 112.45 MiB | Unused | UpToDate |
| node2   | pvc-26a48223-0458-407f-bb70-ca01658cb6b2 | store                |     0 |    1011 | /dev/drbd1011 |  16.42 MiB | InUse  | UpToDate |
| node3   | pvc-26a48223-0458-407f-bb70-ca01658cb6b2 | DfltDisklessStorPool |     0 |    1011 | None          |            | Unused | Diskless |

02:18:04.458 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-26a48223-0458-407f-bb70-ca01658cb6b2' created for node 'node0'.
02:18:04.458 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-26a48223-0458-407f-bb70-ca01658cb6b2' updated for node 'node1'.
02:18:04.458 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-26a48223-0458-407f-bb70-ca01658cb6b2' updated for node 'node2'.
02:18:04.458 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-26a48223-0458-407f-bb70-ca01658cb6b2' updated for node 'node3'.
02:18:04.548 [DeviceManager] ERROR LINSTOR/Satellite - SYSTEM - Failed to adjust DRBD resource pvc-26a48223-0458-407f-bb70-ca01658cb6b2 [Report number 62E54F1D-73CF5-000029]

root@master0:~# linstor error-reports show 62E54F1D-73CF5-000029
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
ERROR REPORT 62E54F1D-73CF5-000029

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.18.2
Build ID:                           26945460e48d2b9e98f6e2163e05b722dd5ff3ca
Build time:                         2022-07-01T14:14:49+00:00
Error time:                         2022-07-31 02:18:04
Node:                               node3

============================================================

Reported error:
===============

Description:
    Failed to adjust DRBD resource pvc-26a48223-0458-407f-bb70-ca01658cb6b2

Category:                           LinStorException
Class name:                         ResourceException
Class canonical name:               com.linbit.linstor.core.devmgr.exceptions.ResourceException
Generated at:                       Method 'adjustDrbd', Source file 'DrbdLayer.java', Line #819

Error message:                      Failed to adjust DRBD resource pvc-26a48223-0458-407f-bb70-ca01658cb6b2

Error context:
    An error occurred while processing resource 'Node: 'node3', Rsc: 'pvc-26a48223-0458-407f-bb70-ca01658cb6b2''

Call backtrace:

    Method                                   Native Class:Line number
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:819
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:393
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:847
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:359
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:169
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
    run                                      N      java.lang.Thread:829

Caused by:
==========

Description:
    Execution of the external command 'drbdadm' failed.
Cause:
    The external command exited with error code 1.
Correction:
    - Check whether the external program is operating properly.
    - Check whether the command line is correct.
      Contact a system administrator or a developer if the command line is no longer valid
      for the installed version of the external program.
Additional information:
    The full command line executed was:
    drbdadm -vvv adjust pvc-26a48223-0458-407f-bb70-ca01658cb6b2

    The external command sent the following output data:
    drbdsetup new-peer pvc-26a48223-0458-407f-bb70-ca01658cb6b2 1 --_name=node1 --verify-alg=crct10dif-pclmul --shared-secret=IrEBR3108jffGYPqn8dx --cram-hmac-alg=sha1

    The external command sent the following error information:
    pvc-26a48223-0458-407f-bb70-ca01658cb6b2: Failure: (146) VERIFYAlgNotAvail
    additional info from kernel:
    failed to allocate crct10dif-pclmul for verify

    Command 'drbdsetup new-peer pvc-26a48223-0458-407f-bb70-ca01658cb6b2 1 --_name=node1 --verify-alg=crct10dif-pclmul --shared-secret=IrEBR3108jffGYPqn8dx --cram-hmac-alg=sha1' terminated with exit code 10
    drbdadm: new-peer pvc-26a48223-0458-407f-bb70-ca01658cb6b2: skipped due to earlier error

Category:                           LinStorException
Class name:                         ExtCmdFailedException
Class canonical name:               com.linbit.extproc.ExtCmdFailedException
Generated at:                       Method 'execute', Source file 'DrbdAdm.java', Line #593

Error message:                      The external command 'drbdadm' exited with error code 1

Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:593
    adjust                                   N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:90
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:741
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:393
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:847
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:359
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:169
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.

Error in dmesg on the new node:

root@node3:~# dmesg | grep crct10dif
[143447.618539] drbd pvc-43f83e36-ea89-466b-86d3-98598ffb5256/0 drbd1023: Different verify-alg settings. me="crc32c" peer="crct10dif-pclmul"
root@node3:~# 
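To spot this kind of mismatch across many resources, the dmesg output can be scanned for the message shown above. The following is a minimal sketch, not part of LINSTOR or DRBD: the function name `find_verify_alg_mismatches` is hypothetical, and the regular expression assumes the kernel message keeps exactly the wording quoted in this report.

```python
import re

# Assumption: the DRBD kernel message has the same shape as the line quoted
# in this report:
#   drbd <resource>/<volume> drbd<minor>: Different verify-alg settings. me="..." peer="..."
MISMATCH_RE = re.compile(
    r'drbd (?P<resource>\S+?)(?:/\d+)? (?P<device>drbd\d+): '
    r'Different verify-alg settings\. '
    r'me="(?P<me>[^"]*)" peer="(?P<peer>[^"]*)"'
)


def find_verify_alg_mismatches(dmesg_text):
    """Return one dict per 'Different verify-alg settings' line in dmesg_text."""
    matches = []
    for line in dmesg_text.splitlines():
        m = MISMATCH_RE.search(line)
        if m:
            matches.append(m.groupdict())
    return matches
```

Feeding it the output of `dmesg` on the affected node would list each resource together with the local (`me`) and remote (`peer`) algorithm.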

The kernel versions differ between the old nodes and the new node.

Old nodes:

root@node0:~# uname -a
Linux node0 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@node0:~# cat /proc/drbd 
version: 9.1.7 (api:2/proto:110-121)
GIT-hash: bfd2450739e3e27cfd0a2eece2cde3d94ad993ae build by @node0, 2022-07-30 10:04:45
Transports (api:18): tcp (9.1.7)

New node:

root@node3:~# uname -a
Linux node3 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@node3:~# cat /proc/drbd
version: 9.1.7 (api:2/proto:110-121)
GIT-hash: bfd2450739e3e27cfd0a2eece2cde3d94ad993ae build by @node3, 2022-07-30 14:52:09
Transports (api:18): tcp (9.1.7)

kvaps commented 2 years ago

A workaround is to manually specify verify-alg for the resource group:

linstor resource-group drbd-options --verify-alg crc32c my_verify_group

rp- commented 2 years ago

Could you provide /proc/crypto from all nodes involved?

duckhawk commented 2 years ago

new_node.txt old_node.txt

old node: everything works here
new node: this is where the problem occurs
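The two /proc/crypto dumps can be compared mechanically to see which algorithm or driver names exist on the old node but not on the new one. This is a minimal sketch under stated assumptions: it assumes the usual /proc/crypto layout of `key : value` lines grouped into blocks separated by blank lines, and the helper names (`parse_proc_crypto`, `available_names`, `missing_on`, `compare_files`) are made up for illustration. It does not confirm the root cause. It only shows whether a name such as `crct10dif-pclmul` is absent on one side.

```python
def parse_proc_crypto(text):
    """Parse /proc/crypto text into a list of dicts, one per algorithm block."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current block.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def available_names(entries):
    """Collect every 'name' and 'driver' value, since either may be requested."""
    names = set()
    for entry in entries:
        for field in ("name", "driver"):
            if field in entry:
                names.add(entry[field])
    return names


def missing_on(reference_text, candidate_text):
    """Names present in reference_text but absent from candidate_text, sorted."""
    reference = available_names(parse_proc_crypto(reference_text))
    candidate = available_names(parse_proc_crypto(candidate_text))
    return sorted(reference - candidate)


def compare_files(old_path, new_path):
    """Convenience wrapper for two saved dumps, e.g. old_node.txt and new_node.txt."""
    with open(old_path) as old_file, open(new_path) as new_file:
        return missing_on(old_file.read(), new_file.read())
```

Calling `compare_files("old_node.txt", "new_node.txt")` would return the names the new node lacks. An empty list would suggest that the difference lies elsewhere.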