dimm0 opened 2 years ago
Resize also fails
ERROR REPORT 6361DB5E-BCA58-000001
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.20.0
Build ID: 9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time: 2022-10-18T07:19:30+00:00
Error time: 2022-11-02 03:07:23
Node: hcc-nrp-shor-c6005.unl.edu
============================================================
Reported error:
===============
Description:
Failed to adjust DRBD resource pvc-3d40ff9a-ec83-4985-9d77-bc073256ad15
Category: LinStorException
Class name: ResourceException
Class canonical name: com.linbit.linstor.core.devmgr.exceptions.ResourceException
Generated at: Method 'adjustDrbd', Source file 'DrbdLayer.java', Line #819
Error message: Failed to adjust DRBD resource pvc-3d40ff9a-ec83-4985-9d77-bc073256ad15
Error context:
An error occurred while processing resource 'Node: 'hcc-nrp-shor-c6005.unl.edu', Rsc: 'pvc-3d40ff9a-ec83-4985-9d77-bc073256ad15''
Call backtrace:
Method Native Class:Line number
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:819
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
Caused by:
==========
Description:
Execution of the external command 'drbdadm' failed.
Cause:
The external command did not complete within the timeout.
Possible causes include:
- The system load may be too high to ensure completion of external commands in a timely manner.
- The program implementing the external command may not be operating properly.
- The operating system may have entered an erroneous state.
Correction:
Check whether the external program and the operating system are still operating properly.
Check whether the system's load is within normal parameters.
Additional information:
The full command line executed was:
drbdadm -vvv resize pvc-3d40ff9a-ec83-4985-9d77-bc073256ad15/0
Category: LinStorException
Class name: ExtCmdFailedException
Class canonical name: com.linbit.extproc.ExtCmdFailedException
Generated at: Method 'execute', Source file 'DrbdAdm.java', Line #598
Error message: The external command 'drbdadm' did not complete within the timeout
Call backtrace:
Method Native Class:Line number
execute N com.linbit.linstor.layer.drbd.utils.DrbdAdm:598
resize N com.linbit.linstor.layer.drbd.utils.DrbdAdm:122
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:644
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
Caused by:
==========
Category: Exception
Class name: ChildProcessTimeoutException
Class canonical name: com.linbit.ChildProcessTimeoutException
Generated at: Method 'waitFor', Source file 'ChildProcessHandler.java', Line #133
Call backtrace:
Method Native Class:Line number
waitFor N com.linbit.extproc.ChildProcessHandler:133
syncProcess N com.linbit.extproc.ExtCmd:156
pipeExec N com.linbit.extproc.ExtCmd:104
execute N com.linbit.linstor.layer.drbd.utils.DrbdAdm:590
resize N com.linbit.linstor.layer.drbd.utils.DrbdAdm:122
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:644
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
END OF ERROR REPORT.
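The innermost cause in the report is a ChildProcessTimeoutException raised from ChildProcessHandler.waitFor: the satellite waited a fixed time for "drbdadm -vvv resize" to finish and gave up. The Python sketch below is only an analogue of that pattern, not LINSTOR's code. LINSTOR is written in Java, and the report does not show its actual timeout value or what it does with the child process afterwards. The sketch runs a command under a fixed deadline, using a child that sleeps for 2 seconds as a stand-in for a slow but healthy resize or mkfs. Here subprocess.run kills the child when the deadline passes, so an operation that simply needs more time fails the same way on every retry. The helper name run_with_deadline is hypothetical.

```python
import subprocess
import sys
import time


def run_with_deadline(cmd: list[str], timeout_s: float) -> tuple[bool, float]:
    """Run cmd, waiting at most timeout_s seconds (hypothetical helper).

    Returns (completed, elapsed_seconds). On timeout, subprocess.run kills
    the child, waits for it, and raises TimeoutExpired; we report that as
    completed=False. Whatever the child had done so far is left as-is.
    """
    start = time.monotonic()
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True, time.monotonic() - start
    except subprocess.TimeoutExpired:
        return False, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in for a slow but healthy command (e.g. a resize or mkfs on a
    # very large device): it just sleeps for 2 seconds.
    slow = [sys.executable, "-c", "import time; time.sleep(2)"]

    # Deadline shorter than the work: killed, regardless of how often retried.
    ok, elapsed = run_with_deadline(slow, timeout_s=0.5)
    print(f"timeout 0.5 s: completed={ok} after {elapsed:.1f} s")

    # Deadline longer than the work: the same command succeeds.
    ok, elapsed = run_with_deadline(slow, timeout_s=5.0)
    print(f"timeout 5.0 s: completed={ok} after {elapsed:.1f} s")
```

The point of the sketch is that the failure is a property of the deadline, not of the command: the same child succeeds as soon as the deadline exceeds its running time.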
Anybody?
Up
With DRBD, I'm seeing a timeout that is too short for drbdadm (volume provisioning) and mkfs operations, so large volumes cannot be created or formatted. It keeps retrying, but leaves a broken volume behind every time.
I first tried it on a 1 PB ZFS node and provisioned several volumes ranging from 1 PB down to 100 TB; all of them failed. Then, on an mdraid node, it was unable to run mkfs.xfs on a 50 TB volume, which takes more than a minute to complete. Smaller (50 GB) volumes work fine.
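The pattern described above (50 GB volumes succeed, 100 TB to 1 PB volumes fail, mkfs.xfs on 50 TB takes over a minute) suggests that a single fixed timeout does not fit operations whose duration grows with device size. As a purely hypothetical illustration, not an existing LINSTOR setting or behavior, a timeout could instead be derived from the volume size: a base value plus a per-TiB allowance, capped at an upper bound. The function name scaled_timeout_s and all constants are placeholders chosen for the example, not measured values or LINSTOR defaults; real durations will depend on the backend (ZFS, mdraid) and on the operation.

```python
TIB = 1024 ** 4  # bytes per TiB


def scaled_timeout_s(size_bytes: int,
                     base_s: float = 60.0,
                     per_tib_s: float = 5.0,
                     max_s: float = 3600.0) -> float:
    """Hypothetical size-aware timeout: base + per-TiB allowance, capped at max_s.

    All default values are illustrative placeholders, not LINSTOR settings.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return min(max_s, base_s + per_tib_s * size_bytes / TIB)


if __name__ == "__main__":
    for label, size in [("50 GiB", 50 * 1024 ** 3),
                        ("50 TiB", 50 * TIB),
                        ("1 PiB", 1024 * TIB)]:
        print(f"{label}: {scaled_timeout_s(size):.0f} s")
```

With these placeholder constants it prints 60 s for 50 GiB, 310 s for 50 TiB, and 3600 s (the cap) for 1 PiB, so small volumes keep a short timeout while large ones get proportionally more time.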
Here’s the error: