LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0
984 stars 76 forks source link

LINSTOR resizing operation is stuck #329

Open kvaps opened 1 year ago

kvaps commented 1 year ago

Hello, I just faced with the case when linstor successfully set a new size to the volume definition, but nor drbd device and backing LV weren't resized.

Restart of linstor-sattelite didn't change anything. Restart of linstor-controller resized the device correctly

linstor 1.19.1 the resource is created in single replica

ghernadi commented 1 year ago

ErrorReports or anything suspicious in the logs (controller/satellite)?

kvaps commented 1 year ago

Unfortunately logs are clean, after switching logs to TRACE everything works as it should 😕

kvaps commented 1 year ago

Guess this is duplicate of https://github.com/LINBIT/linstor-server/issues/326

kvaps commented 1 year ago

I was wrong, this isn't duplicate, this is different issue

Today I've got the same situation. Volume definition was resized, but DRBD and backing LV is not:

# kubectl get pvc -n sentry data-postgres-0
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
data-postgres-0   Bound    pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c   56Gi       RWO            linstor-fast-r2   61d

# kubectl get pv pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS      REASON   AGE
pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c   56Gi       RWO            Delete           Bound    sentry/data-postgres-0   linstor-fast-r2            61d

# linstor vd l -r pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c
╭────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ VolumeNr ┊ VolumeMinor ┊ Size   ┊ Gross ┊ State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c ┊ 0        ┊ 1010        ┊ 56 GiB ┊       ┊ ok    ┊
╰────────────────────────────────────────────────────────────────────────────────────────────╯

# linstor v l -r pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node         ┊ Resource                                 ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊      State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ kube-node-12 ┊ pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c ┊ DfltDisklessStorPool ┊     0 ┊    1010 ┊ /dev/drbd1010 ┊           ┊ Unused ┊ TieBreaker ┊
┊ kube-node-5  ┊ pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c ┊ fast                 ┊     0 ┊    1010 ┊ /dev/drbd1010 ┊ 50.01 GiB ┊ Unused ┊   UpToDate ┊
┊ kube-node-7  ┊ pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c ┊ fast                 ┊     0 ┊    1010 ┊ /dev/drbd1010 ┊ 50.01 GiB ┊ InUse  ┊   UpToDate ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@kube-node-7:/# lvs | grep pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c
  pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c_00000 fast_storage -wi-ao---- 50.01g

root@kube-node-7:/# fdisk -l /dev/drbd1010
Disk /dev/drbd1010: 50 GiB, 53688164352 bytes, 104859696 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Controller logs have just single error:

LINSTOR, Module Controller
Version:            1.20.0 (9c6f7fad48521899f7a99c564b1d33aeacfdbfa8)
Build time:         2022-11-07T16:37:38+00:00
Java Version:       11
Java VM:            Ubuntu, Version 11.0.16+8-post-Ubuntu-0ubuntu122.04
Operating system:   Linux, Version 5.3.0-51-generic
Environment:        amd64, 1 processors, 7737 MiB memory reserved for allocations

System components initialization in progress

Loading configuration file "/etc/linstor/linstor.toml"
16:53:16.518 [main] INFO  LINSTOR/Controller - SYSTEM - ErrorReporter DB first time init.
16:53:16.519 [main] INFO  LINSTOR/Controller - SYSTEM - Log directory set to: '/var/log/linstor-controller'
16:53:16.547 [main] INFO  LINSTOR/Controller - SYSTEM - Database type is Kubernetes-CRD
16:53:16.547 [Main] INFO  LINSTOR/Controller - SYSTEM - Loading API classes started.
16:53:16.829 [Main] INFO  LINSTOR/Controller - SYSTEM - API classes loading finished: 282ms
16:53:16.829 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection started.
16:53:16.841 [Main] INFO  LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule"
16:53:16.841 [Main] INFO  LINSTOR/Controller - SYSTEM - Extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule" is not installed
16:53:16.841 [Main] INFO  LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule"
16:53:16.846 [Main] INFO  LINSTOR/Controller - SYSTEM - Dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule" was successful
16:53:16.846 [Main] INFO  LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"
16:53:16.847 [Main] INFO  LINSTOR/Controller - SYSTEM - Extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" is not installed
16:53:17.436 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection finished: 607ms
16:53:17.436 [Main] INFO  LINSTOR/Controller - SYSTEM - Cryptography provider: Using default cryptography module
16:53:17.608 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing authentication subsystem
16:53:17.826 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
16:53:17.826 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing the k8s crd database connector
16:53:17.826 [Main] INFO  LINSTOR/Controller - SYSTEM - Kubernetes-CRD connection URL is "k8s"
16:53:18.738 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'K8sCrdDatabaseService' of type K8sCrdDatabaseService
16:53:18.744 [Main] INFO  LINSTOR/Controller - SYSTEM - Loading security objects
16:53:18.830 [Main] INFO  LINSTOR/Controller - SYSTEM - Current security level is NO_SECURITY
16:53:19.103 [Main] INFO  LINSTOR/Controller - SYSTEM - Core objects load from database is in progress
16:53:19.501 [Main] INFO  LINSTOR/Controller - SYSTEM - Core objects load from database completed
16:53:19.799 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'TaskScheduleService' of type TaskScheduleService
16:53:19.801 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing network communications services
16:53:19.987 [Main] WARN  LINSTOR/Controller - SYSTEM - NetComService: Connector DebugSslConnector: Binding the socket to the IPv6 anylocal address failed, attempting fallback to IPv4
16:53:19.987 [Main] INFO  LINSTOR/Controller - SYSTEM - Created network communication service 'DebugSslConnector', bound to ::0:3373
16:53:19.988 [Main] INFO  LINSTOR/Controller - SYSTEM - Created network communication service 'PlainConnector'
16:53:20.098 [Main] INFO  LINSTOR/Controller - SYSTEM - Created network communication service 'SslConnector'
16:53:20.098 [Main] INFO  LINSTOR/Controller - SYSTEM - Reconnecting to previously known nodes
16:53:20.135 [Main] INFO  LINSTOR/Controller - SYSTEM - Reconnect requests sent
16:53:20.136 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'ScheduleBackupService' of type ScheduleBackupService
16:53:20.137 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'EbsStatusPoll' of type EbsStatusPoll
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.sun.xml.bind.v2.runtime.reflect.opt.Injector (file:/usr/share/linstor-server/lib/jaxb-impl-2.2.11.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int)
WARNING: Please consider reporting this to the maintainers of com.sun.xml.bind.v2.runtime.reflect.opt.Injector
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Dec 28, 2022 4:53:20 PM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: A resource, Resource{"v1/action", 0 child resources, 0 resource methods, 0 sub-resource locator, 0 method handler classes, 0 method handler instances}, with path "v1/action" is empty. It has no resource (or sub resource) methods neither sub resource locators defined.

Dec 28, 2022 4:53:20 PM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: A resource, Resource{"v1/action", 0 child resources, 0 resource methods, 0 sub-resource locator, 0 method handler classes, 0 method handler instances}, with path "v1/action" is empty. It has no resource (or sub resource) methods neither sub resource locators defined.

Dec 28, 2022 4:53:21 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [127.0.0.1:3370]
Dec 28, 2022 4:53:21 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
Dec 28, 2022 4:53:21 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:3371]
Dec 28, 2022 4:53:21 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-1] Started.
16:53:21.016 [Main] WARN  LINSTOR/Controller - SYSTEM - Not calling 'systemd-notify' as NOTIFY_SOCKET is null
16:53:21.016 [Main] INFO  LINSTOR/Controller - SYSTEM - Controller initialized

16:53:21.068 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-controller
16:53:21.069 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: No logs to archive.
16:50:55.843 [SslConnector] ERROR LINSTOR/Controller - SYSTEM - Target decrypted buffer is too small! [Report number 63AC747C-00000-000000]

16:53:19.802 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-controller
16:53:19.804 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: No logs to archive.
16:53:19.803 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-controller
16:53:19.804 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: No logs to archive.
16:53:19.801 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-controller
16:53:19.804 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: No logs to archive.
16:53:19.802 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-controller
16:53:19.804 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: No logs to archive.
16:53:19.802 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-controller
16:53:19.804 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: No logs to archive.
16:53:19.804 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-controller
16:53:19.806 [TaskScheduleService] INFO  LINSTOR/Controller - SYSTEM - LogArchive: No logs to archive.
[custis] root@kube-master-1 ~ #
ERROR REPORT 63AC747C-00000-000000

============================================================

Application:                        LINBIT�� LINSTOR
Module:                             Controller
Version:                            1.20.0
Build ID:                           9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time:                         2022-11-07T16:37:38+00:00
Error time:                         2022-12-29 16:50:55
Node:                               linstor-controller-857684db56-dvs9d

============================================================

Reported error:
===============

Category:                           Error
Class name:                         ImplementationError
Class canonical name:               com.linbit.ImplementationError
Generated at:                       Method 'unwrapInto', Source file 'SslTcpConnectorPeer.java', Line #241

Error message:                      Target decrypted buffer is too small!

Call backtrace:

    Method                                   Native Class:Line number
    unwrapInto                               N      com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:241
    readData                                 N      com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:287
    read                                     N      com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:195
    run                                      N      com.linbit.linstor.netcom.TcpConnectorService:543
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.

Satellite logs are clean.

Restart of linstor-controller resized LV, but not DRBD:

root@kube-node-7:/# lvs | grep pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c
  pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c_00000 fast_storage -wi-ao---- <56.02g    

root@kube-node-7:/# fdisk -l /dev/drbd1010
Disk /dev/drbd1010: 50 GiB, 53688164352 bytes, 104859696 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Restart of linstor-satellite didn't change anything. Thus executed resize manually:

# drbdadm resize pvc-fa1caf18-b638-4cca-9c2f-e1cdc01b6b3c

only after that DRBD become correct size:

# fdisk -l /dev/drbd1010
Disk /dev/drbd1010: 56 GiB, 60133433344 bytes, 117448112 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
duckhawk commented 1 year ago

I have exact the same problem. And after manual resize force I still have problem on diskless node:

root@node3:/# linstor v l | grep 557
| node3   | pvc-55789aab-5c34-4df7-b270-f58647822c54 | DfltDisklessStorPool |     0 |    1074 | /dev/drbd1074 |            | InUse  | Diskless |
| node4   | pvc-55789aab-5c34-4df7-b270-f58647822c54 | store                |     0 |    1074 | /dev/drbd1074 | 930.15 MiB | Unused | UpToDate |
root@node3:/# linstor vd l | grep 557
| pvc-55789aab-5c34-4df7-b270-f58647822c54 | 0        | 1074        | 10 GiB  |       | ok    |
root@node3:/# lsblk /dev/drbd1074
NAME     MAJ:MIN  RM SIZE RO TYPE MOUNTPOINTS
drbd1074 147:1074  0   1G  0 disk