LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Linstor-controller crashloop 1.28 #415

Closed boedy closed 3 months ago

boedy commented 3 months ago

I have no clue what might have caused this. I'm trying to figure out how I can restore the controller. This is the error report:

ERROR REPORT 66C4BE4D-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.28.0
Build ID:                           959382f7b4fb9436fefdd21dfa262e90318edaed
Build time:                         2024-07-11T10:21:06+00:00
Error time:                         2024-08-20 16:03:38
Node:                               linstor-controller-7b9c4ccd45-ndv2c
Thread:                             Main

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         LinStorDBRuntimeException
Class canonical name:               com.linbit.linstor.LinStorDBRuntimeException
Generated at:                       Method 'loadAll', Source file 'K8sCrdEngine.java', Line #267

Error message:                      Database entry of table VOLUMES could not be restored.

ErrorContext:   Details:     Primary key: NODE_NAME = 'H-FSN-DED4', RESOURCE_NAME = 'PVC-FC8FF055-7996-42FF-9A78-A552F8469845', SNAPSHOT_NAME = 'SNAPSHOT-0B086E76-F8FD-4143-999E-F2701E889C5D', VLM_NR = '0'

Call backtrace:

    Method                                   Native Class:Line number
    loadAll                                  N      com.linbit.linstor.dbdrivers.k8s.crd.K8sCrdEngine:267
    loadAll                                  N      com.linbit.linstor.dbdrivers.AbsDatabaseDriver:180
    loadAllAsList                            N      com.linbit.linstor.dbdrivers.ControllerDatabaseDriver:33
    loadCoreObjects                          N      com.linbit.linstor.dbdrivers.DatabaseLoader:590
    loadCoreObjects                          N      com.linbit.linstor.core.DbDataInitializer:169
    initialize                               N      com.linbit.linstor.core.DbDataInitializer:101
    startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:87
    start                                    N      com.linbit.linstor.core.Controller:374
    main                                     N      com.linbit.linstor.core.Controller:625

Caused by:
==========

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method '<init>', Source file 'SnapshotVolume.java', Line #59

Error message:                      Cannot invoke "com.linbit.linstor.core.objects.Snapshot.getNodeName()" because "snapshotRef" is null

Call backtrace:

    Method                                   Native Class:Line number
    <init>                                   N      com.linbit.linstor.core.objects.SnapshotVolume:59
    load                                     N      com.linbit.linstor.core.objects.SnapshotVolumeDbDriver:121
    load                                     N      com.linbit.linstor.core.objects.SnapshotVolumeDbDriver:43
    loadAll                                  N      com.linbit.linstor.dbdrivers.k8s.crd.K8sCrdEngine:238
    loadAll                                  N      com.linbit.linstor.dbdrivers.AbsDatabaseDriver:180
    loadAllAsList                            N      com.linbit.linstor.dbdrivers.ControllerDatabaseDriver:33
    loadCoreObjects                          N      com.linbit.linstor.dbdrivers.DatabaseLoader:590
    loadCoreObjects                          N      com.linbit.linstor.core.DbDataInitializer:169
    initialize                               N      com.linbit.linstor.core.DbDataInitializer:101
    startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:87
    start                                    N      com.linbit.linstor.core.Controller:374
    main                                     N      com.linbit.linstor.core.Controller:625

END OF ERROR REPORT.

boedy commented 3 months ago

I seem to have fixed it, although I'm not sure how that happened either. 😅

I had a look at the different volumesnapshot records stored in the CRDs and noticed that the snapshot named in the error didn't have any secaclmap or secobjectprotection records. I figured these might be required, so I created some manually to see whether that would fix it. I initially didn't expect it to help, but since the controller started working again afterwards, I suspect this was the cause.

Anyway, I'm relieved the linstor-controller is back up again.
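
For anyone hitting the same error: the NullPointerException above says that a snapshot volume entry was loaded while its parent snapshot reference was null. A rough way to look for that kind of dangling entry is to compare primary keys offline. The following is only a minimal sketch under stated assumptions: the key fields mirror the primary key printed in the error report (NODE_NAME, RESOURCE_NAME, SNAPSHOT_NAME, VLM_NR), but the types, the function name, and the idea of exporting the records into plain Python lists are all hypothetical and not part of LINSTOR. It also does not check the secaclmap or secobjectprotection records mentioned above, since their layout is not shown in this thread.

```python
from typing import List, NamedTuple


class SnapshotKey(NamedTuple):
    # Hypothetical key for a snapshot record (assumed, not LINSTOR's schema).
    node_name: str
    resource_name: str
    snapshot_name: str


class SnapshotVolumeKey(NamedTuple):
    # Field names follow the primary key shown in the error report.
    node_name: str
    resource_name: str
    snapshot_name: str
    vlm_nr: int


def find_orphaned_snapshot_volumes(
    snapshots: List[SnapshotKey],
    snapshot_volumes: List[SnapshotVolumeKey],
) -> List[SnapshotVolumeKey]:
    """Return snapshot volume keys whose parent snapshot key is missing."""
    known = set(snapshots)
    return [
        vlm
        for vlm in snapshot_volumes
        if SnapshotKey(vlm.node_name, vlm.resource_name, vlm.snapshot_name)
        not in known
    ]


# Made-up example data: the second volume has no matching snapshot.
example_snapshots = [SnapshotKey("NODE-A", "RES-1", "SNAP-1")]
example_volumes = [
    SnapshotVolumeKey("NODE-A", "RES-1", "SNAP-1", 0),
    SnapshotVolumeKey("NODE-B", "RES-1", "SNAP-1", 0),
]

for orphan in find_orphaned_snapshot_volumes(example_snapshots, example_volumes):
    print("snapshot volume without parent snapshot:", orphan)
```

Any entry this reports would be a candidate for the record that the controller fails to restore on startup. Whether that is the actual root cause here remains unconfirmed.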