LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

NullPointerException while loading linstor-controller #295

Closed: kvaps closed this issue 2 years ago

kvaps commented 2 years ago

Still the same cluster as in https://github.com/LINBIT/linstor-server/issues/294#issuecomment-1160467042; this time the problem occurs while loading linstor-controller:

LINSTOR, Module Controller
Version:            1.18.2 (26945460e48d2b9e98f6e2163e05b722dd5ff3ca)
Build time:         2022-05-30T09:47:28+00:00
Java Version:       11
Java VM:            Debian, Version 11.0.15+10-post-Debian-1deb11u1
Operating system:   Linux, Version 5.4.0-117-generic
Environment:        amd64, 1 processors, 2888 MiB memory reserved for allocations

System components initialization in progress

Loading configuration file "/etc/linstor/linstor.toml"
11:01:35.070 [main] INFO  LINSTOR/Controller - SYSTEM - ErrorReporter DB first time init.
11:01:35.073 [main] INFO  LINSTOR/Controller - SYSTEM - Log directory set to: '/var/log/linstor-controller'
11:01:35.104 [main] INFO  LINSTOR/Controller - SYSTEM - Database type is Kubernetes-CRD
11:01:35.105 [Main] INFO  LINSTOR/Controller - SYSTEM - Loading API classes started.
11:01:35.500 [Main] INFO  LINSTOR/Controller - SYSTEM - API classes loading finished: 395ms
11:01:35.500 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection started.
11:01:35.523 [Main] INFO  LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"
11:01:35.524 [Main] INFO  LINSTOR/Controller - SYSTEM - Extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" is not installed
11:01:36.548 [Main] INFO  LINSTOR/Controller - SYSTEM - Dependency injection finished: 1048ms
11:01:36.805 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing authentication subsystem
11:01:37.365 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
11:01:37.365 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing the k8s crd database connector
11:01:37.365 [Main] INFO  LINSTOR/Controller - SYSTEM - Kubernetes-CRD connection URL is "k8s"
11:01:39.928 [Main] INFO  LINSTOR/Controller - SYSTEM - Starting service instance 'K8sCrdDatabaseService' of type K8sCrdDatabaseService
11:01:39.950 [Main] INFO  LINSTOR/Controller - SYSTEM - Loading security objects
11:01:40.117 [Main] INFO  LINSTOR/Controller - SYSTEM - Current security level is NO_SECURITY
11:01:45.224 [Main] INFO  LINSTOR/Controller - SYSTEM - Core objects load from database is in progress
11:02:25.409 [Main] ERROR LINSTOR/Controller - SYSTEM - Problem of type 'java.lang.NullPointerException' logged to report number 62B1A50E-00000-000000

11:02:25.416 [Main] ERROR LINSTOR/Controller - SYSTEM - Unhandled exception [Report number 62B1A50E-00000-000001]
ERROR REPORT 62B1A50E-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.18.2
Build ID:                           26945460e48d2b9e98f6e2163e05b722dd5ff3ca
Build time:                         2022-05-30T09:47:28+00:00
Error time:                         2022-06-21 11:02:25
Node:                               linstor-controller-677cfcd4c6-krhpw

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'lambda$loadLayerObects$7', Source file 'DatabaseLoader.java', Line #672

Call backtrace:

    Method                                   Native Class:Line number
    lambda$loadLayerObects$7                 N      com.linbit.linstor.dbdrivers.DatabaseLoader:672
    loadLayerData                            N      com.linbit.linstor.dbdrivers.DatabaseLoader:749
    loadLayerObects                          N      com.linbit.linstor.dbdrivers.DatabaseLoader:666
    loadAll                                  N      com.linbit.linstor.dbdrivers.DatabaseLoader:584
    loadCoreObjects                          N      com.linbit.linstor.core.DbDataInitializer:176
    initialize                               N      com.linbit.linstor.core.DbDataInitializer:108
    startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:87
    start                                    N      com.linbit.linstor.core.Controller:347
    main                                     N      com.linbit.linstor.core.Controller:586

END OF ERROR REPORT.
ERROR REPORT 62B1A50E-00000-000001

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.18.2
Build ID:                           26945460e48d2b9e98f6e2163e05b722dd5ff3ca
Build time:                         2022-05-30T09:47:28+00:00
Error time:                         2022-06-21 11:02:25
Node:                               linstor-controller-677cfcd4c6-krhpw

============================================================

Reported error:
===============

Description:
    Unhandled exception

Category:                           LinStorException
Class name:                         SystemServiceStartException
Class canonical name:               com.linbit.SystemServiceStartException
Generated at:                       Method 'startSystemServices', Source file 'ApplicationLifecycleManager.java', Line #103

Error message:                      Unhandled exception

Call backtrace:

    Method                                   Native Class:Line number
    startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:103
    start                                    N      com.linbit.linstor.core.Controller:347
    main                                     N      com.linbit.linstor.core.Controller:586

Caused by:
==========

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'lambda$loadLayerObects$7', Source file 'DatabaseLoader.java', Line #672

Call backtrace:

    Method                                   Native Class:Line number
    lambda$loadLayerObects$7                 N      com.linbit.linstor.dbdrivers.DatabaseLoader:672
    loadLayerData                            N      com.linbit.linstor.dbdrivers.DatabaseLoader:749
    loadLayerObects                          N      com.linbit.linstor.dbdrivers.DatabaseLoader:666
    loadAll                                  N      com.linbit.linstor.dbdrivers.DatabaseLoader:584
    loadCoreObjects                          N      com.linbit.linstor.core.DbDataInitializer:176
    initialize                               N      com.linbit.linstor.core.DbDataInitializer:108
    startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:87
    start                                    N      com.linbit.linstor.core.Controller:347
    main                                     N      com.linbit.linstor.core.Controller:586

END OF ERROR REPORT.
ghernadi commented 2 years ago

For now I assume that this NPE is related to https://github.com/LINBIT/linstor-server/issues/294#issuecomment-1160467042

To be more precise, I assume that you forgot to clean up all of the related database tables, specifically the LAYER_* tables. LAYER_RESOURCE_ID is your entry point: it maps each layer-id to the actual resource (node_name, resource_name and snapshot_name). From there, you need to look through the other `LAYER_*` tables for the layer-ids you found and clear all of those entries as well.
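To illustrate why leftover layer rows can break loading, here is a minimal Java sketch on a hypothetical, heavily simplified in-memory model. It is not LINSTOR's real schema, classes or API: the class `LayerCleanupSketch`, its row type, and the table names `LAYER_DEMO_A`/`LAYER_DEMO_B` are invented for illustration. It treats a LAYER_RESOURCE_ID row as orphaned when its node_name/resource_name pair no longer exists, then removes every row keyed by those layer-ids from the other layer tables. Snapshot names and the parent/child structure of real layer stacks are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model for illustration only; NOT LINSTOR's real schema or API.
public class LayerCleanupSketch {

    // One simplified LAYER_RESOURCE_ID row: a layer-id mapped to its owning resource.
    public static final class LayerResourceIdRow {
        public final int layerId;
        public final String nodeName;
        public final String resourceName;

        public LayerResourceIdRow(int layerId, String nodeName, String resourceName) {
            this.layerId = layerId;
            this.nodeName = nodeName;
            this.resourceName = resourceName;
        }

        // Key used to look up the owning resource ("node/resource"); snapshots are ignored.
        public String resourceKey() {
            return nodeName + "/" + resourceName;
        }
    }

    // Layer-ids whose owning resource no longer exists: the dangling references that a
    // loader resolving "layer-id -> resource" would get null for.
    public static Set<Integer> findOrphanLayerIds(
            List<LayerResourceIdRow> layerRows, Set<String> existingResources) {
        Set<Integer> orphans = new TreeSet<>(); // sorted, for stable output
        for (LayerResourceIdRow row : layerRows) {
            if (!existingResources.contains(row.resourceKey())) {
                orphans.add(row.layerId);
            }
        }
        return orphans;
    }

    // Cascade: drop the orphaned entry-point rows, then every dependent row with the same layer-id.
    public static void purgeOrphans(
            List<LayerResourceIdRow> layerRows,
            Map<String, Map<Integer, String>> dependentTables,
            Set<Integer> orphans) {
        layerRows.removeIf(row -> orphans.contains(row.layerId));
        for (Map<Integer, String> table : dependentTables.values()) {
            table.keySet().removeAll(orphans); // removing from the key view removes the entries
        }
    }

    public static void main(String[] args) {
        // "rsc-b" was deleted by hand, but its layer rows (ids 2 and 3) were left behind.
        Set<String> resources = new HashSet<>(List.of("node-1/rsc-a"));
        List<LayerResourceIdRow> layerRows = new ArrayList<>(List.of(
                new LayerResourceIdRow(1, "node-1", "rsc-a"),
                new LayerResourceIdRow(2, "node-1", "rsc-b"),
                new LayerResourceIdRow(3, "node-1", "rsc-b")));
        // Hypothetical stand-ins for the other LAYER_* tables, keyed by layer-id.
        Map<String, Map<Integer, String>> dependent = new HashMap<>();
        dependent.put("LAYER_DEMO_A", new HashMap<>(Map.of(1, "data", 2, "data")));
        dependent.put("LAYER_DEMO_B", new HashMap<>(Map.of(1, "data", 3, "data")));

        Set<Integer> orphans = findOrphanLayerIds(layerRows, resources);
        System.out.println("Orphaned layer ids: " + orphans); // prints "Orphaned layer ids: [2, 3]"

        purgeOrphans(layerRows, dependent, orphans);
        System.out.println("Remaining entry-point rows: " + layerRows.size()); // 1
        System.out.println("LAYER_DEMO_A ids: " + dependent.get("LAYER_DEMO_A").keySet()); // [1]
        System.out.println("LAYER_DEMO_B ids: " + dependent.get("LAYER_DEMO_B").keySet()); // [1]
    }
}
```

The sketch only models the layer tables. In this thread the leftovers also included other object tables (PropsContainers, SecAclMap, SecObjectProtection), which a real cleanup would have to cover as well.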

Btw, that is one of the reasons we usually advise not to touch the database directly :)

kvaps commented 2 years ago

@ghernadi many thanks for your help.

You're right, I removed Resources and Volumes but didn't remove LayerResourceIds, PropsContainers, SecAclMap and SecObjectProtection.

My bad, now it is working.