LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Doesn't start on kernel 6.6 #407

Closed dimm0 closed 6 months ago

dimm0 commented 6 months ago

I'm using the latest piraeus operator.

On nodes running Ubuntu 22.04 with kernel 6.6, the LINSTOR satellite fails to start with:

time="2024-05-03T16:25:48Z" level=info msg="running k8s-await-election" version=refs/tags/v0.4.1
LINSTOR, Module Satellite
Version:            1.27.1 (c6f8ceed9d50da2c4d37ae8ce20d09daf3046464)
Build time:         2024-04-25T11:12:13+00:00
Java Version:       17
Java VM:            Debian, Version 17.0.11+9-Debian-1deb12u1
Operating system:   Linux, Version 6.6.5-060605-generic
Environment:        amd64, 128 processors, 30688 MiB memory reserved for allocations
linstor-satellite
linstor-satellite
System components initialization in progress
linstor-satellite
Loading configuration file "/etc/linstor/linstor_satellite.toml"
16:25:48.886 [main] INFO  LINSTOR/Satellite - SYSTEM - ErrorReporter DB version 1 found.
16:25:48.888 [main] INFO  LINSTOR/Satellite - SYSTEM - Log directory set to: '/var/log/linstor-satellite'
16:25:48.923 [Main] INFO  LINSTOR/Satellite - SYSTEM - Loading API classes started.
16:25:49.157 [Main] INFO  LINSTOR/Satellite - SYSTEM - API classes loading finished: 234ms
16:25:49.158 [Main] INFO  LINSTOR/Satellite - SYSTEM - Dependency injection started.
16:25:49.167 [Main] INFO  LINSTOR/Satellite - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule"
16:25:49.167 [Main] INFO  LINSTOR/Satellite - SYSTEM - Extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule" is not installed
16:25:49.167 [Main] INFO  LINSTOR/Satellite - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule"
16:25:49.176 [Main] INFO  LINSTOR/Satellite - SYSTEM - Dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule" was successful
16:25:49.697 [Main] INFO  LINSTOR/Satellite - SYSTEM - Dependency injection finished: 538ms
16:25:49.697 [Main] INFO  LINSTOR/Satellite - SYSTEM - Cryptography provider: Using default cryptography module
16:25:49.852 [Main] ERROR LINSTOR/Satellite - SYSTEM - Unable to provision, see the following errors:
linstor-satellite
1) [Guice/ErrorInjectingConstructor]: LinStorRuntimeException: Unable to create FileSystemWatch
  at LvmProvider.<init>(LvmProvider.java:130)
  at LvmProvider.class(LvmProvider.java:68)
  at DeviceProviderMapper.<init>(DeviceProviderMapper.java:63)
      \_ for 1st parameter
  at DeviceProviderMapper.class(DeviceProviderMapper.java:63)
  at StorageLayer.<init>(StorageLayer.java:77)
      \_ for 2nd parameter
  at StorageLayer.class(StorageLayer.java:77)
  at StltApiCallHandlerUtils.<init>(StltApiCallHandlerUtils.java:106)
      \_ for 17th parameter
  at DeviceManagerImpl.<init>(DeviceManagerImpl.java:275)
      \_ for 18th parameter
  at DeviceManagerImpl.class(DeviceManagerImpl.java:124)
  while locating DeviceManagerImpl
  at Satellite.<init>(Satellite.java:149)
      \_ for 6th parameter
  while locating Satellite
linstor-satellite
Learn more:
  https://github.com/google/guice/wiki/ERROR_INJECTING_CONSTRUCTOR
linstor-satellite
1 error
linstor-satellite
======================
Full classname legend:
======================
DeviceManagerImpl:       "com.linbit.linstor.core.devmgr.DeviceManagerImpl"
DeviceProviderMapper:    "com.linbit.linstor.layer.storage.DeviceProviderMapper"
LinStorRuntimeException: "com.linbit.linstor.LinStorRuntimeException"
LvmProvider:             "com.linbit.linstor.layer.storage.lvm.LvmProvider"
Satellite:               "com.linbit.linstor.core.Satellite"
StltApiCallHandlerUtils: "com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils"
StorageLayer:            "com.linbit.linstor.layer.storage.StorageLayer"
========================
End of classname legend:
========================
 [Report number 6635100C-4D978-000000]

dimm0 commented 6 months ago

Hmm, suddenly working...

jonmast commented 6 months ago

I'm seeing the same error. I'm also using the piraeus operator, but I'm on kernel 6.1.

ghernadi commented 6 months ago

Can you attach the ErrorReport? Since the satellite should be offline, you will have to grab that manually from /var/log/linstor-satellite/ErrorReport-*.log
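
For anyone who wants to script that step, here is a minimal sketch that picks the most recently modified report matching the path pattern above. The helper name `latest_error_report` is made up for illustration, and the sketch assumes it runs somewhere the satellite's log directory is actually visible (for example on the node or inside the container, depending on how the logs are mounted).

```python
import glob
import os


def latest_error_report(log_dir="/var/log/linstor-satellite"):
    """Return the most recently modified ErrorReport-*.log in log_dir, or None."""
    candidates = glob.glob(os.path.join(log_dir, "ErrorReport-*.log"))
    if not candidates:
        return None
    # Pick the newest file by modification time.
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    path = latest_error_report()
    if path is None:
        print("no ErrorReport-*.log files found")
    else:
        print(path)
        # errors="replace" avoids failing on unexpected bytes in the report.
        with open(path, errors="replace") as fh:
            print(fh.read())
```

The printed report can then be attached to the issue.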

jonmast commented 6 months ago

Error report error-report.txt

The error message is "User limit of inotify instances reached or too many open files". With that hint I was able to figure out that my issue was that fs.inotify.max_user_instances was set too low. Increasing it solved my problem.
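
In case it helps others check whether they are hitting the same limit, below is a rough sketch that reads the current value of fs.inotify.max_user_instances from /proc and estimates how many inotify instances each user has open. The function names are hypothetical. The count is only an approximation: it scans /proc/<pid>/fd for links to `anon_inode:inotify`, so descriptors shared between processes are counted more than once, and it needs root to see other users' processes.

```python
import os
from collections import Counter

LIMIT_PATH = "/proc/sys/fs/inotify/max_user_instances"


def read_limit(path=LIMIT_PATH):
    """Read the per-user inotify instance limit as an integer."""
    with open(path) as fh:
        return int(fh.read().strip())


def count_inotify_instances(proc_root="/proc"):
    """Approximate inotify descriptors per UID by scanning <proc_root>/<pid>/fd."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        proc_dir = os.path.join(proc_root, entry)
        fd_dir = os.path.join(proc_dir, "fd")
        try:
            uid = os.stat(proc_dir).st_uid
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target == "anon_inode:inotify":
                counts[uid] += 1
    return counts


if __name__ == "__main__":
    try:
        print("max_user_instances:", read_limit())
    except OSError as exc:
        print("could not read limit:", exc)
    for uid, n in count_inotify_instances().most_common():
        print(f"uid {uid}: {n} inotify descriptors")
```

If a user's count is close to the limit, the limit can be raised on the node, for example with `sysctl -w fs.inotify.max_user_instances=<value>`. The right value depends on the workload and is not specified in this thread.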