LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

ERROR LINSTOR/Satellite SYSTEM Problem of type java.lang #385

Closed by t0mat000 9 months ago

t0mat000 commented 9 months ago

Hi,

The LINSTOR Satellite is not starting with the new Proxmox kernel on Debian 11 / Proxmox 7.4.

Kernel 5.15.131 works; kernel 5.15.136-1-pve does not.

Feb 01 13:32:26 SERVER02 Satellite[21492]: 13:32:26.908 [DeviceManager] ERROR LINSTOR/Satellite - SYSTEM - Problem of type 'java.lang.>

Jay2k1 commented 9 months ago

To add to this, the whole log line reads:

[DeviceManager] ERROR LINSTOR/Satellite - SYSTEM - Problem of type 'java.lang.NullPointerException' logged to report number 65BB8F4B-FF2A8-000288

This is said error report:

ERROR REPORT 65BB8F4B-FF2A8-000288

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.26.0
Build ID:                           ed570d372b69f58bfdc46e35ec11286472e5ccd2
Build time:                         2024-01-29T08:16:15+00:00
Error time:                         2024-02-01 13:35:25
Node:                               server02

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'isAllowed', Source file 'WhitelistProps.java', Line #288

Error context:
    An error occurred while processing resource 'Node: 'server02', Rsc: 'vm-43247-disk-1''

Call backtrace:

    Method                                   Native Class:Line number
    isAllowed                                N      com.linbit.linstor.api.prop.WhitelistProps:288
    checkValidDrbdOption                     N      com.linbit.linstor.layer.drbd.utils.ConfFileBuilder:671
    appendConflictingDrbdOptions             N      com.linbit.linstor.layer.drbd.utils.ConfFileBuilder:753
    build                                    N      com.linbit.linstor.layer.drbd.utils.ConfFileBuilder:224
    regenerateResFile                        N      com.linbit.linstor.layer.drbd.DrbdLayer:1670
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:693
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:436
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:934
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:379
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:177
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
    run                                      N      java.lang.Thread:829

END OF ERROR REPORT.
ghernadi commented 9 months ago

Please check if DRBD 9+ is installed on that node.

We have seen cases where, after an update or reboot of the machine, the upstream DRBD 8.4 module was loaded again instead of the previously insmod-ed DRBD 9+.

t0mat000 commented 9 months ago

OK, I have now tested a different server/cluster. Same problem!

Feb 07 12:45:11 Server03 Satellite[1959]: 12:45:11.720 [DeviceManager] ERROR LINSTOR/Satellite - SYSTEM - Problem of type 'java.lang.NullPointerException' logged to report number 65C36D24-3F411-000012

It is only DRBD 9 installed and loaded:

drbdadm -V
DRBDADM_BUILDTAG=GIT-hash:\ fdd9a4d603a9dc99d110d8bd0e288d7c0b6f586e\ build\ by\ @buildsystem\,\ 2023-12-22\ 09:53:59
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x08040b
DRBDADM_VERSION_CODE=0x091b00
DRBDADM_VERSION=9.27.0

drbdadm status
/var/lib/linstor.d/vm-100-disk-1.res:10: Parse error: 'an option keyword' expected, but got 'quorum'

ghernadi commented 9 months ago

It is only DRBD 9 installed and loaded:

Except that it is not:

 DRBD_KERNEL_VERSION_CODE=0x08040b
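
The hex value can be read as one byte each for major, minor, and patch level. That layout is an assumption here, but it agrees with the output quoted in this thread, where DRBDADM_VERSION_CODE=0x091b00 appears next to DRBDADM_VERSION=9.27.0. A minimal shell sketch of the decoding (the function name decode_drbd_version is made up for illustration and is not part of DRBD):

```shell
# Decode a DRBD version code such as 0x08040b into major.minor.patch.
# Assumption: one byte per component (0x091b00 <-> 9.27.0 in the output above).
decode_drbd_version() {
    local code major minor patch
    code=$(( $1 ))                      # shell arithmetic accepts the 0x prefix
    major=$(( (code >> 16) & 0xff ))
    minor=$(( (code >> 8) & 0xff ))
    patch=$(( code & 0xff ))
    printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

decode_drbd_version 0x08040b   # kernel module code from the thread: prints 8.4.11
decode_drbd_version 0x091b00   # drbdadm userland code: prints 9.27.0
```

Read this way, the loaded kernel module is 8.4.11 while the userland tools are 9.27.0, which would explain why the DRBD 9 keyword 'quorum' is rejected.
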
rck commented 9 months ago

(cat /proc/drbd)

rck commented 9 months ago

and to answer the question that comes next:

apt install pve-headers pve-headers-$(uname -r)
apt install --reinstall drbd-dkms
rmmod drbd
drbdadm --version
t0mat000 commented 9 months ago

OK, thank you. But we ran into the next error.

Building initial module for 5.15.136-1-pve
Error! Bad return status for module build on kernel: 5.15.136-1-pve (x86_64)
Consult /var/lib/dkms/drbd/9.2.7-1/build/make.log for more information.
dpkg: error processing package drbd-dkms (--configure):
 installed drbd-dkms package post-installation script subprocess returned error exit status 10
Errors were encountered while processing:
 drbd-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)

/var/lib/dkms/drbd/9.2.7-1/build/make.log:

DKMS make.log for drbd-9.2.7-1 for kernel 5.15.136-1-pve (x86_64)
Wed 07 Feb 2024 01:14:56 PM CET
make: Entering directory '/var/lib/dkms/drbd/9.2.7-1/build/src/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/5.15.136-1-pve/build

cat /dev/null   > Module.symvers
make -C /lib/modules/5.15.136-1-pve/build   M=/var/lib/dkms/drbd/9.2.7-1/build/src/drbd  modules "PRE_CFLAGS="
  COMPAT  __vmalloc_has_2_params
  COMPAT  add_disk_returns_int
  COMPAT  before_4_13_kernel_read
  COMPAT  bio_alloc_has_4_params
  COMPAT  blkdev_get_by_path_has_holder_ops
  COMPAT  blkdev_issue_discard_takes_flags
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  blkdev_put_has_holder
  COMPAT  block_device_operations_open_takes_gendisk
  COMPAT  block_device_operations_release_takes_single_argument
  COMPAT  can_include_vermagic_h
  COMPAT  dax_direct_access_takes_mode
  COMPAT  fs_dax_get_by_bdev_takes_start_off
  COMPAT  fs_dax_get_by_bdev_takes_start_off_and_holder
  COMPAT  genl_policy_in_ops
  COMPAT  have_BIO_MAX_VECS
  COMPAT  have_CRYPTO_TFM_NEED_KEY
  COMPAT  have_GENHD_FL_NO_PART
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have___bio_add_page
  COMPAT  have_allow_kernel_signal
  COMPAT  have_bdev_discard_granularity
  COMPAT  have_bdev_max_discard_sectors
  COMPAT  have_bdev_nr_sectors
  COMPAT  have_bdevname
  COMPAT  have_bdgrab
  COMPAT  have_bdi_congested
  COMPAT  have_bdi_congested_fn
  COMPAT  have_bio_advance_iter_single
  COMPAT  have_bio_alloc_clone
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_dev
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_split_to_limits
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_alloc_disk
  COMPAT  have_blk_alloc_queue_rh
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_cleanup_disk
  COMPAT  have_blk_mode_t
  COMPAT  have_blk_opf_t
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_qc_t_submit_bio
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_max_write_same_sectors
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_split_bio
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_update_readahead
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_bvec_kmap_local
  COMPAT  have_disk_update_readahead
  COMPAT  have_enum_req_op
  COMPAT  have_fallthrough
  COMPAT  have_d_inode
  COMPAT  have_fs_dax_get_by_bdev
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_genl_info_userhdr
  COMPAT  have_get_random_u32
  COMPAT  have_get_random_u32_below
  COMPAT  have_hd_struct
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_kmap_local_page
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_kvfree_rcu
  COMPAT  have_list_is_first
  COMPAT  have_kvfree_rcu_mightsleep
  COMPAT  have_list_next_entry
  COMPAT  have_lookup_user_key
  COMPAT  have_max_send_recv_sge
  COMPAT  have_msg_splice_pages
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_strscpy
  COMPAT  have_nla_put_64bit
  COMPAT  have_part_stat_h
  COMPAT  have_part_stat_read_accum
  COMPAT  have_proc_create_single
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_queue_flag_discard
  COMPAT  have_queue_flag_stable_writes
  COMPAT  have_rb_declare_callbacks_max
  COMPAT  have_refcount_inc
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_revalidate_disk_size
  COMPAT  have_sched_set_fifo
  COMPAT  have_sched_signal_h
  COMPAT  have_security_netlink_recv
  COMPAT  have_req_write
  COMPAT  have_sendpage_ok
  COMPAT  have_set_capacity_and_notify
  COMPAT  have_shash_desc_zero
  COMPAT  have_sk_use_task_frag
  COMPAT  have_sock_set_keepalive
  COMPAT  have_struct_bvec_iter
  COMPAT  have_strscpy
  COMPAT  have_struct_size
  COMPAT  have_simple_positive
  COMPAT  have_submit_bio_noacct
  COMPAT  have_tcp_sock_set_cork
  COMPAT  have_tasklet_setup
  COMPAT  have_tcp_sock_set_keepcnt
  COMPAT  have_tcp_sock_set_nodelay
  COMPAT  have_tcp_sock_set_quickack
  COMPAT  have_time64_to_tm
  COMPAT  have_tcp_sock_set_keepidle
  COMPAT  have_timer_setup
  COMPAT  have_timer_shutdown
  COMPAT  have_tls_get_record_type
  COMPAT  have_tls_tx_rx
  COMPAT  have_void_make_request
  COMPAT  have_void_submit_bio
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  need_make_request_recursion
  COMPAT  need_skb_abort_seq_read
  COMPAT  need_drbd_wrappers
  COMPAT  part_stat_read_takes_block_device
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  rdma_reject_has_reason_arg
  COMPAT  sk_data_ready_has_1_param
  COMPAT  sock_create_kern_has_netns_parameter
  COMPAT  struct_gendisk_has_backing_dev_info
  COMPAT  sock_ops_returns_addr_len
  UPD     /var/lib/dkms/drbd/9.2.7-1/build/src/drbd/compat.5.15.136-1-pve.h
  UPD     /var/lib/dkms/drbd/9.2.7-1/build/src/drbd/compat.h
  GENPATCHNAMES   5.15.136-1-pve
  SPATCH   15442817a313c8bbbe63a4d2e3589a2f  5.15.136-1-pve
    drbd-kernel-compat/cocci_cache/15442817a313c8bbbe63a4d2e3589a2f/.compat.cocci
    : Python path configuration:
    :   PYTHONHOME = '/lib/x86_64-linux-gnu/..'
    :   PYTHONPATH = '/usr/bin/../lib/coccinelle/python'
    :   program name = 'python3'
    :   isolated = 0
    :   environment = 1
    :   user site = 1
    :   import site = 1
    :   sys._base_executable = '/bin/python3'
    :   sys.base_prefix = '/lib/x86_64-linux-gnu/..'
    :   sys.base_exec_prefix = '/lib/x86_64-linux-gnu/..'
    :   sys.platlibdir = 'lib'
    :   sys.executable = '/bin/python3'
    :   sys.prefix = '/lib/x86_64-linux-gnu/..'
    :   sys.exec_prefix = '/lib/x86_64-linux-gnu/..'
    :   sys.path = [
    :     '/usr/bin/../lib/coccinelle/python',
    :     '/lib/x86_64-linux-gnu/../lib/python39.zip',
    :     '/lib/x86_64-linux-gnu/../lib/python3.9',
    :     '/lib/x86_64-linux-gnu/../lib/python3.9/lib-dynload',
    :   ]
    : Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
    : Python runtime state: core initialized
    : ModuleNotFoundError: No module named 'encodings'
    : 
    : Current thread 0x00007ff6b46ca180 (most recent call first):
    : <no Python frame>
make[3]: *** [Makefile:229: drbd-kernel-compat/cocci_cache/15442817a313c8bbbe63a4d2e3589a2f/compat.patch] Error 1
make[2]: *** [/var/lib/dkms/drbd/9.2.7-1/build/src/drbd/Kbuild:135: /var/lib/dkms/drbd/9.2.7-1/build/src/drbd/drbd-kernel-compat/compat.patch] Error 2
make[1]: *** [Makefile:1911: /var/lib/dkms/drbd/9.2.7-1/build/src/drbd] Error 2
make: *** [Makefile:184: kbuild] Error 2
make: Leaving directory '/var/lib/dkms/drbd/9.2.7-1/build/src/drbd'
rck commented 9 months ago

You could remove your local coccinelle; the DRBD build will then query a remote service, which should hopefully have a proper installation of coccinelle.

t0mat000 commented 9 months ago

ok, Problem solved, thx ;)

apt remove coccinelle

apt install pve-headers pve-headers-$(uname -r)
apt install --reinstall drbd-dkms
rmmod drbd
drbdadm --version
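
After a kernel update it may be worth confirming that the DRBD 9 module is really the one loaded. The sketch below reads `drbdadm -V` output on stdin and succeeds only if the kernel module's major version is 9 or higher. It assumes each KEY=VALUE pair is on its own line and that the top byte of DRBD_KERNEL_VERSION_CODE is the major version; the function name drbd_kernel_is_v9 is hypothetical.

```shell
# Succeed only if `drbdadm -V` output (on stdin) reports a DRBD 9+ kernel module.
# Assumptions: one KEY=VALUE per line; major version is the top byte of the code.
drbd_kernel_is_v9() {
    local code
    # Extract the value of the DRBD_KERNEL_VERSION_CODE=0x... line.
    code=$(sed -n 's/^DRBD_KERNEL_VERSION_CODE=//p' | head -n 1)
    [ -n "$code" ] || return 2          # key not found: cannot decide
    [ $(( $code >> 16 )) -ge 9 ]
}

# Usage on a node (not executed here):
#   drbdadm -V | drbd_kernel_is_v9 || echo "DRBD 8.x kernel module loaded, rebuild drbd-dkms"
```

With the output posted earlier in this thread (0x08040b) the check fails, which is the situation the commands above are meant to fix.
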