emqx / emqx

The most scalable open-source MQTT broker for IoT, IIoT, and connected vehicles
https://www.emqx.com/

emqx crashing on startup after upgrade to 5.8.0 #13732

Open Tautcius opened 2 weeks ago

Tautcius commented 2 weeks ago

What happened?

EMQX does not start after upgrading to version 5.8.0.

What did you expect to happen?

EMQX should start normally.

How can we reproduce it (as minimally and precisely as possible)?

No response

Anything else we need to know?

No response

EMQX version

```console
$ ./bin/emqx_ctl broker
# paste output here
```

5.8.0

OS version

```console
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
```

kubernetes

Log files

Listener ssl:default on :8883 started.
Listener ws:default on :8083 started.
Listener wss:default on :8084 started.
2024-08-30T10:37:50.532292+00:00 [error] State machine ds_dqleader5_833EA891C9C40F7D terminating. Reason: {badmatch,{error,enospc}}. Stack: [{ra_log,init,1,[{file,"ra_log.erl"},{line,155}]},{ra_server,init,1,[{file,"ra_server.erl"},{line,277}]},{ra_server_proc,do_init,1,[{file,"ra_server_proc.erl"},{line,285}]},{ra_server_proc,post_init,3,[{file,"ra_server_proc.erl"},{line,348}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1395}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]. Last event: {internal,{go,{#Ref<0.1021023274.4246732801.105640>,<0.3914.0>}}}. State: {post_init,#{id => {ds_dqleader5_833EA891C9C40F7D,'emqx@emqx-core-6bff696f9b-0.emqx-headless.emqx.svc.cluster.local'},machine => {module,emqx_ds_replication_layer,#{db => dqleader,shard => <<"5">>}},parent => <0.3915.0>,uid => <<"5_1725014270531321">>,cluster_name => <<"dqleader_5">>,initial_members => [{ds_dqleader5_833EA891C9C40F7D,'emqx@emqx-core-6bff696f9b-0.emqx-headless.emqx.svc.cluster.local'}],log_init_args => #{},system_config => #{message_queue_data => off_heap,name => dqleader,names => #{directory => ra_dqleader_directory,wal => ra_dqleader_log_wal,segment_writer => ra_dqleader_segment_writer,open_mem_tbls => ra_dqleader_log_open_mem_tables,closed_mem_tbls => ra_dqleader_log_closed_mem_tables,directory_rev => ra_dqleader_directory_reverse,log_ets => ra_dqleader_log_ets,log_meta => ra_dqleader_log_meta,log_sup => ra_dqleader_log_sup,server_sup => ra_dqleader_server_sup_sup,wal_sup => ra_dqleader_log_wal_sup},data_dir => "data/dqleader/dsrepl",segment_max_entries => 4096,segment_max_pending => 1024,segment_compute_checksums => true,wal_data_dir => "data/dqleader/dsrepl",wal_max_size_bytes => 256000000,wal_compute_checksums => true,wal_max_batch_size => 8192,wal_max_entries => undefined,wal_write_strategy => default,wal_sync_method => 
datasync,wal_garbage_collect => false,wal_pre_allocate => false,wal_min_bin_vheap_size => 46422,compress_mem_tables => false,default_max_pipeline_count => 4096,default_max_append_entries_rpc_batch_size => 128,receive_snapshot_timeout => 30000,server_min_bin_vheap_size => 46422,low_priority_commands_flush_size => 16,snapshot_chunk_size => 1000000,low_priority_commands_in_memory_size => 16},tick_timeout => 100}}. 2024-08-30T10:37:50.533140+00:00 [error] crasher: initial call: ra_server_proc:init/1, pid: <0.3916.0>, registered_name: ds_dqleader5_833EA891C9C40F7D, error: {{badmatch,{error,enospc}},[{ra_log,init,1,[{file,"ra_log.erl"},{line,155}]},{ra_server,init,1,[{file,"ra_server.erl"},{line,277}]},{ra_server_proc,do_init,1,[{file,"ra_server_proc.erl"},{line,285}]},{ra_server_proc,post_init,3,[{file,"ra_server_proc.erl"},{line,348}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1395}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.3915.0>,ra_dqleader_server_sup_sup,<0.3829.0>,ra_systems_sup,ra_sup,<0.3534.0>], message_queue_len: 0, messages: [], links: [<0.3915.0>], dictionary: [], trap_exit: true, status: running, heap_size: 2586, stack_size: 28, reductions: 19670; neighbours: 2024-08-30T10:37:50.533504+00:00 [error] Supervisor: {<0.3915.0>,ra_server_sup}. Context: child_terminated. Reason: {{badmatch,{error,enospc}},[{ra_log,init,1,[{file,"ra_log.erl"},{line,155}]},{ra_server,init,1,[{file,"ra_server.erl"},{line,277}]},{ra_server_proc,do_init,1,[{file,"ra_server_proc.erl"},{line,285}]},{ra_server_proc,post_init,3,[{file,"ra_server_proc.erl"},{line,348}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1395}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}. Offender: id=ds_dqleader5_833EA891C9C40F7D,pid=<0.3916.0>. 2024-08-30T10:37:50.534907+00:00 [error] State machine ds_dqleader5_833EA891C9C40F7D terminating. Reason: {badmatch,{error,enospc}}. 
Stack: [{ra_log,init,1,[{file,"ra_log.erl"},{line,155}]},{ra_server,init,1,[{file,"ra_server.erl"},{line,277}]},{ra_server_proc,do_init,1,[{file,"ra_server_proc.erl"},{line,285}]},{ra_server_proc,post_init,3,[{file,"ra_server_proc.erl"},{line,348}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1395}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]. Last event: {internal,{go,{#Ref<0.1021023274.4246732801.105640>,<0.3914.0>}}}. State: {post_init,#{id => {ds_dqleader5_833EA891C9C40F7D,'emqx@emqx-core-6bff696f9b-0.emqx-headless.emqx.svc.cluster.local'},machine => {module,emqx_ds_replication_layer,#{db => dqleader,shard => <<"5">>}},parent => <0.3915.0>,uid => <<"5_1725014270531321">>,cluster_name => <<"dqleader_5">>,initial_members => [{ds_dqleader5_833EA891C9C40F7D,'emqx@emqx-core-6bff696f9b-0.emqx-headless.emqx.svc.cluster.local'}],log_init_args => #{},system_config => #{message_queue_data => off_heap,name => dqleader,names => #{directory => ra_dqleader_directory,wal => ra_dqleader_log_wal,segment_writer => ra_dqleader_segment_writer,open_mem_tbls => ra_dqleader_log_open_mem_tables,closed_mem_tbls => ra_dqleader_log_closed_mem_tables,directory_rev => ra_dqleader_directory_reverse,log_ets => ra_dqleader_log_ets,log_meta => ra_dqleader_log_meta,log_sup => ra_dqleader_log_sup,server_sup => ra_dqleader_server_sup_sup,wal_sup => ra_dqleader_log_wal_sup},data_dir => "data/dqleader/dsrepl",segment_max_entries => 4096,segment_max_pending => 1024,segment_compute_checksums => true,wal_data_dir => "data/dqleader/dsrepl",wal_max_size_bytes => 256000000,wal_compute_checksums => true,wal_max_batch_size => 8192,wal_max_entries => undefined,wal_write_strategy => default,wal_sync_method => datasync,wal_garbage_collect => false,wal_pre_allocate => false,wal_min_bin_vheap_size => 46422,compress_mem_tables => false,default_max_pipeline_count => 4096,default_max_append_entries_rpc_batch_size => 128,receive_snapshot_timeout => 
30000,server_min_bin_vheap_size => 46422,low_priority_commands_flush_size => 16,snapshot_chunk_size => 1000000,low_priority_commands_in_memory_size => 16},tick_timeout => 100}}. 2024-08-30T10:37:50.535866+00:00 [error] crasher: initial call: ra_server_proc:init/1, pid: <0.3917.0>, registered_name: ds_dqleader5_833EA891C9C40F7D, error: {{badmatch,{error,enospc}},[{ra_log,init,1,[{file,"ra_log.erl"},{line,155}]},{ra_server,init,1,[{file,"ra_server.erl"},{line,277}]},{ra_server_proc,do_init,1,[{file,"ra_server_proc.erl"},{line,285}]},{ra_server_proc,post_init,3,[{file,"ra_server_proc.erl"},{line,348}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1395}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.3915.0>,ra_dqleader_server_sup_sup,<0.3829.0>,ra_systems_sup,ra_sup,<0.3534.0>], message_queue_len: 0, messages: [], links: [<0.3915.0>], dictionary: [], trap_exit: true, status: running, heap_size: 2586, stack_size: 28, reductions: 19678; neighbours: 2024-08-30T10:37:50.536423+00:00 [error] Supervisor: {<0.3915.0>,ra_server_sup}. Context: child_terminated. Reason: {{badmatch,{error,enospc}},[{ra_log,init,1,[{file,"ra_log.erl"},{line,155}]},{ra_server,init,1,[{file,"ra_server.erl"},{line,277}]},{ra_server_proc,do_init,1,[{file,"ra_server_proc.erl"},{line,285}]},{ra_server_proc,post_init,3,[{file,"ra_server_proc.erl"},{line,348}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1395}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}. Offender: id=ds_dqleader5_833EA891C9C40F7D,pid=<0.3917.0>. 2024-08-30T10:37:50.538647+00:00 [error] State machine ds_dqleader5_833EA891C9C40F7D terminating. Reason: {badmatch,{error,enospc}}. 
Stack: [{ra_log,init,1,[{file,"ra_log.erl"},{line,155}]},{ra_server,init,1,[{file,"ra_server.erl"},{line,277}]},{ra_server_proc,do_init,1,[{file,"ra_server_proc.erl"},{line,285}]},{ra_server_proc,post_init,3,[{file,"ra_server_proc.erl"},{line,348}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1395}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]. Last event: {internal,{go,{#Ref<0.1021023274.4246732801.105640>,<0.3914.0>}}}. State: {post_init,#{id => {ds_dqleader5_833EA891C9C40F7D,'emqx@emqx-core-6bff696f9b-0.emqx-headless.emqx.svc.cluster.local'},machine => {module,emqx_ds_replication_layer,#{db => dqleader,shard => <<"5">>}},parent => <0.3915.0>,uid => <<"5_1725014270531321">>,cluster_name => <<"dqleader_5">>,initial_members => [{ds_dqleader5_833EA891C9C40F7D,'emqx@emqx-core-6bff696f9b-0.emqx-headless.emqx.svc.cluster.local'}],log_init_args => #{},system_config => #{message_queue_data => off_heap,name => dqleader,names => #{directory => ra_dqleader_directory,wal => ra_dqleader_log_wal,segment_writer => ra_dqleader_segment_writer,open_mem_tbls => ra_dqleader_log_open_mem_tables,closed_mem_tbls => ra_dqleader_log_closed_mem_tables,directory_rev => ra_dqleader_directory_reverse,log_ets => ra_dqleader_log_ets,log_meta => ra_dqleader_log_meta,log_sup => ra_dqleader_log_sup,server_sup => ra_dqleader_server_sup_sup,wal_sup => ra_dqleader_log_wal_sup},data_dir => "data/dqleader/dsrepl",segment_max_entries => 4096,segment_max_pending => 1024,segment_compute_checksums => true,wal_data_dir => "data/dqleader/dsrepl",wal_max_size_bytes => 256000000,wal_compute_checksums => true,wal_max_batch_size => 8192,wal_max_entries => undefined,wal_write_strategy => default,wal_sync_method => datasync,wal_garbage_collect => false,wal_pre_allocate => false,wal_min_bin_vheap_size => 46422,compress_mem_tables => false,default_max_pipeline_count => 4096,default_max_append_entries_rpc_batch_size => 128,receive_snapshot_timeout => 
30000,server_min_bin_vheap_size => 46422,low_priority_commands_flush_size => 16,snapshot_chunk_size => 1000000,low_priority_commands_in_memory_size => 16},tick_timeout => 100}}. 2024-08-30T10:37:50.539129+00:00 [error] crasher: initial call: ra_server_proc:init/1, pid: <0.3918.0>, registered_name: ds_dqleader5_833EA891C9C40F7D, error: {{badmatch,{error,enospc}},[{ra_log,init,1,[{file,"ra_log.erl"},{line,155}]},{ra_server,init,1,[{file,"ra_server.erl"},{line,277}]},{ra_server_proc,do_init,1,[{file,"ra_server_proc.erl"},{line,285}]},{ra_server_proc,post_init,3,[{file,"ra_server_proc.erl"},{line,348}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1395}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.3915.0>,ra_dqleader_server_sup_sup,<0.3829.0>,ra_systems_sup,ra_sup,<0.3534.0>], message_queue_len: 0, messages: [], links: [<0.3915.0>], dictionary: [], trap_exit: true, status: running, heap_size: 2586, stack_size: 28, reductions: 19678; neighbours: 2024-08-30T10:37:50.539610+00:00 [error] Supervisor: {<0.3915.0>,ra_server_sup}. Context: child_terminated. Reason: {{badmatch,{error,enospc}},[{ra_log,init,1,[{file,"ra_log.erl"},{line,155}]},{ra_server,init,1,[{file,"ra_server.erl"},{line,277}]},{ra_server_proc,do_init,1,[{file,"ra_server_proc.erl"},{line,285}]},{ra_server_proc,post_init,3,[{file,"ra_server_proc.erl"},{line,348}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1395}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}. Offender: id=ds_dqleader5_833EA891C9C40F7D,pid=<0.3918.0>. 2024-08-30T10:37:50.539751+00:00 [error] Supervisor: {<0.3915.0>,ra_server_sup}. Context: shutdown. Reason: reached_max_restart_intensity. Offender: id=ds_dqleader5_833EA891C9C40F7D,pid=<0.3918.0>. 
2024-08-30T10:37:50.540398+00:00 [error] Ra: failed to start ra server ra_dqleader_server_sup_sup, err shutdown 2024-08-30T10:37:50.540476+00:00 [error] crasher: initial call: emqx_ds_replication_layer_shard:init/1, pid: <0.3914.0>, registered_name: [], error: {{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.3912.0>,<0.3843.0>,<0.3828.0>,emqx_ds_builtin_databases_sup,emqx_ds_builtin_raft_sup,<0.3542.0>], message_queue_len: 0, messages: [], links: [<0.3912.0>], dictionary: [], trap_exit: true, status: running, heap_size: 1598, stack_size: 28, reductions: 1742; neighbours: 2024-08-30T10:37:50.541410+00:00 [error] Supervisor: {via,gproc,{n,l,{emqx_ds_builtin_db_shard_sup,dqleader,<<"5">>}}}. Context: start_error. Reason: {{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}. Offender: id={<<53>>,replication},pid=undefined. 
2024-08-30T10:37:50.546562+00:00 [error] crasher: initial call: emqx_ds_replication_shard_allocator:init/1, pid: <0.3845.0>, registered_name: [], error: {{badmatch,{error,{{shutdown,{failed_to_start_child,{<<"5">>,replication},{{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,<<"5">>,{emqx_ds_builtin_raft_db_sup,start_link_sup,[{emqx_ds_builtin_db_shard_sup,dqleader,<<"5">>},[]]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_replication_shard_allocator,start_shard,2,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,407}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_ds_replication_shard_allocator,allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,394}]},{emqx_ds_replication_shard_allocator,handle_allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,129}]},{emqx_ds_replication_shard_allocator,init,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,88}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.3828.0>,emqx_ds_builtin_databases_sup,emqx_ds_builtin_raft_sup,<0.3542.0>], message_queue_len: 0, messages: [], links: [<0.3828.0>], dictionary: [{'$logger_metadata$',#{domain => [emqx,ds,dqleader,shard_allocator],db => dqleader}}], trap_exit: true, status: running, heap_size: 2586, stack_size: 28, reductions: 1420; neighbours: 2024-08-30T10:37:50.548778+00:00 [error] Supervisor: 
{via,gproc,{n,l,{emqx_ds_builtin_raft_db_sup,dqleader}}}. Context: start_error. Reason: {{badmatch,{error,{{shutdown,{failed_to_start_child,{<<"5">>,replication},{{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,<<"5">>,{emqx_ds_builtin_raft_db_sup,start_link_sup,[{emqx_ds_builtin_db_shard_sup,dqleader,<<"5">>},[]]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_replication_shard_allocator,start_shard,2,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,407}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_ds_replication_shard_allocator,allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,394}]},{emqx_ds_replication_shard_allocator,handle_allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,129}]},{emqx_ds_replication_shard_allocator,init,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,88}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}. Offender: id=shard_allocator,pid=undefined. 
2024-08-30T10:37:50.837013+00:00 [error] crasher: initial call: supervisor:emqx_ds_shared_sub_registry/1, pid: <0.3827.0>, registered_name: [], error: {{badmatch,{error,{{shutdown,{failed_to_start_child,shard_allocator,{{badmatch,{error,{{shutdown,{failed_to_start_child,{<<"5">>,replication},{{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,<<"5">>,{emqx_ds_builtin_raft_db_sup,start_link_sup,[{emqx_ds_builtin_db_shard_sup,dqleader,<<"5">>},[]]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_replication_shard_allocator,start_shard,2,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,407}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_ds_replication_shard_allocator,allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,394}]},{emqx_ds_replication_shard_allocator,handle_allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,129}]},{emqx_ds_replication_shard_allocator,init,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,88}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,dqleader,{emqx_ds_builtin_raft_db_sup,start_db,[dqleader,#{storage => {emqx_ds_storage_skipstream_lts,#{topic_index_bytes => 8,lts_threshold_spec => {simple,{inf,inf,inf,0}},serialization_schema => v1,wildcard_hash_bytes => 8}},backend => builtin_raft,atomic_batches => false,force_monotonic_timestamps => false,n_shards => 16,replication_options => #{},n_sites => 
1,replication_factor => 3}]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_shared_sub_registry,init,1,[{file,"emqx_ds_shared_sub_registry.erl"},{line,73}]},{supervisor,init,1,[{file,"supervisor.erl"},{line,330}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [emqx_ds_shared_sub_sup,<0.3825.0>], message_queue_len: 0, messages: [], links: [<0.3826.0>], dictionary: [], trap_exit: true, status: running, heap_size: 1598, stack_size: 28, reductions: 231; neighbours: 2024-08-30T10:37:50.837996+00:00 [error] Supervisor: {local,emqx_ds_shared_sub_sup}. Context: start_error. Reason: {{badmatch,{error,{{shutdown,{failed_to_start_child,shard_allocator,{{badmatch,{error,{{shutdown,{failed_to_start_child,{<<"5">>,replication},{{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,<<"5">>,{emqx_ds_builtin_raft_db_sup,start_link_sup,[{emqx_ds_builtin_db_shard_sup,dqleader,<<"5">>},[]]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_replication_shard_allocator,start_shard,2,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,407}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_ds_replication_shard_allocator,allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,394}]},{emqx_ds_replication_shard_allocator,handle_allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,129}]},{emqx_ds_replication_shard_allocator,init,1,[{file,"emqx_ds_repl
ication_shard_allocator.erl"},{line,88}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,dqleader,{emqx_ds_builtin_raft_db_sup,start_db,[dqleader,#{storage => {emqx_ds_storage_skipstream_lts,#{topic_index_bytes => 8,lts_threshold_spec => {simple,{inf,inf,inf,0}},serialization_schema => v1,wildcard_hash_bytes => 8}},backend => builtin_raft,atomic_batches => false,force_monotonic_timestamps => false,n_shards => 16,replication_options => #{},n_sites => 1,replication_factor => 3}]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_shared_sub_registry,init,1,[{file,"emqx_ds_shared_sub_registry.erl"},{line,73}]},{supervisor,init,1,[{file,"supervisor.erl"},{line,330}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}. Offender: id=emqx_ds_shared_sub_registry,pid=undefined. 
2024-08-30T10:37:50.839169+00:00 [error] crasher: initial call: application_master:init/4, pid: <0.3824.0>, registered_name: [], exit: {{bad_return,{{emqx_ds_shared_sub_app,start,[normal,[]]},{'EXIT',{{badmatch,{error,{shutdown,{failed_to_start_child,emqx_ds_shared_sub_registry,{{badmatch,{error,{{shutdown,{failed_to_start_child,shard_allocator,{{badmatch,{error,{{shutdown,{failed_to_start_child,{<<"5">>,replication},{{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,<<"5">>,{emqx_ds_builtin_raft_db_sup,start_link_sup,[{emqx_ds_builtin_db_shard_sup,dqleader,<<"5">>},[]]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_replication_shard_allocator,start_shard,2,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,407}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_ds_replication_shard_allocator,allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,394}]},{emqx_ds_replication_shard_allocator,handle_allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,129}]},{emqx_ds_replication_shard_allocator,init,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,88}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,dqleader,{emqx_ds_builtin_raft_db_sup,start_db,[dqleader,#{storage => {emqx_ds_storage_skipstream_lts,#{topic_index_bytes => 8,lts_threshold_spec => {simple,{inf,inf,inf,0}},serialization_schema => v1,wildcard_hash_bytes => 8}},backend => 
builtin_raft,atomic_batches => false,force_monotonic_timestamps => false,n_shards => 16,replication_options => #{},n_sites => 1,replication_factor => 3}]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_shared_sub_registry,init,1,[{file,"emqx_ds_shared_sub_registry.erl"},{line,73}]},{supervisor,init,1,[{file,"supervisor.erl"},{line,330}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}}}},[{emqx_ds_shared_sub_app,start,2,[{file,"emqx_ds_shared_sub_app.erl"},{line,19}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}},[{application_master,init,4,[{file,"application_master.erl"},{line,142}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.3823.0>], message_queue_len: 1, messages: [{'EXIT',<0.3825.0>,normal}], links: [<0.3823.0>,<0.3131.0>], dictionary: [], trap_exit: true, status: running, heap_size: 2586, stack_size: 28, reductions: 314; neighbours: 2024-08-30T10:37:50.840930+00:00 [critical] msg: failed_to_start_app, reason: 
{emqx_ds_shared_sub,{bad_return,{{emqx_ds_shared_sub_app,start,[normal,[]]},{'EXIT',{{badmatch,{error,{shutdown,{failed_to_start_child,emqx_ds_shared_sub_registry,{{badmatch,{error,{{shutdown,{failed_to_start_child,shard_allocator,{{badmatch,{error,{{shutdown,{failed_to_start_child,{<<"5">>,replication},{{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,<<"5">>,{emqx_ds_builtin_raft_db_sup,start_link_sup,[{emqx_ds_builtin_db_shard_sup,dqleader,<<"5">>},[]]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_replication_shard_allocator,start_shard,2,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,407}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_ds_replication_shard_allocator,allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,394}]},{emqx_ds_replication_shard_allocator,handle_allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,129}]},{emqx_ds_replication_shard_allocator,init,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,88}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,dqleader,{emqx_ds_builtin_raft_db_sup,start_db,[dqleader,#{storage => {emqx_ds_storage_skipstream_lts,#{topic_index_bytes => 8,lts_threshold_spec => {simple,{inf,inf,inf,0}},serialization_schema => v1,wildcard_hash_bytes => 8}},backend => builtin_raft,atomic_batches => false,force_monotonic_timestamps => false,n_shards => 16,replication_options => 
#{},n_sites => 1,replication_factor => 3}]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_shared_sub_registry,init,1,[{file,"emqx_ds_shared_sub_registry.erl"},{line,73}]},{supervisor,init,1,[{file,"supervisor.erl"},{line,330}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}}}},[{emqx_ds_shared_sub_app,start,2,[{file,"emqx_ds_shared_sub_app.erl"},{line,19}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}}}, app: emqx_ds_shared_sub 2024-08-30T10:37:50.841615+00:00 [error] Supervisor: {local,emqx_machine_sup}. Context: start_error. Reason: {'EXIT',{{failed_to_start_app,emqx_ds_shared_sub,{emqx_ds_shared_sub,{bad_return,{{emqx_ds_shared_sub_app,start,[normal,[]]},{'EXIT',{{badmatch,{error,{shutdown,{failed_to_start_child,emqx_ds_shared_sub_registry,{{badmatch,{error,{{shutdown,{failed_to_start_child,shard_allocator,{{badmatch,{error,{{shutdown,{failed_to_start_child,{<<"5">>,replication},{{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,<<"5">>,{emqx_ds_builtin_raft_db_sup,start_link_sup,[{emqx_ds_builtin_db_shard_sup,dqleader,<<"5">>},[]]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_replication_shard_allocator,start_shard,2,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,407}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_ds_replication_shard_allocator,allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{
line,394}]},{emqx_ds_replication_shard_allocator,handle_allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,129}]},{emqx_ds_replication_shard_allocator,init,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,88}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,dqleader,{emqx_ds_builtin_raft_db_sup,start_db,[dqleader,#{storage => {emqx_ds_storage_skipstream_lts,#{topic_index_bytes => 8,lts_threshold_spec => {simple,{inf,inf,inf,0}},serialization_schema => v1,wildcard_hash_bytes => 8}},backend => builtin_raft,atomic_batches => false,force_monotonic_timestamps => false,n_shards => 16,replication_options => #{},n_sites => 1,replication_factor => 3}]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_shared_sub_registry,init,1,[{file,"emqx_ds_shared_sub_registry.erl"},{line,73}]},{supervisor,init,1,[{file,"supervisor.erl"},{line,330}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}}}},[{emqx_ds_shared_sub_app,start,2,[{file,"emqx_ds_shared_sub_app.erl"},{line,19}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}}}},[{emqx_machine_boot,start_one_app,1,[{file,"emqx_machine_boot.erl"},{line,112}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_machine_boot,ensure_apps_started,0,[{file,"emqx_machine_boot.erl"},{line,102}]},{emqx_machine_boot,post_boot,0,[{file,"emqx_machine_boot.erl"},{line,44}]},{supervisor,do_start_child_i,3,[{file,"supervisor.erl"},{line,420}]},{supervisor,do_start_child,2,[{file,"supervisor.erl"},{line,406}]},{supervisor,'-start_children/2-fun-0-',3,[{file,"supervisor.erl"},{line,390}]},{supervisor,children_map,4,[{file,"supervisor.erl"},{line,1258}]},{sup
ervisor,init_children,2,[{file,"supervisor.erl"},{line,350}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}. Offender: id=emqx_machine_boot,pid=undefined. 2024-08-30T10:37:50.843706+00:00 [error] crasher: initial call: application_master:init/4, pid: <0.3299.0>, registered_name: [], exit: {{{shutdown,{failed_to_start_child,emqx_machine_boot,{'EXIT',{{failed_to_start_app,emqx_ds_shared_sub,{emqx_ds_shared_sub,{bad_return,{{emqx_ds_shared_sub_app,start,[normal,[]]},{'EXIT',{{badmatch,{error,{shutdown,{failed_to_start_child,emqx_ds_shared_sub_registry,{{badmatch,{error,{{shutdown,{failed_to_start_child,shard_allocator,{{badmatch,{error,{{shutdown,{failed_to_start_child,{<<"5">>,replication},{{badmatch,{error,shutdown}},[{emqx_ds_replication_layer_shard,start_server,3,[{file,"emqx_ds_replication_layer_shard.erl"},{line,449}]},{emqx_ds_replication_layer_shard,init,1,[{file,"emqx_ds_replication_layer_shard.erl"},{line,349}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,<<"5">>,{emqx_ds_builtin_raft_db_sup,start_link_sup,[{emqx_ds_builtin_db_shard_sup,dqleader,<<"5">>},[]]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_replication_shard_allocator,start_shard,2,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,407}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_ds_replication_shard_allocator,allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,394}]},{emqx_ds_replication_shard_allocator,handle_allocate_shards,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,129}]},{emqx_ds_replication_shard_allocator,init,1,[{file,"emqx_ds_replication_shard_allocator.erl"},{line,88}]},{gen_server,init_it,2,[{file
,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}},{child,undefined,dqleader,{emqx_ds_builtin_raft_db_sup,start_db,[dqleader,#{storage => {emqx_ds_storage_skipstream_lts,#{topic_index_bytes => 8,lts_threshold_spec => {simple,{inf,inf,inf,0}},serialization_schema => v1,wildcard_hash_bytes => 8}},backend => builtin_raft,atomic_batches => false,force_monotonic_timestamps => false,n_shards => 16,replication_options => #{},n_sites => 1,replication_factor => 3}]},permanent,false,infinity,supervisor,[emqx_ds_builtin_raft_db_sup]}}}},[{emqx_ds_shared_sub_registry,init,1,[{file,"emqx_ds_shared_sub_registry.erl"},{line,73}]},{supervisor,init,1,[{file,"supervisor.erl"},{line,330}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}}}},[{emqx_ds_shared_sub_app,start,2,[{file,"emqx_ds_shared_sub_app.erl"},{line,19}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}}}},[{emqx_machine_boot,start_one_app,1,[{file,"emqx_machine_boot.erl"},{line,112}]},{lists,foreach_1,2,[{file,"lists.erl"},{line,1686}]},{emqx_machine_boot,ensure_apps_started,0,[{file,"emqx_machine_boot.erl"},{line,102}]},{emqx_machine_boot,post_boot,0,[{file,"emqx_machine_boot.erl"},{line,44}]},{supervisor,do_start_child_i,3,[{file,"supervisor.erl"},{line,420}]},{supervisor,do_start_child,2,[{file,"supervisor.erl"},{line,406}]},{supervisor,'-start_children/2-fun-0-',3,[{file,"supervisor.erl"},{line,390}]},{supervisor,children_map,4,[{file,"supervisor.erl"},{line,1258}]},{supervisor,init_children,2,[{file,"supervisor.erl"},{line,350}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,980}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,935}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}}}},{emqx_machine_app,st
art,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,142}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}, ancestors: [<0.3298.0>], message_queue_len: 1, messages: [{'EXIT',<0.3300.0>,normal}], links: [<0.3298.0>,<0.3131.0>], dictionary: [], trap_exit: true, status: running, heap_size: 2586, stack_size: 28, reductions: 198; neighbours: Runtime terminating during boot (terminating)

id commented 2 weeks ago

{error,enospc} means "No space left on device". Could you check if there is enough free space on the filesystem?
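One way to run that check from inside the pod or on the node is a short script using Python's standard `shutil.disk_usage`. This is only a sketch: the function names `free_space_report` and `has_room` are illustrative and not part of EMQX, and the path to inspect (for example the directory where the PVC is mounted) is an assumption you would need to adjust.

```python
import shutil


def free_space_report(path: str = ".") -> dict:
    """Return total, used, and free bytes for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


def has_room(path: str, needed_bytes: int) -> bool:
    """True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # "." is a placeholder; point this at the EMQX data directory (assumption).
    report = free_space_report(".")
    for key, value in report.items():
        print(f"{key}: {value / 1024 ** 2:.1f} MiB")
```

If the reported free space is close to zero, the `enospc` error in the log is consistent with the volume simply being too small.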

Tautcius commented 1 week ago

I attached a bigger PVC (200 MB), and the upgraded instance was started on a node of the same size. I am using the operator to deploy EMQX on Kubernetes.

Tautcius commented 1 week ago

I think it is related to the PVC. I increased its size from 20 MB to 1 GB and that still did not help, but when I create a new EMQX cluster without persistence, it starts without errors.

Tautcius commented 1 week ago

So this problem can be solved with a PVC of at least 5Gi. You should update the docs and templates with that.

qzhuyan commented 1 week ago

Hi, you may try setting durable_storage.messages.n_shards = 2 to lower disk space usage, but it may reduce throughput if durable storage is in use.

keynslug commented 1 week ago

durable_storage.messages.n_shards = 2

Judging from the log snippet, I suspect it's another, not-yet-active DB that could be tuned to achieve that, i.e. durable_storage.queues.n_shards = 2.

qzhuyan commented 1 week ago

@Tautcius please try setting these for EMQX and let us know if that helps.

durable_storage.messages.n_shards = 2
durable_storage.queues.n_shards = 2

Tautcius commented 1 week ago

It helped reduce the size to 2 Gi, which is better than 5 :)
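To see where the space actually goes, the apparent size of a directory tree can be summed with the standard library. This is a rough sketch: `dir_size_bytes` is an illustrative helper, and `data/dqleader/dsrepl` is the `data_dir` that appears in the crash log above, so whether it matches your layout is an assumption. It adds up file lengths, which can differ from the blocks the filesystem really allocates.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the apparent size of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # The file may have been removed while walking; ignore it.
                pass
    return total


if __name__ == "__main__":
    # Path taken from the log's data_dir; adjust to your deployment (assumption).
    path = "data/dqleader/dsrepl"
    print(f"{path}: {dir_size_bytes(path) / 1024 ** 2:.1f} MiB")
```

Comparing the result before and after changing `n_shards` gives a quick indication of how much the setting affects disk usage.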

qzhuyan commented 1 week ago

you should update docs and templates with that.

@Tautcius, would you create a PR for that? We would be happy to see a contribution from the community.

Tautcius commented 1 week ago

Will create that over the weekend