JDRaftKeeper / RaftKeeper

RaftKeeper is a high-performance distributed consensus service.
Apache License 2.0

Add more metrics to `mntr` command #168

Open JackyWoo opened 10 months ago

JackyWoo commented 10 months ago

Description

Right now the only way to introspect RaftKeeper is the 4lw (four-letter-word) commands, which are based on ZooKeeper 3.5. We found the metrics too coarse to tell what is happening internally.

So we should enhance the monitoring system. The basic idea is to extend the 4lw commands rather than add a Prometheus integration, because that is simpler and does not introduce extra components for users.
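For context, a 4lw command is a four-letter string sent over a plain TCP connection, and the `mntr` reply is one `name value` pair per line. The following is a minimal sketch of a client, not RaftKeeper's own tooling: the function names `four_letter_word` and `parse_mntr` are hypothetical, the host and port must be supplied by the caller, and it assumes the server closes the connection after replying, as ZooKeeper does.

```python
import socket
from typing import Dict


def four_letter_word(host: str, port: int, command: str = "mntr",
                     timeout: float = 5.0) -> str:
    """Send a four-letter-word command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # assumed: the server closes the socket when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text: str) -> Dict[str, str]:
    """Turn 'name<whitespace>value' lines into a dict, skipping blank lines."""
    metrics = {}
    for line in text.splitlines():
        # Split on the first run of whitespace only, so values that contain
        # spaces (such as the version string) stay intact.
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            metrics[parts[0]] = parts[1]
    return metrics
```

Values are kept as strings because the reply mixes integers, floats and free text; callers can convert the ones they need.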

The following are some metrics we need:

  1. detailed command statistic: read/write, create/delete/get/multi-write
  2. detailed multi-command statistic
  3. append entries batch size
  4. forwarding batch size
  5. more accurate response time (RT)
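To make the list above concrete, here is a rough sketch of how per-command counters (item 1) and batch-size summaries (items 3 and 4) could be collected and rendered as `mntr`-style lines. It is an illustration only: the classes `Summary` and `Metrics` and the metric names such as `append_entries_batch_size` are hypothetical and do not come from RaftKeeper's code.

```python
from collections import Counter


class Summary:
    """Tracks count, sum, min and max of observed values (e.g. batch sizes)."""

    def __init__(self):
        self.cnt = 0
        self.sum = 0
        self.min = None
        self.max = None

    def observe(self, value):
        self.cnt += 1
        self.sum += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    def avg(self):
        return self.sum / self.cnt if self.cnt else 0.0


class Metrics:
    def __init__(self):
        self.request_counts = Counter()  # per-operation request counts
        self.summaries = {}              # name -> Summary

    def count_request(self, op):
        self.request_counts[op] += 1

    def observe(self, name, value):
        self.summaries.setdefault(name, Summary()).observe(value)

    def render(self):
        """Render everything as tab-separated 'name value' lines."""
        lines = []
        for op in sorted(self.request_counts):
            lines.append(f"zk_{op}_count\t{self.request_counts[op]}")
        for name in sorted(self.summaries):
            s = self.summaries[name]
            lines.append(f"zk_avg_{name}\t{s.avg()}")
            lines.append(f"zk_min_{name}\t{s.min if s.min is not None else 0}")
            lines.append(f"zk_max_{name}\t{s.max if s.max is not None else 0}")
            lines.append(f"zk_cnt_{name}\t{s.cnt}")
            lines.append(f"zk_sum_{name}\t{s.sum}")
        return "\n".join(lines)
```

For example, calling `count_request("create")` twice and `observe("append_entries_batch_size", ...)` with 4 and 8 yields a `zk_create_count` of 2 and an average batch size of 6.0. A real server would also need thread-safe updates, which this sketch leaves out.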

Are you willing to submit PR?

JackyWoo commented 10 months ago

@lzydmxy please take a look at this issue

JackyWoo commented 10 months ago

ZooKeeper 3.9 exposes more metrics than 3.5; we can use it as a reference.

lzydmxy commented 7 months ago

@JackyWoo These are all the monitoring items in zk 3.9. We can start by adding latency metrics for the core Raft path, equivalent to zk's `sync_processor_queue_time_ms` and `sync_processor_queue_flush_time_ms`.

zk_version  3.9.0-1674a5e97f43bc38e9bf56b04f83a7ae34d68249, built on 2023-07-19 09:09 UTC
zk_server_state standalone
zk_ephemerals_count 0
zk_num_alive_connections    1
zk_avg_latency  0.0
zk_outstanding_requests 0
zk_znode_count  5
zk_global_sessions  0
zk_non_mtls_remote_conn_count   0
zk_last_client_response_size    -1
zk_packets_sent 1
zk_packets_received 2
zk_max_client_response_size -1
zk_connection_drop_probability  0.0
zk_watch_count  0
zk_auth_failed_count    0
zk_min_latency  0
zk_max_file_descriptor_count    204800
zk_approximate_data_size    44
zk_open_file_descriptor_count   94
zk_local_sessions   0
zk_uptime   29500
zk_max_latency  0
zk_outstanding_tls_handshake    0
zk_min_client_response_size -1
zk_non_mtls_local_conn_count    0
zk_watch_bytes  0
zk_stale_requests_dropped   0
zk_throttled_ops    0
zk_insecure_admin_count 0
zk_connection_rejected  0
zk_sessionless_connections_expired  0
zk_dead_watchers_queued 0
zk_stale_requests   0
zk_connection_drop_count    0
zk_response_packet_cache_hits   0
zk_bytes_received_count 8
zk_add_dead_watcher_stall_time  0
zk_request_throttle_wait_count  0
zk_requests_not_forwarded_to_commit_processor   0
zk_response_packet_cache_misses 0
zk_prep_processor_request_queued    0
zk_stale_replies    0
zk_response_bytes   0
zk_ensemble_auth_fail   0
zk_diff_count   0
zk_connection_revalidate_count  0
zk_quit_leading_due_to_disloyal_voter   0
zk_unrecoverable_error_count    0
zk_unsuccessful_handshake   0
zk_commit_count 0
zk_outstanding_changes_queued   0
zk_request_commit_queued    0
zk_ensemble_auth_skip   0
zk_skip_learner_request_to_next_processor_count 0
zk_proposal_count   0
zk_large_requests_rejected  0
zk_outstanding_changes_removed  0
zk_restore_error_count  0
zk_cnxn_closed_without_zk_server_running    0
zk_looking_count    0
zk_snapshot_rate_limited_count  0
zk_learner_proposal_received_count  0
zk_digest_mismatches_count  0
zk_dead_watchers_cleared    0
zk_ensemble_auth_success    0
zk_learner_commit_received_count    0
zk_snapshot_error_count 0
zk_connection_request_count 0
zk_response_packet_get_children_cache_misses    0
zk_snap_count   0
zk_stale_sessions_expired   0
zk_restore_rate_limited_count   0
zk_response_packet_get_children_cache_hits  0
zk_sync_processor_request_queued    0
zk_tls_handshake_exceeded   0
zk_revalidate_count 0
zk_avg_socket_closing_time  0.0
zk_min_socket_closing_time  0
zk_max_socket_closing_time  0
zk_cnt_socket_closing_time  0
zk_sum_socket_closing_time  0
zk_avg_proposal_process_time    0.0
zk_min_proposal_process_time    0
zk_max_proposal_process_time    0
zk_cnt_proposal_process_time    0
zk_sum_proposal_process_time    0
zk_avg_leader_unavailable_time  0.0
zk_min_leader_unavailable_time  0
zk_max_leader_unavailable_time  0
zk_cnt_leader_unavailable_time  0
zk_sum_leader_unavailable_time  0
zk_avg_node_created_watch_count 0.0
zk_min_node_created_watch_count 0
zk_max_node_created_watch_count 0
zk_cnt_node_created_watch_count 0
zk_sum_node_created_watch_count 0
zk_avg_session_queues_drained   0.0
zk_min_session_queues_drained   0
zk_max_session_queues_drained   0
zk_cnt_session_queues_drained   0
zk_sum_session_queues_drained   0
zk_avg_write_commit_proc_req_queued 0.0
zk_min_write_commit_proc_req_queued 0
zk_max_write_commit_proc_req_queued 0
zk_cnt_write_commit_proc_req_queued 0
zk_sum_write_commit_proc_req_queued 0
zk_avg_connection_token_deficit 0.0
zk_min_connection_token_deficit 0
zk_max_connection_token_deficit 0
zk_cnt_connection_token_deficit 0
zk_sum_connection_token_deficit 0
zk_avg_read_commit_proc_req_queued  0.0
zk_min_read_commit_proc_req_queued  0
zk_max_read_commit_proc_req_queued  0
zk_cnt_read_commit_proc_req_queued  0
zk_sum_read_commit_proc_req_queued  0
zk_avg_node_deleted_watch_count 0.0
zk_min_node_deleted_watch_count 0
zk_max_node_deleted_watch_count 0
zk_cnt_node_deleted_watch_count 0
zk_sum_node_deleted_watch_count 0
zk_avg_startup_txns_load_time   0.0
zk_min_startup_txns_load_time   0
zk_max_startup_txns_load_time   0
zk_cnt_startup_txns_load_time   0
zk_sum_startup_txns_load_time   0
zk_avg_sync_processor_queue_size    0.0
zk_min_sync_processor_queue_size    0
zk_max_sync_processor_queue_size    0
zk_cnt_sync_processor_queue_size    1
zk_sum_sync_processor_queue_size    0
zk_avg_follower_sync_time   0.0
zk_min_follower_sync_time   0
zk_max_follower_sync_time   0
zk_cnt_follower_sync_time   0
zk_sum_follower_sync_time   0
zk_avg_prep_processor_queue_size    0.0
zk_min_prep_processor_queue_size    0
zk_max_prep_processor_queue_size    0
zk_cnt_prep_processor_queue_size    1
zk_sum_prep_processor_queue_size    0
zk_avg_fsynctime    0.0
zk_min_fsynctime    0
zk_max_fsynctime    0
zk_cnt_fsynctime    0
zk_sum_fsynctime    0
zk_avg_inflight_snap_count  0.0
zk_min_inflight_snap_count  0
zk_max_inflight_snap_count  0
zk_cnt_inflight_snap_count  0
zk_sum_inflight_snap_count  0
zk_avg_reads_issued_from_session_queue  0.0
zk_min_reads_issued_from_session_queue  0
zk_max_reads_issued_from_session_queue  0
zk_cnt_reads_issued_from_session_queue  0
zk_sum_reads_issued_from_session_queue  0
zk_avg_restore_time 0.0
zk_min_restore_time 0
zk_max_restore_time 0
zk_cnt_restore_time 0
zk_sum_restore_time 0
zk_avg_learner_request_processor_queue_size 0.0
zk_min_learner_request_processor_queue_size 0
zk_max_learner_request_processor_queue_size 0
zk_cnt_learner_request_processor_queue_size 0
zk_sum_learner_request_processor_queue_size 0
zk_avg_snapshottime 1.0
zk_min_snapshottime 1
zk_max_snapshottime 1
zk_cnt_snapshottime 1
zk_sum_snapshottime 1
zk_avg_unavailable_time 0.0
zk_min_unavailable_time 0
zk_max_unavailable_time 0
zk_cnt_unavailable_time 0
zk_sum_unavailable_time 0
zk_avg_startup_txns_loaded  0.0
zk_min_startup_txns_loaded  0
zk_max_startup_txns_loaded  0
zk_cnt_startup_txns_loaded  0
zk_sum_startup_txns_loaded  0
zk_avg_reads_after_write_in_session_queue   0.0
zk_min_reads_after_write_in_session_queue   0
zk_max_reads_after_write_in_session_queue   0
zk_cnt_reads_after_write_in_session_queue   0
zk_sum_reads_after_write_in_session_queue   0
zk_avg_requests_in_session_queue    0.0
zk_min_requests_in_session_queue    0
zk_max_requests_in_session_queue    0
zk_cnt_requests_in_session_queue    0
zk_sum_requests_in_session_queue    0
zk_avg_write_commit_proc_issued 0.0
zk_min_write_commit_proc_issued 0
zk_max_write_commit_proc_issued 0
zk_cnt_write_commit_proc_issued 0
zk_sum_write_commit_proc_issued 0
zk_avg_prep_process_time    0.0
zk_min_prep_process_time    0
zk_max_prep_process_time    0
zk_cnt_prep_process_time    0
zk_sum_prep_process_time    0
zk_avg_pending_session_queue_size   0.0
zk_min_pending_session_queue_size   0
zk_max_pending_session_queue_size   0
zk_cnt_pending_session_queue_size   0
zk_sum_pending_session_queue_size   0
zk_avg_time_waiting_empty_pool_in_commit_processor_read_ms  0.0
zk_min_time_waiting_empty_pool_in_commit_processor_read_ms  0
zk_max_time_waiting_empty_pool_in_commit_processor_read_ms  0
zk_cnt_time_waiting_empty_pool_in_commit_processor_read_ms  0
zk_sum_time_waiting_empty_pool_in_commit_processor_read_ms  0
zk_avg_commit_process_time  0.0
zk_min_commit_process_time  0
zk_max_commit_process_time  0
zk_cnt_commit_process_time  0
zk_sum_commit_process_time  0
zk_avg_dbinittime   6.0
zk_min_dbinittime   6
zk_max_dbinittime   6
zk_cnt_dbinittime   1
zk_sum_dbinittime   6
zk_avg_inflight_diff_count  0.0
zk_min_inflight_diff_count  0
zk_max_inflight_diff_count  0
zk_cnt_inflight_diff_count  0
zk_sum_inflight_diff_count  0
zk_avg_netty_queued_buffer_capacity 0.0
zk_min_netty_queued_buffer_capacity 0
zk_max_netty_queued_buffer_capacity 0
zk_cnt_netty_queued_buffer_capacity 0
zk_sum_netty_queued_buffer_capacity 0
zk_avg_election_time    0.0
zk_min_election_time    0
zk_max_election_time    0
zk_cnt_election_time    0
zk_sum_election_time    0
zk_avg_commit_commit_proc_req_queued    0.0
zk_min_commit_commit_proc_req_queued    0
zk_max_commit_commit_proc_req_queued    0
zk_cnt_commit_commit_proc_req_queued    0
zk_sum_commit_commit_proc_req_queued    0
zk_avg_sync_processor_batch_size    0.0
zk_min_sync_processor_batch_size    0
zk_max_sync_processor_batch_size    0
zk_cnt_sync_processor_batch_size    0
zk_sum_sync_processor_batch_size    0
zk_avg_node_children_watch_count    0.0
zk_min_node_children_watch_count    0
zk_max_node_children_watch_count    0
zk_cnt_node_children_watch_count    0
zk_sum_node_children_watch_count    0
zk_avg_write_batch_time_in_commit_processor 0.0
zk_min_write_batch_time_in_commit_processor 0
zk_max_write_batch_time_in_commit_processor 0
zk_cnt_write_batch_time_in_commit_processor 0
zk_sum_write_batch_time_in_commit_processor 0
zk_avg_read_commit_proc_issued  0.0
zk_min_read_commit_proc_issued  0
zk_max_read_commit_proc_issued  0
zk_cnt_read_commit_proc_issued  0
zk_sum_read_commit_proc_issued  0
zk_avg_concurrent_request_processing_in_commit_processor    0.0
zk_min_concurrent_request_processing_in_commit_processor    0
zk_max_concurrent_request_processing_in_commit_processor    0
zk_cnt_concurrent_request_processing_in_commit_processor    0
zk_sum_concurrent_request_processing_in_commit_processor    0
zk_avg_observer_sync_time   0.0
zk_min_observer_sync_time   0
zk_max_observer_sync_time   0
zk_cnt_observer_sync_time   0
zk_sum_observer_sync_time   0
zk_avg_node_changed_watch_count 0.0
zk_min_node_changed_watch_count 0
zk_max_node_changed_watch_count 0
zk_cnt_node_changed_watch_count 0
zk_sum_node_changed_watch_count 0
zk_avg_sync_process_time    0.0
zk_min_sync_process_time    0
zk_max_sync_process_time    0
zk_cnt_sync_process_time    0
zk_sum_sync_process_time    0
zk_avg_startup_snap_load_time   1.0
zk_min_startup_snap_load_time   1
zk_max_startup_snap_load_time   1
zk_cnt_startup_snap_load_time   1
zk_sum_startup_snap_load_time   1
zk_avg_prep_processor_queue_time_ms 0.0
zk_min_prep_processor_queue_time_ms 0
zk_max_prep_processor_queue_time_ms 0
zk_cnt_prep_processor_queue_time_ms 0
zk_sum_prep_processor_queue_time_ms 0
zk_p50_prep_processor_queue_time_ms 0
zk_p95_prep_processor_queue_time_ms 0
zk_p99_prep_processor_queue_time_ms 0
zk_p999_prep_processor_queue_time_ms    0
zk_avg_jvm_pause_time_ms    0.0
zk_min_jvm_pause_time_ms    0
zk_max_jvm_pause_time_ms    0
zk_cnt_jvm_pause_time_ms    0
zk_sum_jvm_pause_time_ms    0
zk_p50_jvm_pause_time_ms    0
zk_p95_jvm_pause_time_ms    0
zk_p99_jvm_pause_time_ms    0
zk_p999_jvm_pause_time_ms   0
zk_avg_close_session_prep_time  0.0
zk_min_close_session_prep_time  0
zk_max_close_session_prep_time  0
zk_cnt_close_session_prep_time  0
zk_sum_close_session_prep_time  0
zk_p50_close_session_prep_time  0
zk_p95_close_session_prep_time  0
zk_p99_close_session_prep_time  0
zk_p999_close_session_prep_time 0
zk_avg_read_commitproc_time_ms  0.0
zk_min_read_commitproc_time_ms  0
zk_max_read_commitproc_time_ms  0
zk_cnt_read_commitproc_time_ms  0
zk_sum_read_commitproc_time_ms  0
zk_p50_read_commitproc_time_ms  0
zk_p95_read_commitproc_time_ms  0
zk_p99_read_commitproc_time_ms  0
zk_p999_read_commitproc_time_ms 0
zk_avg_updatelatency    0.0
zk_min_updatelatency    0
zk_max_updatelatency    0
zk_cnt_updatelatency    0
zk_sum_updatelatency    0
zk_p50_updatelatency    0
zk_p95_updatelatency    0
zk_p99_updatelatency    0
zk_p999_updatelatency   0
zk_avg_local_write_committed_time_ms    0.0
zk_min_local_write_committed_time_ms    0
zk_max_local_write_committed_time_ms    0
zk_cnt_local_write_committed_time_ms    0
zk_sum_local_write_committed_time_ms    0
zk_p50_local_write_committed_time_ms    0
zk_p95_local_write_committed_time_ms    0
zk_p99_local_write_committed_time_ms    0
zk_p999_local_write_committed_time_ms   0
zk_avg_request_throttle_queue_time_ms   0.0
zk_min_request_throttle_queue_time_ms   0
zk_max_request_throttle_queue_time_ms   0
zk_cnt_request_throttle_queue_time_ms   0
zk_sum_request_throttle_queue_time_ms   0
zk_p50_request_throttle_queue_time_ms   0
zk_p95_request_throttle_queue_time_ms   0
zk_p99_request_throttle_queue_time_ms   0
zk_p999_request_throttle_queue_time_ms  0
zk_avg_readlatency  0.0
zk_min_readlatency  0
zk_max_readlatency  0
zk_cnt_readlatency  0
zk_sum_readlatency  0
zk_p50_readlatency  0
zk_p95_readlatency  0
zk_p99_readlatency  0
zk_p999_readlatency 0
zk_avg_quorum_ack_latency   0.0
zk_min_quorum_ack_latency   0
zk_max_quorum_ack_latency   0
zk_cnt_quorum_ack_latency   0
zk_sum_quorum_ack_latency   0
zk_p50_quorum_ack_latency   0
zk_p95_quorum_ack_latency   0
zk_p99_quorum_ack_latency   0
zk_p999_quorum_ack_latency  0
zk_avg_om_commit_process_time_ms    0.0
zk_min_om_commit_process_time_ms    0
zk_max_om_commit_process_time_ms    0
zk_cnt_om_commit_process_time_ms    0
zk_sum_om_commit_process_time_ms    0
zk_p50_om_commit_process_time_ms    0
zk_p95_om_commit_process_time_ms    0
zk_p99_om_commit_process_time_ms    0
zk_p999_om_commit_process_time_ms   0
zk_avg_read_final_proc_time_ms  0.0
zk_min_read_final_proc_time_ms  0
zk_max_read_final_proc_time_ms  0
zk_cnt_read_final_proc_time_ms  0
zk_sum_read_final_proc_time_ms  0
zk_p50_read_final_proc_time_ms  0
zk_p95_read_final_proc_time_ms  0
zk_p99_read_final_proc_time_ms  0
zk_p999_read_final_proc_time_ms 0
zk_avg_commit_propagation_latency   0.0
zk_min_commit_propagation_latency   0
zk_max_commit_propagation_latency   0
zk_cnt_commit_propagation_latency   0
zk_sum_commit_propagation_latency   0
zk_p50_commit_propagation_latency   0
zk_p95_commit_propagation_latency   0
zk_p99_commit_propagation_latency   0
zk_p999_commit_propagation_latency  0
zk_avg_dead_watchers_cleaner_latency    0.0
zk_min_dead_watchers_cleaner_latency    0
zk_max_dead_watchers_cleaner_latency    0
zk_cnt_dead_watchers_cleaner_latency    0
zk_sum_dead_watchers_cleaner_latency    0
zk_p50_dead_watchers_cleaner_latency    0
zk_p95_dead_watchers_cleaner_latency    0
zk_p99_dead_watchers_cleaner_latency    0
zk_p999_dead_watchers_cleaner_latency   0
zk_avg_write_final_proc_time_ms 0.0
zk_min_write_final_proc_time_ms 0
zk_max_write_final_proc_time_ms 0
zk_cnt_write_final_proc_time_ms 0
zk_sum_write_final_proc_time_ms 0
zk_p50_write_final_proc_time_ms 0
zk_p95_write_final_proc_time_ms 0
zk_p99_write_final_proc_time_ms 0
zk_p999_write_final_proc_time_ms    0
zk_avg_proposal_ack_creation_latency    0.0
zk_min_proposal_ack_creation_latency    0
zk_max_proposal_ack_creation_latency    0
zk_cnt_proposal_ack_creation_latency    0
zk_sum_proposal_ack_creation_latency    0
zk_p50_proposal_ack_creation_latency    0
zk_p95_proposal_ack_creation_latency    0
zk_p99_proposal_ack_creation_latency    0
zk_p999_proposal_ack_creation_latency   0
zk_avg_proposal_latency 0.0
zk_min_proposal_latency 0
zk_max_proposal_latency 0
zk_cnt_proposal_latency 0
zk_sum_proposal_latency 0
zk_p50_proposal_latency 0
zk_p95_proposal_latency 0
zk_p99_proposal_latency 0
zk_p999_proposal_latency    0
zk_avg_om_proposal_process_time_ms  0.0
zk_min_om_proposal_process_time_ms  0
zk_max_om_proposal_process_time_ms  0
zk_cnt_om_proposal_process_time_ms  0
zk_sum_om_proposal_process_time_ms  0
zk_p50_om_proposal_process_time_ms  0
zk_p95_om_proposal_process_time_ms  0
zk_p99_om_proposal_process_time_ms  0
zk_p999_om_proposal_process_time_ms 0
zk_avg_sync_processor_queue_and_flush_time_ms   0.0
zk_min_sync_processor_queue_and_flush_time_ms   0
zk_max_sync_processor_queue_and_flush_time_ms   0
zk_cnt_sync_processor_queue_and_flush_time_ms   0
zk_sum_sync_processor_queue_and_flush_time_ms   0
zk_p50_sync_processor_queue_and_flush_time_ms   0
zk_p95_sync_processor_queue_and_flush_time_ms   0
zk_p99_sync_processor_queue_and_flush_time_ms   0
zk_p999_sync_processor_queue_and_flush_time_ms  0
zk_avg_propagation_latency  0.0
zk_min_propagation_latency  0
zk_max_propagation_latency  0
zk_cnt_propagation_latency  0
zk_sum_propagation_latency  0
zk_p50_propagation_latency  0
zk_p95_propagation_latency  0
zk_p99_propagation_latency  0
zk_p999_propagation_latency 0
zk_avg_server_write_committed_time_ms   0.0
zk_min_server_write_committed_time_ms   0
zk_max_server_write_committed_time_ms   0
zk_cnt_server_write_committed_time_ms   0
zk_sum_server_write_committed_time_ms   0
zk_p50_server_write_committed_time_ms   0
zk_p95_server_write_committed_time_ms   0
zk_p99_server_write_committed_time_ms   0
zk_p999_server_write_committed_time_ms  0
zk_avg_sync_processor_queue_time_ms 0.0
zk_min_sync_processor_queue_time_ms 0
zk_max_sync_processor_queue_time_ms 0
zk_cnt_sync_processor_queue_time_ms 0
zk_sum_sync_processor_queue_time_ms 0
zk_p50_sync_processor_queue_time_ms 0
zk_p95_sync_processor_queue_time_ms 0
zk_p99_sync_processor_queue_time_ms 0
zk_p999_sync_processor_queue_time_ms    0
zk_avg_sync_processor_queue_flush_time_ms   0.0
zk_min_sync_processor_queue_flush_time_ms   0
zk_max_sync_processor_queue_flush_time_ms   0
zk_cnt_sync_processor_queue_flush_time_ms   0
zk_sum_sync_processor_queue_flush_time_ms   0
zk_p50_sync_processor_queue_flush_time_ms   0
zk_p95_sync_processor_queue_flush_time_ms   0
zk_p99_sync_processor_queue_flush_time_ms   0
zk_p999_sync_processor_queue_flush_time_ms  0
zk_avg_write_commitproc_time_ms 0.0
zk_min_write_commitproc_time_ms 0
zk_max_write_commitproc_time_ms 0
zk_cnt_write_commitproc_time_ms 0
zk_sum_write_commitproc_time_ms 0
zk_p50_write_commitproc_time_ms 0
zk_p95_write_commitproc_time_ms 0
zk_p99_write_commitproc_time_ms 0
zk_p999_write_commitproc_time_ms    0
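
The latency families in the dump above each expose avg/min/max/cnt/sum plus p50/p95/p99/p999. As a rough sketch of how an equivalent metric could be produced for a Raft stage, the class below keeps raw samples and uses nearest-rank percentiles. This is an assumption for illustration, not ZooKeeper's or RaftKeeper's actual implementation, and the name `LatencySummary` is hypothetical.

```python
class LatencySummary:
    """Collects latency samples (ms) and renders mntr-style summary lines."""

    def __init__(self, name):
        self.name = name
        self.samples = []

    def observe(self, millis):
        self.samples.append(millis)

    def percentile(self, permille):
        """Nearest-rank percentile; permille=999 means p99.9."""
        n = len(self.samples)
        if n == 0:
            return 0
        ordered = sorted(self.samples)
        # Integer ceiling of n * permille / 1000, avoiding float rounding.
        rank = max(1, (n * permille + 999) // 1000)
        return ordered[rank - 1]

    def render(self):
        n = len(self.samples)
        total = sum(self.samples)
        rows = [
            ("avg", total / n if n else 0.0),
            ("min", min(self.samples) if n else 0),
            ("max", max(self.samples) if n else 0),
            ("cnt", n),
            ("sum", total),
            ("p50", self.percentile(500)),
            ("p95", self.percentile(950)),
            ("p99", self.percentile(990)),
            ("p999", self.percentile(999)),
        ]
        return "\n".join(f"zk_{key}_{self.name}\t{value}" for key, value in rows)
```

Keeping every sample is the simplest way to show the idea, but memory grows without bound; a long-running server would more likely use a bounded reservoir or a fixed-bucket histogram and reset it periodically.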