apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

Prometheus fails to collect metrics #12617

Open Richard-Yi opened 2 years ago

Richard-Yi commented 2 years ago

Describe the bug

Prometheus version: 2.24.0
Pulsar version: 2.8.1

Prometheus fails to collect metrics from one of the brokers, even though that broker's metrics endpoint returns metrics normally and the other broker is scraped successfully.

I suspect there is a problem with the content returned by the endpoint that causes Prometheus to fail to parse the metrics.
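
One way to test this suspicion offline is to run the saved metrics output through a small checker. The sketch below is only an illustration, not the Prometheus parser: the function name `find_problems` is hypothetical, the regular expression is a simplification of the text format, and the repeated `# TYPE` check is an assumption about what could break parsing that nothing in this report confirms. Note that `NaN`, `+Inf`, and `-Inf` are valid sample values in the Prometheus text format, so they should not fail a scrape on their own.

```python
import re

# Simplified pattern for "name{labels} value [timestamp]"; the real format has more rules.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+(-?\d+))?$'
)


def find_problems(text):
    """Return (line_number, message) pairs for suspicious lines in metrics text."""
    problems = []
    seen_types = {}  # metric name -> line number of its first TYPE line
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            parts = line.split()
            if len(parts) != 4:
                problems.append((lineno, "malformed TYPE line"))
                continue
            name = parts[2]
            if name in seen_types:
                problems.append(
                    (lineno, f"second TYPE line for {name} "
                             f"(first at line {seen_types[name]})")
                )
            else:
                seen_types[name] = lineno
            continue
        if line.startswith("#"):
            continue  # HELP lines and other comments
        match = SAMPLE_RE.match(line)
        if match is None:
            problems.append((lineno, "unparsable sample line"))
            continue
        try:
            # Python's float() is used as an approximation of the value syntax;
            # it accepts NaN, +Inf, -Inf and exponents such as 1.3430365E7.
            float(match.group(3))
        except ValueError:
            problems.append((lineno, f"invalid value {match.group(3)!r}"))
    return problems
```

Saving the failing broker's full output to a file and calling `find_problems(open(path).read())` would list candidate lines to inspect. An empty result does not prove the output is valid, since the checker covers only a few failure modes.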

Has anyone encountered the same problem?

Below is part of the metrics output from the broker that fails to be collected. (Should I upload the full output? It is too long to paste here.)

# TYPE pulsar_broker_load_manager_bundle_assigment summary
pulsar_broker_load_manager_bundle_assigment{cluster="pulsar-cluster-zk-1",quantile="0.5"} NaN
pulsar_broker_load_manager_bundle_assigment{cluster="pulsar-cluster-zk-1",quantile="0.99"} NaN
pulsar_broker_load_manager_bundle_assigment{cluster="pulsar-cluster-zk-1",quantile="0.999"} NaN
pulsar_broker_load_manager_bundle_assigment{cluster="pulsar-cluster-zk-1",quantile="1.0"} -Inf
pulsar_broker_load_manager_bundle_assigment_count{cluster="pulsar-cluster-zk-1"} 15.0
pulsar_broker_load_manager_bundle_assigment_sum{cluster="pulsar-cluster-zk-1"} 139.0
# TYPE caffeine_cache_hit_total counter
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 12678.0
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 2762.0
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 289.0
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="bundles"} 15162.0
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 1.3430365E7
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 5880.0
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_hit_total{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
# TYPE caffeine_cache_miss_total counter
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 5132.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 755.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 317.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="bundles"} 9.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 122.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 4176.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_miss_total{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
# TYPE caffeine_cache_requests_total counter
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 17810.0
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 3517.0
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 606.0
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="bundles"} 15171.0
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 1.3430487E7
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 10056.0
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_requests_total{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
# TYPE caffeine_cache_eviction_total counter
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 0.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 673.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 47.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="bundles"} 0.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 0.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 0.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_eviction_total{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
# TYPE caffeine_cache_eviction_weight gauge
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 0.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 673.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 47.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="bundles"} 0.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 0.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 0.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_eviction_weight{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
# TYPE caffeine_cache_load_failure_total counter
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 1.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 0.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 0.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="bundles"} 0.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 39.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 2077.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_load_failure_total{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
# TYPE caffeine_cache_loads_total counter
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 14.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 755.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 317.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="bundles"} 9.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 237259.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 2964.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_loads_total{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
# TYPE caffeine_cache_estimated_size gauge
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 13.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 1.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 1.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="bundles"} 9.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 8.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 11.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_estimated_size{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
# TYPE caffeine_cache_load_duration_seconds summary
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 14.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="owned-bundles"} 0.066185552
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="bookies-racks-exists"} 0.0
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="global-zk-children"} 0.0
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="bookies-racks-children"} 0.0
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 755.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="local-zk-exists"} 0.888731904
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 317.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="local-zk-children"} 0.422091904
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="bundles"} 9.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="bundles"} 0.032046353
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 237259.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="global-zk-data"} 0.193615526
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 2964.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="local-zk-data"} 3.489291702
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="bookies-racks-data"} 0.0
caffeine_cache_load_duration_seconds_count{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
caffeine_cache_load_duration_seconds_sum{cluster="pulsar-cluster-zk-1",cache="global-zk-exists"} 0.0
# TYPE log4j2_appender_total counter
log4j2_appender_total{cluster="pulsar-cluster-zk-1",level="debug"} 0.0
log4j2_appender_total{cluster="pulsar-cluster-zk-1",level="warn"} 2185.0
log4j2_appender_total{cluster="pulsar-cluster-zk-1",level="trace"} 0.0
log4j2_appender_total{cluster="pulsar-cluster-zk-1",level="error"} 69.0
log4j2_appender_total{cluster="pulsar-cluster-zk-1",level="fatal"} 0.0
log4j2_appender_total{cluster="pulsar-cluster-zk-1",level="info"} 2478956.0
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded{cluster="pulsar-cluster-zk-1"} 12355.0
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total{cluster="pulsar-cluster-zk-1"} 12355.0
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total{cluster="pulsar-cluster-zk-1"} 0.0
# TYPE topic_load_times summary
topic_load_times{cluster="pulsar-cluster-zk-1",quantile="0.5"} NaN
topic_load_times{cluster="pulsar-cluster-zk-1",quantile="0.75"} NaN
topic_load_times{cluster="pulsar-cluster-zk-1",quantile="0.95"} NaN
topic_load_times{cluster="pulsar-cluster-zk-1",quantile="0.99"} NaN
topic_load_times{cluster="pulsar-cluster-zk-1",quantile="0.999"} NaN
topic_load_times{cluster="pulsar-cluster-zk-1",quantile="0.9999"} NaN
topic_load_times_count{cluster="pulsar-cluster-zk-1"} 0.0
topic_load_times_sum{cluster="pulsar-cluster-zk-1"} 0.0
# TYPE pulsar_broker_lookup summary
pulsar_broker_lookup{cluster="pulsar-cluster-zk-1",quantile="0.5"} NaN
pulsar_broker_lookup{cluster="pulsar-cluster-zk-1",quantile="0.99"} NaN
pulsar_broker_lookup{cluster="pulsar-cluster-zk-1",quantile="0.999"} NaN
pulsar_broker_lookup{cluster="pulsar-cluster-zk-1",quantile="1.0"} -Inf
pulsar_broker_lookup_count{cluster="pulsar-cluster-zk-1"} 2869.0
pulsar_broker_lookup_sum{cluster="pulsar-cluster-zk-1"} 191.0
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{cluster="pulsar-cluster-zk-1",area="heap"} 4.34723952E8
jvm_memory_bytes_used{cluster="pulsar-cluster-zk-1",area="nonheap"} 1.41110336E8
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{cluster="pulsar-cluster-zk-1",area="heap"} 2.147483648E9
jvm_memory_bytes_committed{cluster="pulsar-cluster-zk-1",area="nonheap"} 1.4958592E8
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{cluster="pulsar-cluster-zk-1",area="heap"} 2.147483648E9
jvm_memory_bytes_max{cluster="pulsar-cluster-zk-1",area="nonheap"} -1.0
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{cluster="pulsar-cluster-zk-1",area="heap"} 2.147483648E9
jvm_memory_bytes_init{cluster="pulsar-cluster-zk-1",area="nonheap"} 2555904.0
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{cluster="pulsar-cluster-zk-1",pool="Code Cache"} 5.8152128E7
jvm_memory_pool_bytes_used{cluster="pulsar-cluster-zk-1",pool="Metaspace"} 7.440004E7
jvm_memory_pool_bytes_used{cluster="pulsar-cluster-zk-1",pool="Compressed Class Space"} 8558168.0
jvm_memory_pool_bytes_used{cluster="pulsar-cluster-zk-1",pool="G1 Eden Space"} 2.65289728E8
jvm_memory_pool_bytes_used{cluster="pulsar-cluster-zk-1",pool="G1 Survivor Space"} 2097152.0
jvm_memory_pool_bytes_used{cluster="pulsar-cluster-zk-1",pool="G1 Old Gen"} 1.67337072E8
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{cluster="pulsar-cluster-zk-1",pool="Code Cache"} 5.8720256E7
jvm_memory_pool_bytes_committed{cluster="pulsar-cluster-zk-1",pool="Metaspace"} 8.1084416E7
jvm_memory_pool_bytes_committed{cluster="pulsar-cluster-zk-1",pool="Compressed Class Space"} 9781248.0
jvm_memory_pool_bytes_committed{cluster="pulsar-cluster-zk-1",pool="G1 Eden Space"} 1.126170624E9
jvm_memory_pool_bytes_committed{cluster="pulsar-cluster-zk-1",pool="G1 Survivor Space"} 2097152.0
jvm_memory_pool_bytes_committed{cluster="pulsar-cluster-zk-1",pool="G1 Old Gen"} 1.019215872E9
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{cluster="pulsar-cluster-zk-1",pool="Code Cache"} 2.5165824E8
jvm_memory_pool_bytes_max{cluster="pulsar-cluster-zk-1",pool="Metaspace"} -1.0
jvm_memory_pool_bytes_max{cluster="pulsar-cluster-zk-1",pool="Compressed Class Space"} 1.073741824E9
jvm_memory_pool_bytes_max{cluster="pulsar-cluster-zk-1",pool="G1 Eden Space"} -1.0
jvm_memory_pool_bytes_max{cluster="pulsar-cluster-zk-1",pool="G1 Survivor Space"} -1.0
jvm_memory_pool_bytes_max{cluster="pulsar-cluster-zk-1",pool="G1 Old Gen"} 2.147483648E9
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{cluster="pulsar-cluster-zk-1",pool="Code Cache"} 2555904.0
jvm_memory_pool_bytes_init{cluster="pulsar-cluster-zk-1",pool="Metaspace"} 0.0
jvm_memory_pool_bytes_init{cluster="pulsar-cluster-zk-1",pool="Compressed Class Space"} 0.0
jvm_memory_pool_bytes_init{cluster="pulsar-cluster-zk-1",pool="G1 Eden Space"} 1.128267776E9
jvm_memory_pool_bytes_init{cluster="pulsar-cluster-zk-1",pool="G1 Survivor Space"} 0.0
jvm_memory_pool_bytes_init{cluster="pulsar-cluster-zk-1",pool="G1 Old Gen"} 1.019215872E9
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{cluster="pulsar-cluster-zk-1",gc="G1 Young Generation"} 451.0
jvm_gc_collection_seconds_sum{cluster="pulsar-cluster-zk-1",gc="G1 Young Generation"} 51.331
jvm_gc_collection_seconds_count{cluster="pulsar-cluster-zk-1",gc="G1 Old Generation"} 0.0
jvm_gc_collection_seconds_sum{cluster="pulsar-cluster-zk-1",gc="G1 Old Generation"} 0.0

codelipenghui commented 2 years ago

@Richard-Yi Did you see any error logs in the Prometheus server? Maybe there are some details about the errors.
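
Besides the server logs, the last scrape error for each target is also exposed through the Prometheus HTTP API at `/api/v1/targets`, in the `lastError` field of each entry under `activeTargets`. The following is a minimal sketch for pulling those errors out. The base URL `http://localhost:9090` and the function names are assumptions for illustration, so adjust them to the actual deployment.

```python
import json
import urllib.request


def failed_targets(payload):
    """Return (scrape_url, last_error) for every active target that is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090"):
    # base_url is a hypothetical address; point it at your Prometheus server.
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as resp:
        return json.load(resp)
```

Calling `failed_targets(fetch_targets())` on a machine that can reach the Prometheus server should list the failing broker's scrape URL together with the error Prometheus recorded for it, which is usually more specific than the red "DOWN" state shown on the Targets page.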

github-actions[bot] commented 2 years ago

The issue had no activity for 30 days; marking it with the Stale label.