HariSekhon / Nagios-Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
https://www.linkedin.com/in/HariSekhon

check_zookeeper: accept prometheus scientific notation metrics #388

Closed by hdhoang 7 months ago

hdhoang commented 1 year ago

cf. #387

I have no solution yet for the state file. Copying the scientific-notation regex into the subparts of the multiline regex doesn't match correctly, and I'm not experienced with Perl.
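A minimal sketch of one way the state-file match could tolerate scientific notation, written in Python for illustration rather than in the plugin's Perl. The field order (timestamp, outstanding requests, packets received, packets sent) is an assumption inferred from the verbose output later in this thread, and `NUMBER`, `STATE_LINE` and `parse_state_line` are hypothetical names, not identifiers from check_zookeeper.pl.

```python
import re

# One number: optional sign, digits, optional fraction, optional exponent.
# Matches "12", "12.0", "1658400620.88033" and "1.658400449939E9".
NUMBER = r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"

# Assumed state-file layout (inferred, not taken from the plugin source):
# "<timestamp> <outstanding_requests> <packets_received> <packets_sent>"
STATE_LINE = re.compile(
    rf"^({NUMBER})\s+({NUMBER})\s+({NUMBER})\s+({NUMBER})\s*$"
)


def parse_state_line(line):
    """Return the four fields as floats, or None if the line does not match."""
    match = STATE_LINE.match(line)
    if match is None:
        return None
    return tuple(float(field) for field in match.groups())
```

Defining the number pattern once and reusing it for every field keeps the four subparts consistent, which is the part that is easy to get wrong when the pattern is copied by hand into a larger regex.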

Regarding reproduction, I'll try to make something self-contained that doesn't depend on Bitnami and/or Helm.

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
No Duplication information

hdhoang commented 1 year ago

To start a throwaway ZooKeeper container:

$ podman run --rm --network host -it -p 2181 -e ZOO_4LW_COMMANDS_WHITELIST=mntr,srvr,ruok,isro,stats,wchs -e ZOO_ENABLE_PROMETHEUS_METRICS=yes -e ALLOW_ANONYMOUS_LOGIN=yes docker.io/bitnami/zookeeper:3.7.1-debian-11-r22

Relevant startup log (note that the check never touches port 7000):

2022-07-21 10:47:30,416 [myid:] - INFO  [main:QuorumPeerConfig@481] - metricsProvider.className is org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
2022-07-21 10:47:30,419 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2022-07-21 10:47:30,419 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2022-07-21 10:47:30,419 [myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2022-07-21 10:47:30,420 [myid:1] - WARN  [main:QuorumPeerMain@139] - Either no config or no quorum defined in config, running in standalone mode
2022-07-21 10:47:30,421 [myid:1] - INFO  [main:ManagedUtil@46] - Log4j 1.2 jmx support not found; jmx disabled.
2022-07-21 10:47:30,422 [myid:1] - INFO  [main:QuorumPeerConfig@174] - Reading configuration from: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
2022-07-21 10:47:30,422 [myid:1] - INFO  [main:QuorumPeerConfig@444] - clientPortAddress is 0.0.0.0:2181
2022-07-21 10:47:30,422 [myid:1] - INFO  [main:QuorumPeerConfig@448] - secureClientPort is not set
2022-07-21 10:47:30,423 [myid:1] - INFO  [main:QuorumPeerConfig@464] - observerMasterPort is not set
2022-07-21 10:47:30,423 [myid:1] - INFO  [main:QuorumPeerConfig@481] - metricsProvider.className is org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
2022-07-21 10:47:30,423 [myid:1] - INFO  [main:ZooKeeperServerMain@123] - Starting server
2022-07-21 10:47:30,443 [myid:1] - INFO  [main:PrometheusMetricsProvider@74] - Initializing metrics, configuration: {exportJvmInfo=true, httpPort=7000}
2022-07-21 10:47:30,443 [myid:1] - INFO  [main:PrometheusMetricsProvider@82] - Starting /metrics HTTP endpoint at port 7000 exportJvmInfo: true

Running the check in verbose mode:

$ ./check_zookeeper.pl -vvv -H 127.0.0.1 -P 2181
....snip....

mntr zk_approximate_data_size = 44.0
mntr zk_avg_latency = 0.0
mntr zk_ephemerals_count = 0.0
mntr zk_max_file_descriptor_count = 524288.0
mntr zk_max_latency = 0.0
mntr zk_min_latency = 0.0
mntr zk_open_file_descriptor_count = 73.0
mntr zk_outstanding_requests = 0.0
mntr zk_packets_received = 17.0
mntr zk_packets_sent = 20.0
mntr zk_server_state = standalone
mntr zk_version = 3.7.1-a2fb57c55f8e59cdd76c34b357ad5181df1258d5, built on 2022-05-07 06:45 UTC
mntr zk_watch_count = 0.0
mntr zk_znode_count = 5.0

opening state file '/tmp/check_zookeeper.pl.127.0.0.1.2181.state'

last line of state file: <1658400620.88033 0.0 12.0 13.0 >

state file contents didn't match expected format

last timestamp was not found in state file (invalid format)
'zk_outstanding_requests' stat was not found in state file (invalid format)
'zk_packets_received' stat was not found in state file (invalid format)
'zk_packets_sent' stat was not found in state file (invalid format)
missing or incorrect stats in state file, resetting to current values

checking thresholds
checking outstanding requests thresholds
WARNING: ZooKeeper Mode STANDALONE, avg latency 0.0, outstanding requests 0.0 (w=0/c=10), version 3.7.1-a2fb57c55f8e59cdd76c34b357ad5181df1258d5, built on 2022-05-07 06:45 UTC (missing or incorrect state file stats, should have been reset now and available from next run)

Note that all values are floats in this format. There is no other traffic, so the sent/received packet counts increase by 4 on each run. After a few million packets the counters will start being printed in scientific notation. The summary metrics in the mntr output also contain NaN values:

zk_inflight_diff_count_sum      0.0
zk_commit_propagation_latency{quantile="0.5"}   NaN
zk_commit_propagation_latency{quantile="0.9"}   NaN
zk_commit_propagation_latency{quantile="0.99"}  NaN
zk_commit_propagation_latency_count     0.0
zk_commit_propagation_latency_sum       0.0
zk_dead_watchers_cleared        0.0
zk_process_cpu_seconds_total    2.15
zk_process_start_time_seconds   1.658400449939E9
zk_process_open_fds     73.0
zk_process_max_fds      524288.0
zk_process_virtual_memory_bytes 5.909905408E9
zk_process_resident_memory_bytes        9.9774464E7
zk_ensemble_auth_success        0.0
zk_node_created_watch_count{quantile="0.5"}     NaN
zk_node_created_watch_count_count       0.0
zk_node_created_watch_count_sum 0.0
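The values above can be read with a parser that accepts plain floats, scientific notation and NaN alike. The following is a hedged sketch in Python, not the plugin's actual Perl logic; `parse_mntr` and `finite_only` are hypothetical helper names, and it assumes each numeric metric is a single `name value` pair separated by whitespace, as in the sample above.

```python
import math
import re

# One "name<whitespace>value" pair; the name may carry a {quantile="..."} suffix.
MNTR_LINE = re.compile(r"^(\S+)\s+(\S+)$")


def parse_mntr(text):
    """Map each numeric mntr metric to a float, skipping non-numeric values."""
    stats = {}
    for line in text.splitlines():
        match = MNTR_LINE.match(line.strip())
        if match is None:
            continue  # e.g. zk_version, whose value contains spaces
        name, raw = match.groups()
        try:
            # float() accepts "73.0", "5.909905408E9" and "NaN"
            stats[name] = float(raw)
        except ValueError:
            continue  # e.g. zk_server_state = standalone
    return stats


def finite_only(stats):
    """Drop NaN and infinite values before any threshold comparison."""
    return {name: value for name, value in stats.items() if math.isfinite(value)}
```

Keeping NaN in the parsed result and filtering it in a separate step is a design choice here: it leaves the caller free to decide whether an empty summary should be ignored or reported.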

cheers