cronos-labs / cronos-zkevm

Apache License 2.0

[Testnet] Setup prometheus server #18

Open thomas-nguy opened 11 months ago

thomas-nguy commented 11 months ago

Can use 10.202.4.155.

JayT106 commented 11 months ago

Does "the explorer" mean the L2 explorer in https://github.com/cronos-labs/zksync-era/issues/13?

JayT106 commented 11 months ago

Pushed the initial Prometheus server setup in https://github.com/cronos-labs/l2-testnets/commit/139d4e7caf019c08d3b590b569865cb5b2010b8a

thomas-nguy commented 11 months ago

Yes, it is the explorer in #13.

thomas-nguy commented 11 months ago

Thanks. Could you also install Grafana on this machine and configure it so that we can visualize the Prometheus metrics? https://prometheus.io/docs/visualization/grafana/

JayT106 commented 11 months ago

Sure.

thomas-nguy commented 11 months ago

It seems we already have a Prometheus server and Grafana set up internally.

I'll switch the requirements to

thomas-nguy commented 11 months ago
fee_monitor_balances{account="fee_account_l1"} 0.5
fee_monitor_balances{account="operator_l1"} 1.7533511106980662
fee_monitor_balances{account="fee_account_l2"} 1.8844532495

operator_l1 gives the balance of the eth_sender wallet.
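A minimal sketch, assuming the fee monitor's output is read in the Prometheus text exposition format shown above, of turning these samples into per-account balances. parse_balances and its regex are hypothetical helpers written for illustration; they only handle the single account label and are not a general exposition-format parser (a real setup would more likely let Prometheus scrape the endpoint and query it with PromQL).

```python
import re

# Matches one fee_monitor_balances sample, e.g.
#   fee_monitor_balances{account="operator_l1"} 1.7533511106980662
# Only the `account` label is supported; an optional trailing timestamp is ignored.
_SAMPLE = re.compile(
    r'^fee_monitor_balances\{account="(?P<account>[^"]+)"\}'
    r"\s+(?P<value>\S+)(?:\s+-?\d+)?\s*$"
)


def parse_balances(text: str) -> dict[str, float]:
    """Return {account: balance} for every fee_monitor_balances sample in `text`."""
    balances: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = _SAMPLE.match(line)
        if match:
            balances[match.group("account")] = float(match.group("value"))
    return balances


sample = """\
fee_monitor_balances{account="fee_account_l1"} 0.5
fee_monitor_balances{account="operator_l1"} 1.7533511106980662
fee_monitor_balances{account="fee_account_l2"} 1.8844532495
"""

balances = parse_balances(sample)
print(balances["operator_l1"])  # balance of the eth_sender wallet
```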

calvinaco commented 11 months ago

@thomas-nguy @JayT106 What should the threshold for a low balance be?

thomas-nguy commented 11 months ago

Maybe below 0.5 to be conservative? For testnet, 0.1 is fine.
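A minimal sketch of the suggested low-balance rule, assuming balances keyed by account name as in the fee_monitor_balances samples above. The threshold names ("conservative", "testnet") and low_balance_accounts are hypothetical; in Prometheus itself the same filter could be written as the PromQL expression fee_monitor_balances < 0.1.

```python
# Hypothetical thresholds from the suggestion above:
# "below 0.5 to be conservative", 0.1 is fine for testnet.
THRESHOLDS = {"conservative": 0.5, "testnet": 0.1}


def low_balance_accounts(balances: dict[str, float], profile: str) -> list[str]:
    """Return, sorted, the accounts whose balance is strictly below the profile's threshold."""
    threshold = THRESHOLDS[profile]
    return sorted(acct for acct, bal in balances.items() if bal < threshold)


balances = {
    "fee_account_l1": 0.5,
    "operator_l1": 1.7533511106980662,
    "fee_account_l2": 1.8844532495,
}
print(low_balance_accounts(balances, "testnet"))  # → []
print(low_balance_accounts({**balances, "operator_l1": 0.05}, "testnet"))  # → ['operator_l1']
```

The comparison is strict to match "below 0.5", so fee_account_l1 at exactly 0.5 is not flagged even under the conservative threshold. Switch to <= if the boundary value should also alert.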

calvinaco commented 11 months ago

3/ Get an alert when the circuit breaker is triggered (the server is unable to run)

Which metric should I use to detect that the circuit breaker has been triggered?

thomas-nguy commented 11 months ago

I checked, but unfortunately there is no such metric.

I guess that if the circuit breaker is triggered, the server will shut down and be unable to restart because of an exception.

The health check can be a good indicator that the circuit breaker has been triggered: if the server is down, look at what the health check reports.

I guess we can remove this requirement from the alert system.
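A minimal sketch of the reasoning above, under the assumption that a normal restart leaves the server unhealthy only briefly, while a circuit-breaker trip keeps it down because it cannot restart. DownDetector and max_failures are hypothetical names; the idea is close in spirit to the for: duration on a Prometheus alerting rule, which only fires once a condition has held for a while.

```python
from dataclasses import dataclass


@dataclass
class DownDetector:
    """Flag a probable circuit-breaker trip when the health check keeps failing.

    Assumption: transient restarts produce a few failures, a circuit-breaker trip
    produces an unbroken run of them. max_failures is a hypothetical tuning knob.
    """

    max_failures: int = 5
    consecutive_failures: int = 0

    def observe(self, healthy: bool) -> bool:
        """Record one health-check result; return True once an alert should fire."""
        if healthy:
            self.consecutive_failures = 0  # server recovered, reset the streak
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.max_failures


detector = DownDetector(max_failures=3)
results = [True, False, False, True, False, False, False]
print([detector.observe(r) for r in results])
# → [False, False, False, False, False, False, True]
```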

thomas-nguy commented 11 months ago

Health check:

http://10.202.3.175:3071/health
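A minimal probe for this health check, using only the Python standard library. It assumes that HTTP 200 means healthy and treats any other status, a timeout, or a connection error as unhealthy; the response body is not parsed, since its format is not described here. is_healthy is a hypothetical helper.

```python
import http.client
import urllib.request

# Endpoint quoted in the thread (internal address).
HEALTH_URL = "http://10.202.3.175:3071/health"


def is_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True only if the health endpoint answers with HTTP 200.

    Assumption: any other status, a timeout, or a connection error means the
    server is unhealthy. The response body is not interpreted.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # urllib.error.URLError and HTTPError (raised for 4xx/5xx) subclass
        # OSError, as do refused connections and socket timeouts.
        return False
```

In practice this would run periodically (for example from cron or a scheduler loop), and a sustained run of False results is the circuit-breaker signal described above.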

calvinaco commented 11 months ago

@thomas-nguy I have already implemented the alerts.

They will be sent to #blockchain-scaling-team.

The alerting part can be treated as done.