Open thomas-nguy opened 11 months ago
can use 10.202.4.155
Does "the explorer" mean the L2 explorer in https://github.com/cronos-labs/zksync-era/issues/13 ?
Pushed the initial Prometheus server setup in https://github.com/cronos-labs/l2-testnets/commit/139d4e7caf019c08d3b590b569865cb5b2010b8a
yes it is the explorer in #13
Thanks, could you also install Grafana on this machine and configure it so that we can visualize the Prometheus metrics? https://prometheus.io/docs/visualization/grafana/
sure
Seems like we already have a prometheus server and grafana set up internally
I'll switch the requirements to
Connect the zkserver to our prometheus server. By default it uses pull mode (https://github.com/cronos-labs/zksync-era/blob/internal/core/bin/zksync_core/src/lib.rs#L262) and the server is running on port 3312
Investigate what kind of metrics are collected and what metrics can help to fill those 3 basic requirements:
1/ Get alert when the server is down (can use the healthcheck running on port 3071)
2/ Get alert when the eth_sender wallet is almost empty
3/ Get alert when the circuit breaker is triggered (server is unable to run)
@calvinaco @ivanslwong-crypto-com @henrywong-crypto
fee_monitor_balances{account="fee_account_l1"} 0.5
fee_monitor_balances{account="operator_l1"} 1.7533511106980662
fee_monitor_balances{account="fee_account_l2"} 1.8844532495
operator_l1 gives you the balance for the eth_sender wallet
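As a rough illustration of reading the operator_l1 value out of the output above, here is a minimal Python sketch. It assumes the metrics arrive as plain Prometheus text lines exactly like the three shown; the function name `parse_fee_monitor_balances` and the `SAMPLE` constant are hypothetical and not part of zksync-era.

```python
import re

# Matches one sample line of the Prometheus text format, e.g.
#   fee_monitor_balances{account="operator_l1"} 1.75
# Only the single-label form shown in this thread is handled (assumption).
SAMPLE_RE = re.compile(
    r'^fee_monitor_balances\{account="(?P<account>[^"]+)"\}\s+(?P<value>\S+)$'
)


def parse_fee_monitor_balances(text: str) -> dict[str, float]:
    """Return {account: balance} for every fee_monitor_balances line in text."""
    balances: dict[str, float] = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            balances[match.group("account")] = float(match.group("value"))
    return balances


# The three lines quoted in this thread.
SAMPLE = """\
fee_monitor_balances{account="fee_account_l1"} 0.5
fee_monitor_balances{account="operator_l1"} 1.7533511106980662
fee_monitor_balances{account="fee_account_l2"} 1.8844532495
"""

if __name__ == "__main__":
    # operator_l1 is the eth_sender wallet balance.
    print(parse_fee_monitor_balances(SAMPLE)["operator_l1"])
```

Lines that are not fee_monitor_balances samples (comments, other metrics) are simply skipped.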
@thomas-nguy @JayT106 What would be the criteria for low balance?
maybe below 0.5 to be conservative? for testnet 0.1 is fine
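The thresholds suggested above could be expressed as a small check like the following sketch. The names `LOW_BALANCE_THRESHOLDS` and `is_balance_low`, and the "default"/"testnet" keys, are hypothetical; the values 0.5 and 0.1 come from the comment above and are in whatever unit the fee_monitor_balances metric reports.

```python
# Thresholds proposed in this thread: 0.5 as the conservative value,
# 0.1 for testnet. The key names are illustrative assumptions.
LOW_BALANCE_THRESHOLDS = {"default": 0.5, "testnet": 0.1}


def is_balance_low(balance: float, network: str = "default") -> bool:
    """Return True when the balance is strictly below the network's threshold."""
    return balance < LOW_BALANCE_THRESHOLDS[network]
```

With the operator_l1 value quoted earlier (about 1.75), neither threshold would fire.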
3/ Get alert when the circuit breaker is triggered (server is unable to run)
Which metric should I use to monitor "the circuit breaker is triggered"?
I checked, but unfortunately there is no metric for it.
I guess if the circuit breaker is triggered, the server will shut down and be unable to restart due to the exception.
The healthcheck can be a good indicator that the circuit breaker has been triggered: if the server is down, look at what the healthcheck contains.
I guess we can remove this requirement from the alert system
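For reference, using the healthcheck as the "server is down" signal could look roughly like the sketch below. It uses only the Python standard library; `is_server_up` is a hypothetical helper, and the exact URL path of the healthcheck on port 3071 is not stated in this thread, so any path passed in (for example `/health`) is an assumption.

```python
import urllib.error
import urllib.request


def is_server_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, DNS failures and
        # HTTP error statuses (HTTPError is a subclass of URLError).
        return False
```

A call such as `is_server_up("http://10.202.4.155:3071/health")` would then return False whenever the server is not answering, whatever the cause, which is why it can only hint at a triggered circuit breaker rather than confirm it.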
health check
@thomas-nguy I already implemented
They will send an alert to #blockchain-scaling-team
The alert part can be treated as done.