ChainSafe / lodestar

🌟 TypeScript Implementation of Ethereum Consensus
https://lodestar.chainsafe.io
Apache License 2.0

metric beacon_finalized_epoch is updated more slowly than in other clients #5904

Open carameleon opened 1 year ago

carameleon commented 1 year ago

Describe the bug

Network : goerli

Lodestar metrics

beacon_head_slot 6355469
beacon_finalized_epoch 198604
...

other clients (teku, prysm, lighthouse)

beacon_head_slot 6355469
beacon_finalized_epoch 198605
...

I'm collecting finalized epochs and head slots to check whether the network is approaching an inactivity leak.

Only Lodestar reports a delayed beacon_finalized_epoch.
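For context, the kind of check described above can be sketched as follows. This is a minimal sketch, not the reporter's actual tooling: the function names are hypothetical, the constants are the mainnet preset values from the consensus spec (which Goerli also uses), and the head slot is used as an approximation of the state slot that the spec's `is_in_inactivity_leak` is defined on.

```typescript
// Preset values from the consensus spec (mainnet preset).
const SLOTS_PER_EPOCH = 32;
const MIN_EPOCHS_TO_INACTIVITY_PENALTY = 4;

// Epoch that contains the given slot.
function computeEpochAtSlot(slot: number): number {
  return Math.floor(slot / SLOTS_PER_EPOCH);
}

// Finality delay as in the spec: previous epoch minus finalized epoch.
// Assumption: the head slot stands in for the state slot.
function finalityDelay(headSlot: number, finalizedEpoch: number): number {
  const currentEpoch = computeEpochAtSlot(headSlot);
  const previousEpoch = Math.max(currentEpoch - 1, 0);
  return previousEpoch - finalizedEpoch;
}

// The chain is in an inactivity leak once the delay exceeds the threshold.
function isInInactivityLeak(headSlot: number, finalizedEpoch: number): boolean {
  return finalityDelay(headSlot, finalizedEpoch) > MIN_EPOCHS_TO_INACTIVITY_PENALTY;
}

// Values taken from the metrics quoted above.
console.log(finalityDelay(6355469, 198604)); // Lodestar's value
console.log(finalityDelay(6355469, 198605)); // value reported by the other clients
```

With the numbers above, neither value crosses the leak threshold, but a metric that lags by one or more epochs makes the delay look larger than it is.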

Comparing the Lodestar log and the metric:

The following chart shows how beacon_finalized_epoch increases over time. The brown line represents Lodestar.

It was updated as much as 15 minutes later than the other clients, even though it had reached the same head slot.

image

image

This happens frequently (multiple times every day).

Expected behavior

beacon_finalized_epoch should be updated as promptly as it is in other clients.

Steps to reproduce

No response

Additional context

Operating system

Linux

Lodestar version or commit hash

v1.10.0

nflaig commented 1 year ago

The issue seems to be related to how the beacon_finalized_epoch metric is updated.

The metric value is only updated when the block slot is the first slot of the epoch: https://github.com/ChainSafe/lodestar/blob/6ac8c07023bc72c7ef54a6ea66884790793eda21/packages/beacon-node/src/chain/blocks/importBlock.ts#L334

If that block is missed, the metric is not updated at all, so the value deviates from other clients and from what the log reports.

This can also be seen in the following logs, which show multiple skipped epochs (the slot 0 block in epoch N+2 was missed):

Sep-05 15:04:06.483[chain]         verbose: Checkpoint finalized epoch=201608, rootHex=0x1e8946a93214282ca8f7fc308f67f1fb5b6f763ca55a52e0d46a374c4bf15418
Sep-05 15:23:13.284[chain]         verbose: Checkpoint finalized epoch=201610, rootHex=0xe0a251f29be4f4227969152f1c298315a46adf52c75b8b7b26abadccb92d7156
Sep-05 15:42:25.704[chain]         verbose: Checkpoint finalized epoch=201613, rootHex=0x8266693af6053374f4bd144f797c477e4a317c5297629094210044726e98e2c5
Sep-05 15:55:14.114[chain]         verbose: Checkpoint finalized epoch=201616, rootHex=0x5047823040b759272717c1ed1db731e99bf41f7951621b2b6c6b054facc03ed6
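The effect of updating the metric only on the first slot of an epoch can be illustrated with a small model. This is a simplified sketch, not the code in importBlock.ts: `ImportedBlock`, `metricFirstSlotOnly`, and `metricOnAnyBlock` are hypothetical names, and the second function is just one possible alternative policy, not necessarily the fix adopted in Lodestar.

```typescript
const SLOTS_PER_EPOCH = 32;

// Hypothetical shape: an imported block and the finalized epoch of its post-state.
interface ImportedBlock {
  slot: number;
  finalizedEpoch: number;
}

// Policy described above: the metric is only set when the block
// is in the first slot of an epoch.
function metricFirstSlotOnly(blocks: ImportedBlock[], initial = 0): number {
  let metric = initial;
  for (const block of blocks) {
    if (block.slot % SLOTS_PER_EPOCH === 0) {
      metric = block.finalizedEpoch;
    }
  }
  return metric;
}

// Alternative policy (assumption): set the metric whenever any imported
// block carries a newer finalized epoch.
function metricOnAnyBlock(blocks: ImportedBlock[], initial = 0): number {
  let metric = initial;
  for (const block of blocks) {
    if (block.finalizedEpoch > metric) {
      metric = block.finalizedEpoch;
    }
  }
  return metric;
}

// Epoch 10 starts at slot 320 and its first block is imported.
// Epoch 11 starts at slot 352, but that block is missed; slot 353 is imported instead.
const blocks: ImportedBlock[] = [
  {slot: 320, finalizedEpoch: 8},
  {slot: 321, finalizedEpoch: 8},
  {slot: 353, finalizedEpoch: 9},
];

console.log(metricFirstSlotOnly(blocks)); // stays at the stale value
console.log(metricOnAnyBlock(blocks)); // follows the newer finalized epoch
```

In this model the first policy keeps reporting the old finalized epoch until some later epoch whose first-slot block is actually imported, which matches the lag seen in the metric.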

Another issue: we also do not emit finalized checkpoint events for those epochs: https://github.com/ChainSafe/lodestar/blob/6ac8c07023bc72c7ef54a6ea66884790793eda21/packages/beacon-node/src/chain/blocks/importBlock.ts#L363

As it works right now, the metric is unreliable, especially on goerli, where a lot of blocks are missed. As an alternative, you can query the beacon API endpoint /eth/v1/beacon/states/head/finality_checkpoints (getStateFinalityCheckpoints) to get the finalized epoch. The API provides accurate and consistent values.
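A minimal sketch of reading the finalized epoch from that endpoint is shown below. It assumes a runtime with a global `fetch` (for example Node 18 or later) and a beacon node REST API at `http://localhost:9596`; adjust the base URL to your setup. The helper names `parseFinalizedEpoch` and `fetchFinalizedEpoch` are hypothetical, and the response type only covers the fields used here.

```typescript
// Subset of the standard beacon API response for finality_checkpoints.
// Epochs are encoded as decimal strings.
interface Checkpoint {
  epoch: string;
  root: string;
}

interface FinalityCheckpointsResponse {
  data: {
    previous_justified: Checkpoint;
    current_justified: Checkpoint;
    finalized: Checkpoint;
  };
}

// Extract the finalized epoch as a number, rejecting malformed values.
function parseFinalizedEpoch(body: FinalityCheckpointsResponse): number {
  const raw = body.data.finalized.epoch;
  if (!/^\d+$/.test(raw)) {
    throw new Error(`Unexpected finalized epoch value: ${raw}`);
  }
  return Number(raw);
}

// Query the head state's finality checkpoints.
// Assumption: baseUrl points at the beacon node REST API.
async function fetchFinalizedEpoch(baseUrl = "http://localhost:9596"): Promise<number> {
  const res = await fetch(`${baseUrl}/eth/v1/beacon/states/head/finality_checkpoints`);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return parseFinalizedEpoch((await res.json()) as FinalityCheckpointsResponse);
}
```

Calling `fetchFinalizedEpoch()` on a schedule and exporting the result as your own gauge gives a value that does not depend on which blocks were imported.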