bosagora / agora

POC Node implementation for CoinNet
https://bosagora.io
MIT License
37 stars 22 forks source link

New validator can not nominate after catch-up completed #3193

Closed linked0 closed 2 years ago

linked0 commented 2 years ago

A new validator completed a catch-up for a current block like block #2458 which was externalized at UTC 2022-03-14 08:50:00 and then was continuing to call checkNominate with Too early to nominate ... as follows to the next block.

2022-03-14 08:57:19,248 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248239, next: 1647248400)
2022-03-14 08:57:45,249 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248265, next: 1647248400)
2022-03-14 08:58:11,250 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248291, next: 1647248400)
2022-03-14 08:58:37,251 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248317, next: 1647248400)
2022-03-14 08:59:03,252 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248343, next: 1647248400)
2022-03-14 08:59:29,253 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248369, next: 1647248400)
2022-03-14 08:59:55,254 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248395, next: 1647248400)

For example, UTCs for the last current and next are as follows.

This is the related source code. Nominator.d L512-L528

hewison-chris commented 2 years ago

This looks like the intended behaviour. Is your concern the volume of logging? Maybe we could reduce the noise in the log and only print at info the first instance and then at trace level until the next block.

linked0 commented 2 years ago

This looks like the intended behaviour. Is your concern the volume of logging? Maybe we could reduce the noise in the log and only print at info the first instance and then at trace level until the next block.

No. my concern is that the new validator only logs those messages and never participates in nomination. The more messages are as follows. This pattern of logs is repeated once the validator completes a catch-up.

2022-03-14 08:56:58,490 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted tx: Inputs (1): 0xd883...9279:0x6dcb...76bf
Outputs (5):
boa1xppz00cv...4cxq(1,242,448.1,632,699)<Payment>, boa1xpaqh00j...kwpq(1,242,448.1,632,699)<Payment>, boa1xpafy003...8hap(1,242,448.1,632,699)<Payment>,
boa1xplw00ml...xady(1,242,448.1,632,699)<Payment>, boa1xzvr00tk...s3ft(1,242,448.1,632,699)<Payment>, txHash: 0x93641f89f7e77bfc8002a3fdce43e836c6cd4bbc91d79dd85a763deabfa74260b352afe165f2558b5d885feaa9ca1a911fe6f2e054ebcb4ca3e57fda8893d899
2022-03-14 08:57:19,248 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248239, next: 1647248400)
2022-03-14 08:57:45,249 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248265, next: 1647248400)
2022-03-14 08:58:11,250 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248291, next: 1647248400)
2022-03-14 08:58:37,251 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248317, next: 1647248400)
2022-03-14 08:59:03,252 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248343, next: 1647248400)
2022-03-14 08:59:29,253 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248369, next: 1647248400)
2022-03-14 08:59:55,254 Info [agora.consensus.protocol.Nominator] - checkNominate(): Too early to nominate (current: 1647248395, next: 1647248400)
2022-03-14 09:00:00,823 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0x1b1b...e050, hash: 0x11d2...ed7f, height: { value: 2465 } }
<More `Accepted preimage` messages OMITTED>
2022-03-14 09:00:20,930 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0xb503...1e56, hash: 0x6e7c...195a, height: { value: 2465 } }
2022-03-14 09:00:21,261 Info [agora.consensus.protocol.Nominator] - Nominating { tx_set: [0x9364...d899], enrolls: [], missing_validators: [] } at 1647248421
2022-03-14 09:00:21,261 Info [agora.consensus.protocol.Nominator] - agora.consensus.protocol.Nominator.Nominator.nominate(): Proposing tx set for slot 2459, ledger is at height 2458
2022-03-14 09:00:21,262 Info [agora.consensus.protocol.Nominator] - agora.consensus.protocol.Nominator.Nominator.nominate(): Tx set triggered new nomination
2022-03-14 09:00:21,758 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0xf7fa...0639, hash: 0xf4b3...1af0, height: { value: 2465 } }
2022-03-14 09:00:21,799 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0xaa7c...6832, hash: 0x1422...5733, height: { value: 2465 } }
2022-03-14 09:00:22,669 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0x84da...3493, hash: 0xb447...5e7b, height: { value: 2465 } }
2022-03-14 09:00:23,265 Info [agora.consensus.protocol.Nominator] - Nominating { tx_set: [0x9364...d899], enrolls: [], missing_validators: [] } at 1647248423
2022-03-14 09:00:23,265 Info [agora.consensus.protocol.Nominator] - agora.consensus.protocol.Nominator.Nominator.nominate(): Proposing tx set for slot 2459, ledger is at height 2458
2022-03-14 09:00:23,266 Info [agora.consensus.protocol.Nominator] - agora.consensus.protocol.Nominator.Nominator.nominate(): Tx set didn't trigger new nomination
2022-03-14 09:00:23,266 Info [agora.consensus.protocol.Nominator] - agora.consensus.protocol.Nominator.Nominator.nominate(): Behind expected height so resend latest envleopes
2022-03-14 09:00:23,634 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0x94b7...7177, hash: 0x0d1e...fe8b, height: { value: 2465 } }
2022-03-14 09:00:26,88 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0x1996...41da, hash: 0x0d66...d7bb, height: { value: 2465 } }
2022-03-14 09:00:26,713 Info [agora.consensus.protocol.Nominator] - Balloting started for slot idx 2459
2022-03-14 09:00:27,822 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0xddf6...53e6, hash: 0x358c...3920, height: { value: 2465 } }
2022-03-14 09:00:29,268 Info [agora.consensus.protocol.Nominator] - checkNominate(): Balloting already started for height 2459 skipping new nomination
2022-03-14 09:00:31,186 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0x76cc...0893, hash: 0xafa1...a0ef, height: { value: 2465 } }
<More `Accepted preimage` messages OMITTED>
2022-03-14 09:00:34,400 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0x4b92...8753, hash: 0x4051...a09b, height: { value: 2465 } }
2022-03-14 09:00:35,946 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Ledger out of sync, missing 1 blocks (current height: 2458, delay: 10 minutes)
2022-03-14 09:00:36,505 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Accepted preimage: { utxo: 0x0d53...62ab, hash: 0xa776...8056, height: { value: 2465 } }
2022-03-14 09:00:37,269 Info [agora.consensus.protocol.Nominator] - checkNominate(): Balloting already started for height 2459 skipping new nomination
2022-03-14 09:00:47,271 Info [agora.consensus.protocol.Nominator] - checkNominate(): Balloting already started for height 2459 skipping new nomination
2022-03-14 09:00:59,273 Info [agora.consensus.protocol.Nominator] - checkNominate(): Balloting already started for height 2459 skipping new nomination
2022-03-14 09:01:17,274 Info [agora.consensus.protocol.Nominator] - checkNominate(): Balloting already started for height 2459 skipping new nomination
2022-03-14 09:01:37,276 Info [agora.consensus.protocol.Nominator] - checkNominate(): Balloting already started for height 2459 skipping new nomination
2022-03-14 09:01:53,252 Info [boa1xpdk34dyfv3dl2q24jw5k8ujh9mx8eml7flqfduhum6w9ypyywh4725ke6t] - Ledger out of sync, missing 1 blocks (current height: 2458, delay: 10 minutes)
2022-03-14 09:01:59,277 Info [agora.consensus.protocol.Nominator] - checkNominate(): Balloting already started for height 2459 skipping new nomination
2022-03-14 09:02:19,197 Info [agora.network.Manager] - Retrieving blocks [2459..2459] from [agora://eu-004.bosagora.io:4826/]..
2022-03-14 09:02:19,744 Info [agora.network.Manager] - Received blocks [2459..2459]
2022-03-14 09:02:19,751 Info [agora.consensus.state.Ledger] - Beginning externalization of block #2459 (previous block signatures: 1111111111111111111111011111111)
2022-03-14 09:02:19,751 Info [agora.consensus.state.Ledger] - Transactions: 1 - Enrollments: 0
2022-03-14 09:02:19,752 Info [agora.consensus.state.Ledger] - Validators: Active: 31 - Signing: 1111111111111101110111011111111 - Slashed: 0
2022-03-14 09:02:19,762 Info [agora.consensus.state.Ledger] - Completed externalization of block #2459
linked0 commented 2 years ago

I think the if statement in the code is always true because the getExpectedBlockTime returns the time when the next block will be externalized.

const cur_time = this.clock.networkTime();
const next_nomination = this.getExpectedBlockTime();
if (cur_time < next_nomination)
{
    this.log.trace(
        "checkNominate(): Too early to nominate (current: {}, next: {})",
        cur_time, next_nomination);
    return;
}
hewison-chris commented 2 years ago

^ But the current time is moving forwards until this condition is met. Or am I missing something?

linked0 commented 2 years ago

I mean the cur_time is always behind the time that getExpectedBlockTime returns after a new block is externalized because the function returns genesis_time + (ledger.height() + 1) * block_interval.

hewison-chris commented 2 years ago

PR https://github.com/bosagora/agora/pull/3200 might help it not always fetch the block at the beginning of the block nomination time.

Geod24 commented 2 years ago

Discussed in sprint planning:

linked0 commented 2 years ago

Shouldn't we have to change these lines?

const next_nomination = this.ledger.getExpectedBlockTime(slot_idx);
if (cur_time < next_nomination)

to pseudo code

const next_nomination = this.ledger.getExpectedBlockTime(this.ledger.height) + some_time;
if (cur_time < next_nomination)
hewison-chris commented 2 years ago

I think we need to follow the steps we defined ^^ As long as this.ledger.getExpectedBlockTime is what we want for next block then we can go ahead with second item and arm the timer for the difference between current time and the expected time to prevent extra timer events,

linked0 commented 2 years ago

OK, Sure. But for the second item, the main thing is not to spam the messages but why the messages happen to appear at the right time to nominate.

linked0 commented 2 years ago

@hewison-chris and I discussed this issue and he suggested the way we decrease the number of the logs. But orginally I focused on a nominator not nominating, not spamming messages when raising this issue. So I'm closing this because we could handle this kind of issue later when it becomes more important than now.