bosagora / agora

POC Node implementation for CoinNet
https://bosagora.io
MIT License
37 stars 22 forks source link

Could not perform catchup yet because we have no peer #3177

Closed linked0 closed 2 years ago

linked0 commented 2 years ago

A validator can not catch up and find any peers with binaries built with sources up to the last commit 2983454.

The logs are as follows.

2022-03-10 07:06:27,367 Info [agora.consensus.state.Ledger] - Last known block: #0 (0x8365f069fe37ee02f2c4dc6ad816702088fab5fc875c3c67b01f82c285aa2d90b605f57e068139eba1f20ce20578d712f75be4d8568c8f3a7a34604e72aa3175)
2022-03-10 07:06:27,382 Info [agora.consensus.state.Ledger] - Beginning externalization of block #0 (previous block signatures: )
2022-03-10 07:06:27,383 Info [agora.consensus.state.Ledger] - Transactions: 3 - Enrollments: 6
2022-03-10 07:06:27,383 Info [agora.consensus.state.Ledger] - Validators: Active: 0 - Signing:  - Slashed: 0
2022-03-10 07:06:27,393 Info [agora.consensus.state.Ledger] - Completed externalization of block #0
2022-03-10 07:06:27,406 Info [agora.node.Registry] - Registry is caching (non-authoritative) DNS server for zone 'testnet.bosagora.io.'
2022-03-10 07:06:27,418 Info [agora.node.Registry] - Registry is caching (non-authoritative) DNS server for zone 'validators.testnet.bosagora.io.'
2022-03-10 07:06:27,429 Info [agora.node.Registry] - Registry is caching (non-authoritative) DNS server for zone 'flash.testnet.bosagora.io.'
2022-03-10 07:06:27,446 Info [SCP] - LocalNode::LocalNode@18446744073709551615 qSet: 09cd3c
2022-03-10 07:06:27,447 Info [boa1xq75adlntrx78qt8fzd4d43hy4xnfwsm5l0cny8yukpw2x02tmahwke7chw] - Node identity initialized to 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
2022-03-10 07:06:27,447 Info [agora.consensus.protocol.Nominator] - Stopping nominating timer.
2022-03-10 07:06:28,284 Info [agora.network.Manager] - Net time offsets: [{ key: boa1xq75adlntrx78qt8fzd4d43hy4xnfwsm5l0cny8yukpw2x02tmahwke7chw, node_time: 1646895988, req_delay: 0, offset: 0 hnsecs }]
2022-03-10 07:06:28,288 Info [agora.node.Runner] - Node is listening on interface: http://0.0.0.0:8080
[main(+9B+) INF] Listening for requests on http://0.0.0.0:8080/
2022-03-10 07:06:28,288 Info [agora.node.Runner] - Node is listening on interface: http://:0
[main(+9B+) INF] Listening for requests on http://[::]:44423/
[main(+9B+) INF] Listening for requests on http://0.0.0.0:37411/
2022-03-10 07:06:28,288 Info [agora.node.Runner] - Node will be listening on TCP interface: 0.0.0.0:2826
2022-03-10 07:06:28,547 Info [agora.node.Registry] - Registering addresses TYPE.CNAME: [agora://eu-002.bosagora.io/] for public key: boa1xzval2a3cdxv28n6slr62wlczslk3juvk7cu05qt3z55ty2rlfqfc6egsh2
2022-03-10 07:06:29,291 Info [agora.node.Registry] - Registering addresses TYPE.CNAME: [agora://na-002.bosagora.io/] for public key: boa1xrval6hd8szdektyz69fnqjwqfejhu4rvrpwlahh9rhaazzpvs5g6lh34l5
2022-03-10 07:06:29,304 Info [agora.node.Registry] - Registering addresses TYPE.CNAME: [agora://na-002.bosagora.io:3826/] for public key: boa1xrval7gwhjz4k9raqukcnv2n4rl4fxt74m2y9eay6l5mqdf4gntnzhhscrh
2022-03-10 07:06:29,317 Info [agora.node.Registry] - Registering addresses TYPE.CNAME: [agora://na-001.bosagora.io/] for public key: boa1xzval4nvru2ej9m0rptq7hatukkavemryvct4f8smyy3ky9ct5u0s8w6gfy
2022-03-10 07:06:29,341 Info [agora.node.Registry] - Registering addresses TYPE.CNAME: [agora://eu-002.bosagora.io:3826/] for public key: boa1xzval3ah8z7ewhuzx6mywveyr79f24w49rdypwgurhjkr8z2ke2mycftv9n
2022-03-10 07:06:29,354 Info [agora.node.Registry] - Registering addresses TYPE.CNAME: [agora://na-001.bosagora.io:3826/] for public key: boa1xrval5rzmma29zh4aqgv3mvcarhwa0w8rgthy3l9vaj3fywf9894ycmjkm8
2022-03-10 07:06:31,293 Info [agora.common.BanManager] - BanManager: Address agora://13.124.201.231/ banned at unixtime:1646895991 (2022-Mar-10 07:06:31Z) until unixtime:1646982391 (2022-Mar-11 07:06:31Z)
2022-03-10 07:06:48,288 Warn [boa1xq75adlntrx78qt8fzd4d43hy4xnfwsm5l0cny8yukpw2x02tmahwke7chw] - Could not perform catchup yet because we have no peer
2022-03-10 07:07:08,289 Warn [boa1xq75adlntrx78qt8fzd4d43hy4xnfwsm5l0cny8yukpw2x02tmahwke7chw] - Could not perform catchup yet because we have no peer
hewison-chris commented 2 years ago

Do you see Found new Validator later in the log? The warning should then stop each time it tries to catchup.

linked0 commented 2 years ago

I don't see any about Found new Validator but about banning some servers as follows. It outputs Could not perform catchup yet because we have no peer and Pending connections: [...] after that.

2022-03-10 07:09:30,332 Info [agora.common.BanManager] - BanManager: Address agora://eu-002.bosagora.io/ banned at unixtime:1646896170 (2022-Mar-10 07:09:30Z) until unixtime:1646982570 (2022-Mar-11 07:09:30Z)
2022-03-10 07:09:30,929 Info [agora.common.BanManager] - BanManager: Address agora://eu-002.bosagora.io:3826/ banned at unixtime:1646896170 (2022-Mar-10 07:09:30Z) until unixtime:1646982570 (2022-Mar-11 07:09:30Z)
2022-03-10 07:09:32,971 Info [agora.common.BanManager] - BanManager: Address agora://na-002.bosagora.io:3826/ banned at unixtime:1646896172 (2022-Mar-10 07:09:32Z) until unixtime:1646982572 (2022-Mar-11 07:09:32Z)
2022-03-10 07:09:33,95 Info [agora.common.BanManager] - BanManager: Address agora://na-002.bosagora.io/ banned at unixtime:1646896173 (2022-Mar-10 07:09:33Z) until unixtime:1646982573 (2022-Mar-11 07:09:33Z)
2022-03-10 07:09:35,455 Info [agora.common.BanManager] - BanManager: Address agora://na-001.bosagora.io:3826/ banned at unixtime:1646896175 (2022-Mar-10 07:09:35Z) until unixtime:1646982575 (2022-Mar-11 07:09:35Z)
2022-03-10 07:09:36,776 Info [agora.common.BanManager] - BanManager: Address agora://na-001.bosagora.io/ banned at unixtime:1646896176 (2022-Mar-10 07:09:36Z) until unixtime:1646982576 (2022-Mar-11 07:09:36Z)
Geod24 commented 2 years ago

Those are genesis validators, I am surprised they would get banned. And dig shows they are still in the zone.

linked0 commented 2 years ago

Do you see Found new Validator later in the log? The warning should then stop each time it tries to catchup.

This issue and issue #3162 could be related to the catch-up timer based on what you're saying, I think.

hewison-chris commented 2 years ago

What config.yaml do you have? Can you check you do not have test_validators: 1?

linked0 commented 2 years ago

No, it doesn't have the test_validators item. My config is as follows.

node:
  realm: "testnet.bosagora.io"
  testing: true
  registry_address: https://ns1.bosagora.io/

registry:
  public: true
  port: 8053 

#consensus:
#  block_interval:
#    seconds: 60
#  validator_cycle: 20
#  genesis_timestamp: 1645092000

logging:
  root:
    # You might want to use `Trace` and separate output
    level: Info
    console: true
    propagate: true
    file: log/root.log

validator:
  enabled: true
  seed: SA37MQB4F4OSE3DF23SCAS4U5DGWTYVYILAQ53MLKAEAA6PNJMDA57HH 
  addresses_to_register:
    - "agora://13.124.201.231:2826"
linked0 commented 2 years ago

Log messages at Trace level.

2022-03-11 01:22:19,287 Trace [agora.network.RPC..0] - [CLIENT] agora.network.RPC.RPCClient!(API).RPCClient.getNodeInfo: Returning { state: NetworkState.Incomplete, peers: [] }
2022-03-11 01:22:19,287 Trace [agora.network.Client] - Client.attemptRequest 'getNodeInfo' to :: 1/50 SUCCESS
2022-03-11 01:22:21,278 Trace [agora.network.Manager] - Doing periodic network discovery: 0 required peers requested, 0 missing, known 6
2022-03-11 01:22:21,878 Trace [agora.network.RPC] - Exception caught in handle: object.Exception@submodules/vibe-core/source/vibe/core/net.d(731): Reached end of stream while reading data.
----------------
/home/ubuntu/dlang/ldc-1.28.1/bin/../import/std/exception.d:517 pure @safe noreturn std.exception.bailOut!(Exception).bailOut(immutable(char)[], ulong, scope const(char)[]) [0x55bca8fb502b]
/home/ubuntu/dlang/ldc-1.28.1/bin/../import/std/exception.d:437 pure @safe bool std.exception.enforce!().enforce!(bool).enforce(bool, lazy const(char)[], immutable(char)[], ulong) [0x55bca8d11e55]
submodules/vibe-core/source/vibe/core/net.d:731 [0x55bca98b1637]
submodules/vibe-core/source/vibe/core/net.d:851 @safe void vibe.core.net.loopWithTimeout!(vibe.core.net.TCPConnection.read(scope ubyte[], eventcore.driver.IOMode).__lambda4, vibe.core.net.ReadTimeoutException).loopWithTimeout(core.time.Duration, immutable(char[])) [0x55bca98ae647]
submodules/vibe-core/source/vibe/core/net.d:727 @safe ulong vibe.core.net.TCPConnection.read(scope ubyte[], eventcore.driver.IOMode) [0x55bca98ae5c5]
submodules/vibe-core/source/vibe/core/net.d:743 @safe void vibe.core.net.TCPConnection.read(scope ubyte[]) [0x55bca98ae791]
source/agora/network/RPC.d:583 [0x55bca90ddcec]
submodules/serialization/source/agora/serialization/Serializer.d:603 [0x55bca913c674]
submodules/serialization/source/agora/serialization/Serializer.d:603 @safe ubyte[64] agora.serialization.Serializer.deserializeFull!(ubyte[64]).deserializeFull(scope const(ubyte)[] delegate(ulong) @safe, in agora.serialization.Serializer.DeserializerOptions) [0x55bca913c604]
submodules/serialization/source/agora/serialization/Serializer.d:674 @safe ubyte[64] agora.serialization.Serializer.deserializeFull!(geod24.bitblob.BitBlob!(64uL).BitBlob).deserializeFull(scope const(ubyte)[] delegate(ulong) @safe, in agora.serialization.Serializer.DeserializerOptions).convert!(ubyte[64]).convert() [0x55bca913c5c2]
submodules/serialization/source/agora/serialization/Serializer.d:676 @safe geod24.bitblob.BitBlob!(64uL).BitBlob agora.serialization.Serializer.deserializeFull!(geod24.bitblob.BitBlob!(64uL).BitBlob).deserializeFull(scope const(ubyte)[] delegate(ulong) @safe, in agora.serialization.Serializer.DeserializerOptions) [0x55bca90dddc1]
source/agora/network/RPC.d:593 @trusted void agora.network.RPC.handleThrow!(agora.api.Validator.API).handleThrow(scope agora.api.Validator.API, agora.network.RPC.RPCConnection, core.time.Duration) [0x55bca90d9893]
source/agora/network/RPC.d:545 nothrow @trusted bool agora.network.RPC.handle!(agora.api.Validator.API).handle(agora.api.Validator.API, agora.network.RPC.RPCConnection, core.time.Duration) [0x55bca90d96e4]
source/agora/network/RPC.d:462 nothrow @safe void agora.network.RPC.RPCConnection.startListening!(agora.api.Validator.API).startListening(agora.api.Validator.API) [0x55bca90d9688]
source/agora/network/RPC.d:232 [0x55bca90d9627]
submodules/vibe-core/source/vibe/core/task.d:721 nothrow void vibe.core.task.TaskFuncInfo.set!(void delegate() nothrow @safe).set(ref void delegate() nothrow @safe).callDelegate(ref vibe.core.task.TaskFuncInfo) [0x55bca925e87e]
submodules/vibe-core/source/vibe/core/task.d:739 void vibe.core.task.TaskFuncInfo.call() [0x55bca98d5a5b]
submodules/vibe-core/source/vibe/core/task.d:446 nothrow void vibe.core.task.TaskFiber.run() [0x55bca98d51b9]
??:? fiber_entryPoint [0x55bca9a8c1ba]
linked0 commented 2 years ago

The exception occurs at this line.

linked0 commented 2 years ago

Those are genesis validators, I am surprised they would get banned. And dig shows they are still in the zone.

I think we should not ban genesis validators or ban only validators even once connected.

linked0 commented 2 years ago

Closing as this doesn't happen again. Will reopen again if this happens again.

linked0 commented 2 years ago

Reopening this issue because the problem banning genesis validators must not happen if they are not slashed or expired. Will submit a PR solving this problem.