AleoNet / anf-snarkOS

A Decentralized Operating System for Zero-Knowledge Applications
http://snarkos.org
Apache License 2.0

[Bug] Client syncing issues #6

Open WietzeSlagman opened 2 months ago

WietzeSlagman commented 2 months ago

🐛 Bug Report

A client node running on canary occasionally stops syncing for a long time and never catches up to the tip. It receives no new blocks from the validator it is connected to and treats its own height as the tip.

Steps to Reproduce

It is unclear how to reproduce this exactly. However, when the client stops receiving new blocks because it considers its own height the highest, it logs the following:

2024-04-15T07:18:37.024879Z TRACE snarkos_node_sync::block_sync: Updating is_block_synced: greatest_peer_height = 0, canon_height = 34087
2024-04-15T07:18:37.024936Z TRACE snarkos_node_sync::block_sync: Prepared 0 block requests
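To find every occurrence of this state in a longer log, a small scan like the one below may help. It is only a sketch: the regular expression is written against the single `is_block_synced` line quoted above, the function name `find_blind_spots` is made up for illustration, and treating `greatest_peer_height < canon_height` as the suspicious condition is an assumption about what "stuck" looks like, not something taken from the snarkOS source.

```python
import re

# Pattern written against the TRACE line quoted in this issue; other
# snarkOS versions may word the message differently (assumption).
SYNC_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z)\s+TRACE\s+"
    r"snarkos_node_sync::block_sync: Updating is_block_synced: "
    r"greatest_peer_height = (?P<peer>\d+), canon_height = (?P<canon>\d+)"
)


def find_blind_spots(lines):
    """Return (timestamp, greatest_peer_height, canon_height) for each
    sync update where no known peer is ahead of the local ledger."""
    hits = []
    for line in lines:
        m = SYNC_RE.search(line)
        if m is None:
            continue  # not an is_block_synced update
        peer, canon = int(m["peer"]), int(m["canon"])
        # Assumption: a client that sees every peer below its own height
        # has nothing to request, which matches "Prepared 0 block requests".
        if peer < canon:
            hits.append((m["ts"], peer, canon))
    return hits
```

Running it over a log file, for example `find_blind_spots(open("client.log"))` with a hypothetical file name, gives the timestamps at which the node believed it was ahead of all of its peers.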

A potentially related factor is that the connection between the client and its validator is being refused or dropped, leaving the client unable to connect:

2024-04-15T07:18:02.042090Z  WARN snarkos_node_router: Unable to connect to '141.94.3.103:4130' - '141.94.3.103:4130' disconnected before sending "Message::ChallengeResponse"

Expected Behavior

Clients to keep syncing to the latest tip and keep in sync with their connected peers (validators and clients).

Your Environment

WietzeSlagman commented 2 months ago

Related logs for a case where clients stop syncing after some time; here, two clients stopped at the same time. A restart gets them syncing again, but they stall again after fewer than 500 blocks and need another restart.

client2-19apr.log client1-19apr.log

asharma13524 commented 1 month ago

Another case where clients have stopped syncing, with logs below:

This took place during the "900 clients connecting to the fringe clients, in a decentralized topology." test. Several clients appear to be far behind the tip, potentially because they hit "maximum peers reached" while the peers they are connected to are not themselves synced to tip, or because of some related issue. I have included logs from two of our clients that stalled in the 260-280k block range on canary net.

Note: I only included one day's worth of logs, but I am happy to include more if helpful.

client2-canary-may15.log client-canary-may15.log

Snippet from where we initially stalled:

May 13 18:01:02 client-nodes-1 snarkos[3894356]: 2024-05-13T18:01:02.508529Z TRACE snarkos_node_sync::block_sync: No block requests to send - try advancing with block responses (at block 258945)
May 13 18:01:05 client-nodes-1 snarkos[3894356]: 2024-05-13T18:01:05.392179Z DEBUG snarkos_node_router::heartbeat: Connected to 21 peers [34.71.154.32:4136, 34.16.96.117:4134, 34.171.188.136:4138, 35.196.17.175:4130, 34.133.96.97:4131, 34.30.43.173:4134, 34.134.47.103:4139, 104.155.187.104:4132, 34.121.237.207:4130, 34.28.213.8:4139, 35.202.22.233:4136, 35.184.224.202:4136, 35.192.210.26:4139, 34.27.166.56:4135, 35.223.204.80:4136, 34.28.213.8:4131, 34.134.226.163:4130, 35.226.159.181:4130, 34.67.141.212:4130, 34.133.96.97:4136, 209.97.156.21:4130]
May 13 18:01:05 client-nodes-1 snarkos[3894356]: 2024-05-13T18:01:05.392269Z  INFO snarkos_node_router::heartbeat: Disconnecting from '209.97.156.21:4130' (periodic refresh of peers)
May 13 18:01:05 client-nodes-1 snarkos[3894356]: 2024-05-13T18:01:05.392317Z  WARN snarkos_node_router: Dropping connection attempt to '34.74.95.84:4130' (maximum peers reached)
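For anyone going through the full attached logs, a rough summary of the three message types in this snippet can be produced with a short script. This is a sketch under assumptions: the patterns are written against the lines quoted above only, and the `summarize` function and its output keys are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this issue (assumption:
# the wording is the same throughout the attached logs).
CONNECTED_RE = re.compile(r"snarkos_node_router::heartbeat: Connected to (\d+) peers")
DROPPED_RE = re.compile(r"Dropping connection attempt to '([^']+)' \(maximum peers reached\)")
STALLED_RE = re.compile(r"No block requests to send .*\(at block (\d+)\)")


def summarize(lines):
    """Count peer-limit drops and record the heights at which the node
    reported having no block requests to send."""
    peer_counts = []
    dropped = Counter()
    stalled_heights = Counter()
    for line in lines:
        if (m := CONNECTED_RE.search(line)):
            peer_counts.append(int(m.group(1)))
        elif (m := DROPPED_RE.search(line)):
            dropped[m.group(1)] += 1
        elif (m := STALLED_RE.search(line)):
            stalled_heights[int(m.group(1))] += 1
    return {
        "max_connected": max(peer_counts, default=0),
        "dropped_attempts": sum(dropped.values()),
        "dropped_peers": sorted(dropped),
        "stalled_heights": dict(stalled_heights),
    }
```

A height that appears many times in `stalled_heights` alongside a high `dropped_attempts` count would be consistent with the "maximum peers reached" theory above, although it would not prove it.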

Environment:

Running snarkOS commit: https://github.com/AleoNet/anf-snarkOS/commit/fc340c679960e63612c536d69e71405b77e113f4
rustc 1.77.2 (25ef9e3d8 2024-04-09)
Ubuntu 22.04.4 LTS