Cardinal-Cryptography / aleph-node-issues

Issue tracker for aleph-node related problems.
2 stars 0 forks source link

Node dropping connection #9

Closed eifos-git closed 7 months ago

eifos-git commented 1 year ago

Did you read the documentation and guides?

Is there an existing issue?

Description of the problem

As of today the following message is appearing in the log and our nodes are experiencing connectivity issues Received sync data of unsupported version Version(2), this node might be running outdated software.

At 10:38 connectivity dropped completely.

023-09-28 10:37:46  Idle (40 peers), best: #59222288 (0xdb4c…2161), finalized #59222286 (0xf9cb…0bd9), ⬇ 0 ⬆ 21.2kiB/s
2023-09-28 10:37:51 Clique Network status: not maintaining any connections;
2023-09-28 10:37:51 Sync stall detected, recreating the Forest.
2023-09-28 10:37:51 Network status report: authentication connected peers - 80; block sync connected peers - 79;
2023-09-28 10:37:51  Idle (29 peers), best: #59222288 (0xdb4c…2161), finalized #59222286 (0xf9cb…0bd9), ⬇ 0 ⬆ 21.1kiB/s
2023-09-28 10:37:56  Idle (25 peers), best: #59222288 (0xdb4c…2161), finalized #59222286 (0xf9cb…0bd9), ⬇ 0 ⬆ 13.2kiB/s
2023-09-28 10:38:01  Idle (2 peers), best: #59222288 (0xdb4c…2161), finalized #59222286 (0xf9cb…0bd9), ⬇ 0 ⬆ 9.2kiB/s
2023-09-28 10:38:06  Idle (0 peers), best: #59222288 (0xdb4c…2161), finalized #59222286 (0xf9cb…0bd9), ⬇ 0 ⬆ 0.5kiB/s
2023-09-28 10:38:11 Clique Network status: not maintaining any connections;
2023-09-28 10:38:11 Network status report: authentication connected peers - 0; block sync connected peers - 0;
2023-09-28 10:38:11  Idle (0 peers), best: #59222288 (0xdb4c…2161), finalized #59222286 (0xf9cb…0bd9), ⬇ 0 ⬆ 0

11 minutes later connectivity restored

2023-09-28 10:49:11  Idle (0 peers), best: #59222288 (0xdb4c…2161), finalized #59222286 (0xf9cb…0bd9), ⬇ 0 ⬆ 0
2023-09-28 10:49:16  Idle (0 peers), best: #59222288 (0xdb4c…2161), finalized #59222286 (0xf9cb…0bd9), ⬇ 0 ⬆ 0
2023-09-28 10:49:18  The bootnode you want to connect to at /ip4/5.255.69.121/tcp/30333/p2p/12D3KooWEF1Eo7uFZWdqFsTPP7CehpRt5NeXFwCe3157qpoU5aqd provided a different peer ID 12D3KooWBMVUjxzgaFztJjsU1ToKV1GYS5aBPz73DevnneUZRgF8 than the one you expect 12D3KooWEF1Eo7uFZWdqFsTPP7CehpRt5NeXFwCe3157qpoU5aqd.
2023-09-28 10:49:18  The bootnode you want to connect to at /ip4/5.255.69.121/tcp/30333/p2p/12D3KooWEF1Eo7uFZWdqFsTPP7CehpRt5NeXFwCe3157qpoU5aqd provided a different peer ID 12D3KooWBMVUjxzgaFztJjsU1ToKV1GYS5aBPz73DevnneUZRgF8 than the one you expect 12D3KooWEF1Eo7uFZWdqFsTPP7CehpRt5NeXFwCe3157qpoU5aqd.
2023-09-28 10:49:18  The bootnode you want to connect to at /dns4/bootnode-eu-central-1-0.azero.dev/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB provided a different peer ID 12D3KooWEF1Eo7uFZWdqFsTPP7CehpRt5NeXFwCe3157qpoU5aqd than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.
2023-09-28 10:49:18  The bootnode you want to connect to at /dns4/bootnode-eu-central-1-0.azero.dev/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB provided a different peer ID 12D3KooWEF1Eo7uFZWdqFsTPP7CehpRt5NeXFwCe3157qpoU5aqd than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.
2023-09-28 10:49:21 Running session 65803.
2023-09-28 10:49:21 ⚙️ Preparing 102.4 bps, target=#59222999 (18 peers), best: #59222800 (0x3f46…7a43), finalized #59222700 (0x479b…a2f8), ⬇ 452.2kiB/s ⬆ 151.1kiB/s
2023-09-28 10:49:23  The bootnode you want to connect to at /dns4/bootnode-eu-central-1-0.azero.dev/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB provided a different peer ID 12D3KooWEF1Eo7uFZWdqFsTPP7CehpRt5NeXFwCe3157qpoU5aqd than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.
2023-09-28 10:49:24 ✨ Imported #59223003 (0x9397…e769)
2023-09-28 10:49:24 Error marking block as imported: forest error: parent was not imported when attempting to import block.
2023-09-28 10:49:25 ✨ Imported #59223004 (0x687c…8572)
2023-09-28 10:49:25 Error marking block as imported: forest error: parent was not imported when attempting to import block.
2023-09-28 10:49:26 ✨ Imported #59223005 (0x5b36…db3e)
2023-09-28 10:49:26 Error marking block as imported: forest error: parent was not imported when attempting to import block.
2023-09-28 10:49:26  Idle (14 peers), best: #59223005 (0x5b36…db3e), finalized #59222890 (0x3d70…99e5), ⬇ 428.7kiB/s ⬆ 75.6kiB/s
2023-09-28 10:49:27 ✨ Imported #59223006 (0x26a6…7585)
2023-09-28 10:49:27 Error marking block as imported: forest error: parent was not imported when attempting to import block.
2023-09-28 10:49:28  The bootnode you want to connect to at /dns4/bootnode-eu-central-1-0.azero.dev/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB provided a different peer ID 12D3KooWEF1Eo7uFZWdqFsTPP7CehpRt5NeXFwCe3157qpoU5aqd than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.

Nr of connected peers is fluctuating but not reaching the desired 40. Currently it sits at 16. Full log attached.

alephnode.log

Information on your setup.

1) Mainnet 2) Version 0.11.4 3) How do you run aleph-node - Directly 3) Is it a validator node or RPC-node? - Validator 4) what flags do you run aleph-node with? - Default'ish 5) operating system - Ubuntu 22.04.3 LTS 6) hardware - AMD 16/32 - 128GB - Nvme

Steps to reproduce

No response

Did you attach relevant logs?

kostekIV commented 1 year ago

Hi, there are 2 things here:

  1. Received sync data of unsupported version Version(2), this node might be running outdated software. -> Nothing to worry about, someone is running (mb even someone from our team) a new node version. The log msg here is far from ideal we work to improve this, sorry for the spam.
  2. Now the drop in the connections is something we need to look into. Thanks for catching this up :slightly_smiling_face:
vup2p commented 1 year ago

I am also experiencing a similar issue. It cannot synchronize; then it re-synchronizes the missing blocks, causing a sharp increase in network traffic. This issue occurred again on October 3, 2023, at 15:49:18 during session 66309. This is my node's log

{"log":"2023-10-03 15:49:18 Potential safety failure: reverting finalized block (59678610, 0xaac9863d221bda04c52d42ca8f7fb946d7a0100674a813268d9b4079078d1e92)    \n","stream":"stderr","time":"2023-10-03T15:49:18.626410503Z"}
{"log":"2023-10-03 15:49:18 Block import error: Potential long-range attack: block not in finalized chain.    \n","stream":"stderr","time":"2023-10-03T15:49:18.626430913Z"}
{"log":"2023-10-03 15:49:18 💔 Error importing block 0xbf2306c477af5089e24d89238903ea6e09eaba17d17d822880e4e40a24b4a227: consensus error: Import failed: Potential long-range attack: block not in finalized chain.    \n","stream":"stderr","time":"2023-10-03T15:49:18.626434603Z"}
{"log":"2023-10-03 15:49:18 💤 Idle (34 peers), best: #59678610 (0xaac9…1e92), finalized #59678610 (0xaac9…1e92), ⬇ 69.4kiB/s ⬆ 70.7kiB/s    \n","stream":"stderr","time":"2023-10-03T15:49:18.676440518Z"}
{"log":"2023-10-03 15:49:18 Potential safety failure: reverting finalized block (59678610, 0xaac9863d221bda04c52d42ca8f7fb946d7a0100674a813268d9b4079078d1e92)    \n","stream":"stderr","time":"2023-10-03T15:49:18.876711456Z"}
{"log":"2023-10-03 15:49:18 Block import error: Potential long-range attack: block not in finalized chain.    \n","stream":"stderr","time":"2023-10-03T15:49:18.876725626Z"}
{"log":"2023-10-03 15:49:18 💔 Error importing block 0xbf2306c477af5089e24d89238903ea6e09eaba17d17d822880e4e40a24b4a227: consensus error: Import failed: Potential long-range attack: block not in finalized chain.    \n","stream":"stderr","time":"2023-10-03T15:49:18.876730416Z"}
{"log":"2023-10-03 15:49:19 Potential safety failure: reverting finalized block (59678610, 0xaac9863d221bda04c52d42ca8f7fb946d7a0100674a813268d9b4079078d1e92)    \n","stream":"stderr","time":"2023-10-03T15:49:19.47742071Z"}
{"log":"2023-10-03 15:49:19 Block import error: Potential long-range attack: block not in finalized chain.    \n","stream":"stderr","time":"2023-10-03T15:49:19.477465579Z"}
{"log":"2023-10-03 15:49:19 💔 Error importing block 0xbf2306c477af5089e24d89238903ea6e09eaba17d17d822880e4e40a24b4a227: consensus error: Import failed: Potential long-range attack: block not in finalized chain.    \n","stream":"stderr","time":"2023-10-03T15:49:19.477471689Z"}
{"log":"2023-10-03 15:49:19 ✨ Imported #59678611 (0x4ffb…8bf7)    \n","stream":"stderr","time":"2023-10-03T15:49:19.777755637Z"}
{"log":"2023-10-03 15:49:20 ✨ Imported #59678611 (0xbf23…a227)    \n","stream":"stderr","time":"2023-10-03T15:49:20.244905008Z"}
{"log":"2023-10-03 15:49:20 Error marking block as imported: forest error: parent was in a state incompatible with importing this block.    \n","stream":"stderr","time":"2023-10-03T15:49:20.244935047Z"}

19Z"}
{"log":"2023-10-03 15:49:23 ✨ Imported #59678615 (0xb950…8542)    \n","stream":"stderr","time":"2023-10-03T15:49:23.632363921Z"}
{"log":"2023-10-03 15:49:23 💤 Idle (21 peers), best: #59678615 (0xb950…8542), finalized #59678614 (0x9ac7…0ec5), ⬇ 156.1kiB/s ⬆ 219.4kiB/s    \n","stream":"stderr","time":"2023-10-03T15:49:23.682385211Z"}
{"log":"2023-10-03 15:49:24 🙌 Starting consensus session on top of parent 0xb95064f01434cdd22094c273abb3d78c390ac6b0eae73fa799d08a4ec37c8542    \n","stream":"stderr","time":"2023-10-03T15:49:24.032852597Z"}
{"log":"2023-10-03 15:49:24 🎁 Prepared block for proposing at 59678616 (0 ms) [hash: 0x20bc009305452c4cebf88ea8fd9f11b2b75c828d444e1b820efa90554e5cb9b8; parent_hash: 0xb950…8542; extrinsics (1): [0x1dad…bb45]]    \n","stream":"stderr","time":"2023-10-03T15:49:24.032875106Z"}
{"log":"2023-10-03 15:49:24 🔖 Pre-sealed block for proposal at 59678616. Hash now 0xbd8000e2a2aac94986570933f5eeed1b789c0b6edea073b25af69c72001aab2c, previously 0x20bc009305452c4cebf88ea8fd9f11b2b75c828d444e1b820efa90554e5cb9b8.    \n","stream":"stderr","time":"2023-10-03T15:49:24.032879846Z"}
{"log":"2023-10-03 15:49:24 ✨ Imported #59678616 (0xbd80…ab2c)    \n","stream":"stderr","time":"2023-10-03T15:49:24.032883866Z"}
{"log":"2023-10-03 15:49:24 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:24.533671205Z"}
{"log":"2023-10-03 15:49:24 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:24.984396192Z"}
{"log":"2023-10-03 15:49:25 ✨ Imported #59678617 (0x1563…aff4)    \n","stream":"stderr","time":"2023-10-03T15:49:25.234751014Z"}
{"log":"2023-10-03 15:49:25 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:25.986781985Z"}
{"log":"2023-10-03 15:49:26 ✨ Imported #59678618 (0x16b3…ea42)    \n","stream":"stderr","time":"2023-10-03T15:49:26.637578038Z"}
{"log":"2023-10-03 15:49:27 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:27.239337813Z"}
{"log":"2023-10-03 15:49:27 Clique Network status: not maintaining any connections;     \n","stream":"stderr","time":"2023-10-03T15:49:27.590446544Z"}
{"log":"2023-10-03 15:49:27 Network status report: authentication connected peers - 54; block sync connected peers - 55;     \n","stream":"stderr","time":"2023-10-03T15:49:27.590473094Z"}
{"log":"2023-10-03 15:49:27 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:27.941066856Z"}
{"log":"2023-10-03 15:49:28 💤 Idle (25 peers), best: #59678618 (0x16b3…ea42), finalized #59678618 (0x16b3…ea42), ⬇ 1.0MiB/s ⬆ 318.9kiB/s    \n","stream":"stderr","time":"2023-10-03T15:49:28.69192124Z"}
{"log":"2023-10-03 15:49:33 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:33.39732976Z"}
{"log":"2023-10-03 15:49:33 💤 Idle (40 peers), best: #59678618 (0x16b3…ea42), finalized #59678618 (0x16b3…ea42), ⬇ 5.8MiB/s ⬆ 75.0kiB/s    \n","stream":"stderr","time":"2023-10-03T15:49:33.6976817Z"}
{"log":"2023-10-03 15:49:34 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:34.298408854Z"}
{"log":"2023-10-03 15:49:34 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:34.298437

{"log":"2023-10-03 15:49:41 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:41.807526362Z"}
{"log":"2023-10-03 15:49:43 💤 Idle (40 peers), best: #59678618 (0x16b3…ea42), finalized #59678618 (0x16b3…ea42), ⬇ 5.0MiB/s ⬆ 62.5kiB/s    \n","stream":"stderr","time":"2023-10-03T15:49:43.709892167Z"}
{"log":"2023-10-03 15:49:46 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:46.914027304Z"}
{"log":"2023-10-03 15:49:47 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:47.364504328Z"}
{"log":"2023-10-03 15:49:47 Clique Network status: not maintaining any connections;     \n","stream":"stderr","time":"2023-10-03T15:49:47.564745039Z"}
{"log":"2023-10-03 15:49:47 Network status report: authentication connected peers - 67; block sync connected peers - 67;     \n","stream":"stderr","time":"2023-10-03T15:49:47.564757199Z"}
{"log":"2023-10-03 15:49:47 Received sync data of unsupported version Version(2), this node might be running outdated software.    \n","stream":"stderr","time":"2023-10-03T15:49:47.664871574Z"}
{"log":"2023-10-03 15:49:48 💤 Idle (40 peers), best: #59678618 (0x16b3…ea42), finalized #59678618 (0x16b3…ea42), ⬇ 5.5MiB/s ⬆ 45.8kiB/s    \n","stream":"stderr","time":"2023-10-03T15:49:48.666114792Z"}
ggawryal commented 12 months ago

Hi @vup2p, thanks for reporting. I can see that there were some issues in block import at the beginning, but the node was able to sync and propose a new block on time. The question is why after that, for at least 22 seconds the node hasn't imported anything, while having all peers connected most of that time. Was the node able to import next block (#59678619 or higher) without restarting? If so, could you provide us with some later logs, up to the time of importing that block?

vup2p commented 12 months ago

Here is my new log at session 66425 issue_66425.log

This is LibP2P peers' information at that time image

ggawryal commented 12 months ago

@eifos-git since the number of connected peers in your node went to zero with zero inbound data, it might point that the network could have gone down. Do you have some evidence to believe this was not the case? All our nodes haven't encountered any significant drop in that time.

ggawryal commented 12 months ago

@vup2p both number of peers and network traffic is unstable, but this probably is not the primary issue and just a cause of something else. Could you check if cpu and memory usage was also abnormal then?

eifos-git commented 12 months ago

@eifos-git since the number of connected peers in your node went to zero with zero inbound data, it might point that the network could have gone down. Do you have some evidence to believe this was not the case? All our nodes haven't encountered any significant drop in that time.

Entirely possible but during that day connected peers fluctuated a lot, often dropping to low non-zero numbers while monitoring. This led me to believe it might not have been a network on/off situation. That being said, this only happened during that single day. Ever since peercount has been stable again.

vup2p commented 11 months ago

@vup2p both number of peers and network traffic is unstable, but this probably is not the primary issue and just a cause of something else. Could you check if cpu and memory usage was also abnormal then?

Thank @ggawryal At that time, my node's CPU and memory were still normal. Since then, my node have not seen any recurrence. We will continue to monitor the situation further.

ggawryal commented 10 months ago

HI, have you found any abnormal significant drops in connections since the last time? If not, then I'd close the issue.

vup2p commented 10 months ago

HI, have you found any abnormal significant drops in connections since the last time? If not, then I'd close the issue.

Sorry for late reply! We have not seen this issue again Thank you!