Open QNODE-CA opened 1 year ago
Hi, did anything in particular happen with the network from your validator's perspective around 2023-04-02 13:07:13? It seems your node was not communicating with other peers (no bytes sent/received), which fixed itself 15 minutes later:
2023-04-02 13:07:12 ✨ Imported #27490656 (0xb6f4…46e7)
2023-04-02 13:07:13 Clique Network status: not maintaining any connections;
2023-04-02 13:07:13 Network status report: authentication connected peers - 99; block sync connected peers - 97;
2023-04-02 13:07:14 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 15.1kiB/s ⬆ 8.4kiB/s
2023-04-02 13:07:19 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.9kiB/s
2023-04-02 13:07:24 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.8kiB/s
2023-04-02 13:07:29 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.5kiB/s
2023-04-02 13:07:31 Justification requester status report: tries - 4; requested block number - 27490655; hash - 0x8004…9965;tries - 4; requested 1 children of finalized block 0x4bc1…6c85;
2023-04-02 13:07:33 Clique Network status: not maintaining any connections;
2023-04-02 13:07:33 Network status report: authentication connected peers - 99; block sync connected peers - 97;
2023-04-02 13:07:34 💤 Idle (26 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.9kiB/s
2023-04-02 13:07:39 💤 Idle (25 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 1.9kiB/s
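The stall window in the excerpt above can be spotted mechanically: the Idle lines keep reporting a healthy peer count while the ⬇ rate sits at zero. A minimal sketch of such a filter (the regex and the zero-download heuristic are my own assumptions, not anything shipped with aleph-node):

```python
import re

# Matches aleph-node "Idle" status lines, e.g.
# "... 💤 Idle (33 peers), ... ⬇ 15.1kiB/s ⬆ 8.4kiB/s" or "... ⬇ 0 ⬆ 0.9kiB/s".
IDLE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) 💤 Idle \((?P<peers>\d+) peers\).*"
    r"⬇ (?P<down>[\d.]+)(?:kiB/s)? ⬆ (?P<up>[\d.]+)(?:kiB/s)?"
)

def stalled_lines(log_lines):
    """Return timestamps of Idle lines that report peers but zero download."""
    stalls = []
    for line in log_lines:
        m = IDLE_RE.match(line)
        if m and int(m.group("peers")) > 0 and float(m.group("down")) == 0.0:
            stalls.append(m.group("ts"))
    return stalls

log = [
    "2023-04-02 13:07:14 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 15.1kiB/s ⬆ 8.4kiB/s",
    "2023-04-02 13:07:19 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.9kiB/s",
]
print(stalled_lines(log))  # only the 13:07:19 line has peers but zero download
```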
hmm looks like a brief network outage, let me have a look :)
ok, good find! Looks like I can reproduce this by disconnecting the network for a few seconds:
On the same node, NIC disconnected at 2023-04-04 09:31:16:
2023-04-04 09:31:14 💤 Idle (40 peers), best: #27649905 (0xa896…d21d), finalized #27649901 (0x62ad…198b), ⬇ 107.8kiB/s ⬆ 96.6kiB/s
2023-04-04 09:31:15 ✨ Imported #27649906 (0x6beb…df36)
2023-04-04 09:31:16 ✨ Imported #27649907 (0x5e30…9559)
2023-04-04 09:31:16 📪 No longer listening on /ip4/10.0.50.60/tcp/30334
NIC reconnected at 2023-04-04 09:32:19:
2023-04-04 09:32:14 💤 Idle (0 peers), best: #27649907 (0x5e30…9559), finalized #27649901 (0x62ad…198b), ⬇ 0 ⬆ 0
2023-04-04 09:32:19 Justification requester status report: tries - 15; requested block number - 27649906; hash - 0x6beb…df36;tries - 16; requested 1 children of finalized block 0x62ad…198b;
2023-04-04 09:32:19 💤 Idle (0 peers), best: #27649907 (0x5e30…9559), finalized #27649901 (0x62ad…198b), ⬇ 0 ⬆ 0
2023-04-04 09:32:21 💔 The bootnode you want to connect to at /ip4/84.245.33.122/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L provided a different peer ID 12D3KooWJgniH6kTFxLAZUVaP2nE7ohrMVjFMXBbbyph1uGYFDL6 than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L.
2023-04-04 09:32:22 ✨ Imported #27649972 (0xdfba…c3ca)
Still not sure why the node keeps picking on poor 84.245.33.122 (at least it's consistent) and matching it with a peer_id from the BOOT_NODES list.
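For context on what the 💔 line means: a libp2p bootnode address pins the expected identity in its trailing /p2p/ component, and the warning fires when the node at the dialed address authenticates with a different peer ID during the handshake. A conceptual sketch of that check (the function names are mine, not aleph-node's):

```python
def expected_peer_id(multiaddr: str) -> str:
    """Extract the peer ID embedded after the /p2p/ component of a multiaddr."""
    parts = multiaddr.rstrip("/").split("/")
    if "p2p" not in parts:
        raise ValueError(f"no /p2p/ component in {multiaddr!r}")
    return parts[parts.index("p2p") + 1]

def check_dialed_identity(bootnode_addr: str, observed_peer_id: str) -> bool:
    """Conceptual mirror of the check behind the 💔 log line: the identity
    presented during the handshake must match the ID pinned in the address."""
    return expected_peer_id(bootnode_addr) == observed_peer_id

addr = "/ip4/84.245.33.122/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L"
# The handshake peer ID from the log differs from the pinned one:
print(check_dialed_identity(addr, "12D3KooWJgniH6kTFxLAZUVaP2nE7ohrMVjFMXBbbyph1uGYFDL6"))
# → False
```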
This happened again on a different testnet node (QNODE-TEST-2). This node is located in a DC and suffered no loss of connectivity, AFAICT; 40 peers solid throughout. It also self-corrected in about half a minute. Log attached:
azero-testnet-2 - 2023-04-06 112056 - odd bootnode IP-peer_id.zip
Hi, thanks for the reports. Based on the original logs, this does not seem to be an issue affecting the chain. In the last log provided, there is one moment where the number of peers drops to 39 and then returns to 40, so I think there is some network activity happening in the background. It's hard to tell at this point where the issue is, as it needs a local reproduction with appropriate log levels set, etc.
Hi, please let me know if this issue can be closed. Have there been any more reproductions so far?
Hi, I see this happening occasionally (every couple of weeks or so) on my mainnet node. Last instance was just a couple of days ago:
2023-06-25 03:58:12 💔 The bootnode you want to connect to at /ip4/37.75.153.26/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB
provided a different peer ID 12D3KooWSWzDVGzH9UEeQajj1vgWTC1k4Kg31vX65bWiCXJcERK1
than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.
I have not seen any of this on any of my testnet nodes since upgrading to ver. 0.11.x.
My mainnet node logged another couple of these:
2023-07-08 03:12:34 💔 The bootnode you want to connect to at /ip4/37.75.153.26/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB
provided a different peer ID 12D3KooWSWzDVGzH9UEeQajj1vgWTC1k4Kg31vX65bWiCXJcERK1
than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.
I can see no similar entries for the testnet nodes. Hopefully this affects ver. 0.10.x only.
Today my testnet-2 node experienced the same issue:
2023-07-18 11:24:32 💔 The bootnode you want to connect to at /dns4/bootnode-us-east-2-0.test.azero.dev/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L
provided a different peer ID 12D3KooWD5s2dkifJua69RbLwEREDdJjsNHvavNRGxdCvzhoeaLc
than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L.
I guess ver. 0.11.3 is not immune to this after all.
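Since these events recur every couple of weeks across several nodes, it may be worth tallying them per bootnode address before reporting further. A hypothetical log filter, with the regex reverse-engineered from the excerpts in this thread (not taken from the aleph-node source):

```python
import re
from collections import Counter

# A 💔 event, after joining its wrapped log lines back together. The address
# and both peer IDs are captured; the exact wording is an assumption based
# on the log excerpts above.
MISMATCH_RE = re.compile(
    r"(?P<ts>\S+ \S+) 💔 The bootnode you want to connect to at "
    r"(?P<addr>\S+) provided a different peer ID (?P<got>\S+) "
    r"than the one you expect (?P<want>\S+)\."
)

def summarize(log_text: str) -> Counter:
    """Count peer-ID-mismatch events per dialed bootnode address."""
    flat = " ".join(log_text.split())  # undo line wrapping
    return Counter(m.group("addr") for m in MISMATCH_RE.finditer(flat))

sample = (
    "2023-07-18 11:24:32 💔 The bootnode you want to connect to at "
    "/dns4/bootnode-us-east-2-0.test.azero.dev/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L "
    "provided a different peer ID 12D3KooWD5s2dkifJua69RbLwEREDdJjsNHvavNRGxdCvzhoeaLc "
    "than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L."
)
print(summarize(sample))  # one event, keyed by the dialed bootnode address
```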
Did you read the documentation and guides?
Is there an existing issue?
Description of the problem
My testnet node 1 (QNODE-TEST) started logging some events yesterday (Apr 2, 2023), starting at 2023-04-02 13:13:07 PDT. The events continued until the node was restarted at 2023-04-02 14:06:08 PDT. A pair of the same event was logged every 5s or so:
2023-04-02 13:13:07 💔 The bootnode you want to connect to at /ip4/84.245.33.122/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L provided a different peer ID 12D3KooWJgniH6kTFxLAZUVaP2nE7ohrMVjFMXBbbyph1uGYFDL6 than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L.
I posted about this on the Discord testnet-validator-general channel: https://discord.com/channels/775720634848116807/1050508965647233055/1092203956324814848
Another validator (Bart | StakeSafe) confirmed that the IP address logged (84.245.33.122) belongs to his ISP (Cambrium IT Services BV). All of the AZF bootnodes appear to be hosted on Amazon, so there was a mismatch of sorts.
The connection attempt used the IP address of one of my "regular" peers along with a peer_id belonging to an actual AZF bootnode (the first on the bootnode list): bootnode-eu-central-1-0.test.azero.dev.
On the same testnet machine, bootnode-eu-central-1-0.test.azero.dev resolved to the correct IP address (18.184.79.200). The problem persisted until I restarted aleph-node; after restarting, the issue disappeared.
Information on your setup.
testnet validator ✌️
version: 0.10.1-9c6a46952d3 (built from source)
node name: QNODE-TEST
peer_id: 12D3KooWMW86EoNYwBar6e5Gcnmaa13N6JsnzYVPPVGsmvjUddWt
OS: Ubuntu 22.04.02
hardware: VM on ESXi hypervisor, AMD EPYC 7302 (8c/16t dedicated), 64GB ECC RAM, 1×100 GB SSD, 1×1.92 TB NVMe SSD, 1G connectivity, business fibre (static IP)
IP address: 206.116.52.219
Steps to reproduce
not sure how to reproduce
backup-aleph-node-testnet-2023-04-02T1406.zip
Did you attach relevant logs?