SChernykh / p2pool

Decentralized pool for Monero mining
GNU General Public License v3.0

p2pool seems to be causing monerod to fall out of sync and never catch up #209

Closed by snex 1 year ago

snex commented 1 year ago

Not really sure what's causing this, but it happens occasionally after the 0.18 update. I will wake up and p2pool is complaining about monerod being out of sync. I shut down p2pool and then monerod will catch up no problem.

SChernykh commented 1 year ago
snex commented 1 year ago

v0.18.1.2-release, Linux, p2pool self compiled.

monerod command: monerod --config-file [path]/monerod.conf --non-interactive
p2pool command: p2pool --host 127.0.0.1 --wallet [wallet] --start-mining 3 --mini

monerod.conf:
out-peers=50
limit-rate-down=100000
zmq-pub=tcp://127.0.0.1:18083
disable-dns-checkpoints=true
enable-dns-blocklist=true

I am using the internal miner because I have tested xmrig and get the same exact hashrate. I also checked through my logs and found that the problem began at exactly 2022-10-10 02:53:28.362 this morning. monerod just suddenly fell behind and kept losing time until I woke up today and turned p2pool off to enable it to catch back up.

SChernykh commented 1 year ago

I see two possibilities:

1. The miner may be taking too much CPU, so monerod can't keep up, or p2pool + monerod didn't fit in RAM, so monerod slowed down because of disk swapping. If it's not on an SSD, that would make this much worse.

2. Too many incoming connections (more than 1024) may be piling up over time, so monerod hit its open files limit and all sorts of strange things started happening. You don't limit in-peers in monerod.conf.
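
To illustrate the second possibility: each open connection consumes one file descriptor, so the connection count can be compared against the process's soft limit on open files. The sketch below is only an illustration. It uses the POSIX `getrlimit` call, the helper names are hypothetical, and `soft_fd_limit` reports the limit of the process that calls it, not of a running monerod.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE (POSIX)
#include <cstdint>

// Hypothetical helper: true if `connections` sockets plus `reserved`
// descriptors (log files, database handles, etc.) would not fit under
// `soft_limit` open file descriptors.
bool exceeds_fd_limit(std::uint64_t connections, std::uint64_t reserved,
                      std::uint64_t soft_limit) {
    return connections + reserved > soft_limit;
}

// Reads the soft limit on open file descriptors for the calling process.
// Returns 0 if the limit could not be read.
std::uint64_t soft_fd_limit() {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return 0;
    }
    return static_cast<std::uint64_t>(lim.rlim_cur);
}
```

With a soft limit of 1024, `exceeds_fd_limit(1100, 0, 1024)` is true, which is the situation described above; with in-peers capped at a small number it stays false.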

snex commented 1 year ago

My resources seem fine so it might be the 2nd one - I will limit in-peers and see if it happens again. Thanks.

snex commented 1 year ago

Ok so turns out I am actually behind a firewall and therefore my in-peers is always at 0. Might it be worthwhile to have a command line option that tells p2pool to skip the call to handle_incoming_block_async in p2p_server.cpp when the peer_height doesn't match our_height? I could probably write the code for that but I'm not sure how I would test it.

SChernykh commented 1 year ago

This logic already exists: https://github.com/SChernykh/p2pool/blob/master/src/p2p_server.cpp#L1892 Changing it to completely discard everything that doesn't match current height will quickly lead to p2pool chain split, so it's a no go. It's better to fix your monerod issues.

snex commented 1 year ago

Is there a valid reason to continue processing when the peer_height is greater than our_height by some significant amount? The function will early return false when we are unreasonably stale (our_height is more than 5 ahead of peer_height), but it never does an early return no matter how far behind we are.

SChernykh commented 1 year ago

The current logic was refined during initial tests and the first few months of operation; you can check the GitHub commits. If lagging peers banned normal peers for being too far ahead, the lagging peers would mine on top of old p2pool blocks, causing a chain split, and all their mined blocks would be orphaned. When normal peers ban lagging peers, it doesn't result in a chain split, because the lagging peers still receive the latest blocks from other random peers that haven't banned them yet.
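
The asymmetry discussed in the last two comments can be sketched roughly as follows. This is not the code from p2p_server.cpp. The function name and the constant name are hypothetical, and the threshold of 5 is taken only from the description earlier in this thread.

```cpp
#include <cstdint>

// Assumed from the thread: a peer more than 5 mainchain blocks behind us
// is treated as unreasonably stale. The name is hypothetical.
constexpr std::uint64_t kMaxBlocksBehind = 5;

// Returns false only when the peer is too far behind us.
// A peer that is ahead of us is never rejected on height alone,
// no matter how large the gap is.
bool accept_by_height(std::uint64_t peer_height, std::uint64_t our_height) {
    if (our_height > peer_height &&
        our_height - peer_height > kMaxBlocksBehind) {
        return false;  // peer is lagging too far behind
    }
    return true;  // peer is level with us, slightly behind, or ahead
}
```

Under this sketch a lagging node keeps accepting blocks from peers that are ahead of it, which is what lets it rejoin the main p2pool chain instead of splitting off.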

bladedoyle commented 1 year ago

I am also having issues with monerod falling out of sync while running p2pool-mini, though I don't know whether the issue is related to p2pool. Restarting monerod temporarily gets things working again.

2022-10-20 15:19:40.3290 SideChain add_external_block: block is built on top of an unknown mainchain block 591fa0ee5e4a2f92d8ccdb3cf821fb78abd451cd7edc70fdb091b6ffa42ec265, mainchain reorg might've happened
2022-10-20 15:19:40.3402 SideChain add_external_block: couldn't get mainchain difficulty for height = 2737692
2022-10-20 15:19:40.4587 P2PServer Trying to broadcast a block b2cc3838395d3622583c6e4e7636138022cab7f03dfdf228cf2b8d72271cb841 ahead on mainchain (mainchain height 2737692, current height is 2737514)
2022-10-20 15:19:40.4592 P2PServer peer 103.16.181.169:37888 is ahead on mainchain (height 2737692, your height 2737514). Is your monerod stuck or lagging?
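
For a sense of scale, the gap in the last log line can be extracted and converted to time. The sketch below is a hypothetical helper, not part of p2pool. It assumes the log line keeps the "(height X, your height Y)" wording shown above and uses Monero's 2-minute target block time.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Returns how many mainchain blocks the local node is behind, parsed from a
// p2pool "is ahead on mainchain" log line. Returns -1 if the line does not
// contain the "(height X, your height Y)" pattern.
std::int64_t blocks_behind(const std::string& line) {
    static const std::regex re(R"(\(height (\d+), your height (\d+)\))");
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return -1;
    }
    const std::int64_t peer = std::stoll(m[1].str());
    const std::int64_t ours = std::stoll(m[2].str());
    return peer - ours;
}

// Converts a block gap to approximate minutes, assuming the 2-minute
// target block time.
std::int64_t approx_minutes_behind(std::int64_t blocks) {
    return blocks * 2;
}
```

For the line above, the gap is 2737692 - 2737514 = 178 blocks, or roughly 356 minutes (about 6 hours) of mainchain history.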

I'll look at monerod to see if I can find if I have any issues there.

Edit: This is the only obvious error I see related to monerod (still looking though):

[33869.729214] monerod[2091]: segfault at 8b ip 00007f3c4c778fc4 sp 00007f2d1d3f7a48 error 4 in libpthread-2.31.so[7f3c4c774000+11000]
[33869.729247] Code: 7e 8f 45 31 d2 ba 01 00 00 00 be 01 00 00 00 48 89 ef b8 ca 00 00 00 0f 05 e9 73 ff ff ff e8 13 b7 ff ff 0f 1f 00 f3 0f 1e fa <8b> 47 10 89 c2 81 e2 7f 01 00 00 90 83 e0 7c 75 7b 53 48 83 ec 10

Edit2: It seems likely that monerod was falling out of sync due to a router/networking issue.

SChernykh commented 1 year ago

@snex Do you have any updates?

snex commented 1 year ago

I have not seen the issue since I first posted, and there have been a few chances for it to happen as well since my internet went down and I was out of sync for a few hours, but everything came back on its own.

snex commented 1 year ago

So, odd thing. Last night internet went out for a bit, and when I woke up p2pool was insisting that monerod was out of sync despite it having resynced just fine after the internet came back. Restarting p2pool brought it back online immediately.

I also haven't nabbed a share in about a week, despite my average share time being 1 every 1.5 days. Not sure if I'm just having extreme bad luck, there is more competition and the calculator hasn't taken that into account, or if there's an actual problem happening.
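
One way to gauge the "extreme bad luck" explanation is to model share finding as a Poisson process with a constant rate. That is a simplifying assumption: it ignores changes in sidechain difficulty and in my own hashrate. The function name is hypothetical.

```cpp
#include <cmath>

// Probability of finding zero shares during `elapsed_days`, assuming shares
// arrive as a Poisson process with a constant mean interval of
// `mean_days_per_share` days between shares.
double prob_no_share(double elapsed_days, double mean_days_per_share) {
    return std::exp(-elapsed_days / mean_days_per_share);
}
```

Under that assumption, `prob_no_share(7.0, 1.5)` evaluates to exp(-7/1.5), a bit under 1%. A week with no shares would be unlikely but not impossible from luck alone.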

snex commented 1 year ago

Update on this - I have offloaded my monero node onto a different machine and now even if things get out of sync, they re-sync on their own without intervention.