Closed by snex.
monerod v0.18.1.2-release, Linux, p2pool self-compiled.
monerod command:
monerod --config-file [path]/monerod.conf --non-interactive
p2pool command:
p2pool --host 127.0.0.1 --wallet [wallet] --start-mining 3 --mini
monerod.conf:
out-peers=50
limit-rate-down=100000
zmq-pub=tcp://127.0.0.1:18083
disable-dns-checkpoints=true
enable-dns-blocklist=true
I am using the internal miner because I tested xmrig and got exactly the same hashrate. I also went through my logs and found that the problem began at exactly 2022-10-10 02:53:28.362 this morning: monerod suddenly fell behind and kept falling further behind until I woke up today and stopped p2pool so it could catch back up.
I see two possibilities:
1. The miner is taking too much CPU, so monerod can't keep up; or p2pool + monerod don't fit in RAM, so monerod slows down because of disk swapping. If the node isn't on an SSD, that makes it much worse.
2. Too many incoming connections (more than 1024) pile up over time, so monerod runs out of its open files limit and all sorts of strange things start happening. You don't limit in-peers in monerod.conf.
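One way to test the open-files theory is to compare how many file descriptors monerod holds with its soft "Max open files" limit. A minimal Linux-only sketch, assuming /proc is mounted and readable for the target process; both helper names are hypothetical and not part of monerod or p2pool:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <unistd.h>

// Hypothetical helper: count the descriptors a process currently holds by
// listing /proc/<pid>/fd (Linux only; needs permission to read that process).
long count_open_fds(pid_t pid)
{
    assert(pid > 0);
    namespace fs = std::filesystem;
    long n = 0;
    for (const auto& entry : fs::directory_iterator("/proc/" + std::to_string(pid) + "/fd")) {
        (void)entry;  // only the number of entries matters
        ++n;
    }
    return n;
}

// Hypothetical helper: read the soft "Max open files" limit from /proc/<pid>/limits.
// Returns -1 if the limit is "unlimited" or the line is missing.
long soft_nofile_limit(pid_t pid)
{
    std::ifstream limits("/proc/" + std::to_string(pid) + "/limits");
    const std::string key = "Max open files";
    std::string line;
    while (std::getline(limits, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            std::istringstream rest(line.substr(key.size()));
            std::string soft;
            rest >> soft;  // first column after the name is the soft limit
            return soft == "unlimited" ? -1 : std::stol(soft);
        }
    }
    return -1;
}
```

Called with monerod's PID (for example from `pidof monerod`), a descriptor count close to the soft limit would support the second possibility. The shell equivalents are `ls /proc/<pid>/fd | wc -l` and `grep 'Max open files' /proc/<pid>/limits`.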
My resources seem fine, so it might be the second one. I will limit in-peers and see if it happens again. Thanks.
OK, it turns out I am actually behind a firewall, so my in-peers count is always 0. Might it be worthwhile to have a command-line option that tells p2pool to skip the call to handle_incoming_block_async in p2p_server.cpp when peer_height doesn't match our_height? I could probably write the code for that, but I'm not sure how I would test it.
This logic already exists: https://github.com/SChernykh/p2pool/blob/master/src/p2p_server.cpp#L1892
Changing it to discard everything that doesn't match the current height would quickly lead to a p2pool chain split, so it's a no-go. It's better to fix your monerod issues.
Is there a valid reason to keep processing when peer_height is greater than our_height by a significant amount? The function returns false early when the peer is unreasonably stale (our_height is more than 5 ahead of peer_height), but it never returns early no matter how far behind we are.
The current logic was refined during the initial tests and the first few months of operation; you can check the GitHub commits. If lagging peers banned normal peers for being too far ahead, the lagging peers would mine on top of old p2pool blocks, which means a chain split, and all their mined blocks would be orphaned. When normal peers ban lagging peers, it doesn't cause a chain split, because the lagging peers still receive the latest blocks from other random peers that haven't banned them yet.
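The asymmetric height check discussed in the last two comments can be sketched as a small decision function. This is a simplified illustration of the behaviour as described in this thread, not p2pool's actual code; the enum, the function name and the exact threshold comparison are hypothetical, and the thread does not say how blocks that are only slightly behind are handled, so they are simply passed through here.

```cpp
#include <cstdint>

// Hypothetical outcomes for a block broadcast by a peer, modelled on this thread.
enum class BlockAction {
    Process,                 // same height (or only slightly behind; not covered in the thread)
    ProcessWarnPeerAhead,    // peer ahead on mainchain: still process, but warn that monerod may lag
    RejectUnreasonablyStale  // peer far behind: return false early
};

// "More than 5" as stated in the thread; illustrative, not copied from p2pool.
constexpr std::uint64_t kMaxStaleDistance = 5;

constexpr BlockAction classify_broadcast(std::uint64_t peer_height, std::uint64_t our_height)
{
    if (our_height > peer_height + kMaxStaleDistance) {
        // We are well ahead of the peer, so its block is too old to be useful.
        return BlockAction::RejectUnreasonablyStale;
    }
    if (peer_height > our_height) {
        // Deliberately no upper bound: if a lagging node dropped blocks from peers
        // that are ahead, it would keep mining on old p2pool blocks (chain split).
        return BlockAction::ProcessWarnPeerAhead;
    }
    return BlockAction::Process;
}

// Usage: a peer 178 mainchain blocks ahead is still processed, with a warning.
static_assert(classify_broadcast(2737692, 2737514) == BlockAction::ProcessWarnPeerAhead, "");
```

The point of the asymmetry is that only the lagging side ever gets cut off, and it can still recover by hearing the latest blocks from other peers.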
I am also having issues with monerod falling out of sync while running p2pool-mini, though I don't know whether the issue is related to p2pool. Restarting monerod gets things working again temporarily.
2022-10-20 15:19:40.3290 SideChain add_external_block: block is built on top of an unknown mainchain block 591fa0ee5e4a2f92d8ccdb3cf821fb78abd451cd7edc70fdb091b6ffa42ec265, mainchain reorg might've happened
2022-10-20 15:19:40.3402 SideChain add_external_block: couldn't get mainchain difficulty for height = 2737692
2022-10-20 15:19:40.4587 P2PServer Trying to broadcast a block b2cc3838395d3622583c6e4e7636138022cab7f03dfdf228cf2b8d72271cb841 ahead on mainchain (mainchain height 2737692, current height is 2737514)
2022-10-20 15:19:40.4592 P2PServer peer 103.16.181.169:37888 is ahead on mainchain (height 2737692, your height 2737514). Is your monerod stuck or lagging?
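The two heights in the last warning give a rough idea of how far behind monerod was. A small sketch, assuming Monero's 120-second target block time; the function name is hypothetical:

```cpp
#include <cstdint>

// Monero's target block time is 120 seconds.
constexpr std::uint64_t kMoneroTargetBlockTimeSec = 120;

// Rough estimate of how far the local node is behind, from the two heights in the
// "peer ... is ahead on mainchain (height X, your height Y)" warning.
// Real block intervals vary around the target, so the result is approximate.
constexpr std::uint64_t estimated_lag_seconds(std::uint64_t peer_height, std::uint64_t our_height)
{
    return peer_height > our_height ? (peer_height - our_height) * kMoneroTargetBlockTimeSec : 0;
}

static_assert(estimated_lag_seconds(2737692, 2737514) == 178 * 120, "178 blocks behind");
```

For the log above that is 2737692 - 2737514 = 178 blocks, or 21360 s, just under 6 hours of blocks: far more than normal propagation delay.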
I'll look into monerod to see whether I have any issues there.
Edit: This is the only obvious error I see related to monerod (still looking though):
[33869.729214] monerod[2091]: segfault at 8b ip 00007f3c4c778fc4 sp 00007f2d1d3f7a48 error 4 in libpthread-2.31.so[7f3c4c774000+11000]
[33869.729247] Code: 7e 8f 45 31 d2 ba 01 00 00 00 be 01 00 00 00 48 89 ef b8 ca 00 00 00 0f 05 e9 73 ff ff ff e8 13 b7 ff ff 0f 1f 00 f3 0f 1e fa <8b> 47 10 89 c2 81 e2 7f 01 00 00 90 83 e0 7c 75 7b 53 48 83 ec 10
Edit2: It seems likely that monerod was falling out of sync due to a router/networking issue.
@snex Do you have any updates?
I have not seen the issue since I first posted, even though there were a few chances for it to happen: my internet went down and I was out of sync for a few hours, but everything came back on its own.
So, odd thing. Last night my internet went out for a bit, and when I woke up p2pool was insisting that monerod was out of sync, even though monerod had resynced just fine after the internet came back. Restarting p2pool brought it back online immediately.
I also haven't nabbed a share in about a week, even though my average share time is one every 1.5 days. I'm not sure whether I'm just having extreme bad luck, whether there's more competition that the calculator hasn't taken into account, or whether there's an actual problem.
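Whether a share-free week is "extreme bad luck" can be estimated with a Poisson model. A minimal sketch, assuming shares arrive as a Poisson process with a constant mean of one share every 1.5 days (constant hashrate and sidechain difficulty) and taking "about a week" as 7 days; the function name is hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Probability of finding zero shares in `window_days`, assuming shares arrive as a
// Poisson process with a mean of one share every `mean_days`.
// Constant hashrate and constant sidechain difficulty are assumed.
double prob_no_shares(double window_days, double mean_days)
{
    assert(mean_days > 0.0 && window_days >= 0.0);
    return std::exp(-window_days / mean_days);
}
```

With these numbers, 7 / 1.5 is about 4.67, so `prob_no_shares(7.0, 1.5)` is roughly e^-4.67, about 0.9%: unlikely but possible, and more likely if the sidechain difficulty has risen since the 1.5-day estimate was made.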
Update on this: I have moved my Monero node onto a different machine, and now even when things get out of sync, they resync on their own without intervention.
Not really sure what's causing this, but it has happened occasionally since the 0.18 update. I wake up and p2pool is complaining about monerod being out of sync. I shut down p2pool, and then monerod catches up without a problem.