Closed bhorvitz closed 1 year ago
Confirmed the same issue exists in 2.0.0b5.dev76 (alpha 4.6), with a 5.3 second internal delay:
2023-07-12T15:04:00.186 full_node chia.full_node.full_node: INFO ⏲️ Finished signage point 3/64: CC: bba42e749020ad525d999701c8b5eb5ab0ccc97305367d0da5422896457f967c RC: 5b7d27b9a2af00f08ce6b3b04c73a84d0a823bd9e7c723e1f769dfe11847d3c0
2023-07-12T15:04:05.416 farmer farmer_server : DEBUG <- new_signage_point from peer be74a331abf2c88836ea0660f437bff4d4f2236d3039ebd33262e03881b7c94e 127.0.0.1
Per wallentx, some context on the environment: this node is a single Xeon 6134 with 128G RAM running CentOS 9. The OS and DB are on a RAID 10 of 12Gb SAS SSDs. The system runs a full node, wallet, and farmer. There is some stuff going on to the side like Prometheus and Philip Norman's chia-monitor for monitoring, syslog, and the usual OS stuff.
All harvesting is done remotely on a single harvester: a dual Xeon 8272 with 256G RAM and a 3080 Ti, also running CentOS 9, doing GPU harvesting of about 6.5 PiB of C5s. However, as mentioned, this problem seems to occur well before the node reaches out to the harvester.
I was able to determine that this delay was caused by chia-monitor. Disabling it made the internal node->farmer delay go away entirely.
I'll leave this open for someone else to decide whether to close. chia-monitor is no longer maintained; however, it does nothing other than make RPC calls, so should that really impact node operations?
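For what it's worth, one plausible mechanism (purely an assumption on my part, not something confirmed in this thread) is that RPC handlers run on the same asyncio event loop as the node's internal tasks, so an RPC handler that does any blocking work stalls everything else, including signage-point forwarding. The names below are made up to illustrate the effect, not taken from the chia codebase:

```python
import asyncio
import time

async def forward_signage_point(delays):
    """Stand-in for the node->farmer notification; records how late it ran."""
    target = time.monotonic() + 0.05          # should fire ~50 ms from now
    await asyncio.sleep(0.05)
    delays.append(time.monotonic() - target)  # lateness caused by other tasks

async def slow_rpc_handler():
    """Stand-in for an RPC handler doing blocking (non-awaited) work."""
    time.sleep(0.3)  # blocks the whole event loop, unlike asyncio.sleep

async def main():
    delays = []
    # Both coroutines share one event loop; the blocking sleep in the RPC
    # stand-in delays the signage-point task well past its 50 ms target.
    await asyncio.gather(forward_signage_point(delays), slow_rpc_handler())
    print(f"signage point forwarded {delays[0] * 1000:.0f} ms late")
    return delays[0]

lateness = asyncio.run(main())
```

If something like this is what's happening, even "harmless" RPC polling could produce multi-second node->farmer gaps whenever a handler blocks.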
hmmm... no idea but maybe someone else will see this and know about chia-monitor
This issue has not been updated in 14 days and is now flagged as stale. If this issue is still affecting you and in need of further review, please comment on it with an update to keep it from auto closing in 7 days.
What happened?
Normally, the full_node will finish a signage point (SP) and pass it to the farmer within a few milliseconds. On occasion, this interval can stretch to many seconds. Passed down the line to a harvester that might take 3-4 seconds to finish a lookup, a late SP can overlap the next one, causing spikes in lookup times since the harvester is then trying to process more than one SP at a time. This is particularly problematic with GPU farming.
In the log output below, you can see normal intervals, and then the 07:14:05.336 SP takes 3.2 s and the 07:14:22.934 SP takes 5.4 s.
This is easily reproducible. It happens every few minutes.
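In case it helps anyone reproduce the measurement, here is a rough sketch that pairs each "Finished signage point" line from the full node with the farmer's subsequent "new_signage_point" line and reports the gap. The log format is assumed from the excerpt earlier in this thread; the helper names are mine:

```python
import re
from datetime import datetime

# Leading timestamp as it appears in the chia log excerpt above,
# e.g. "2023-07-12T15:04:00.186".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})")

def parse_ts(line):
    """Extract the leading timestamp from a log line, or None."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f") if m else None

def sp_delays(lines):
    """Yield (finish_ts, delay_seconds) for each finished-SP ->
    farmer new_signage_point pair, in log order."""
    finished = None
    for line in lines:
        if "Finished signage point" in line:
            finished = parse_ts(line)
        elif "new_signage_point" in line and finished is not None:
            seen = parse_ts(line)
            yield finished, (seen - finished).total_seconds()
            finished = None

# Sample pair from the log excerpt above (message bodies abbreviated):
log = [
    "2023-07-12T15:04:00.186 full_node chia.full_node.full_node: "
    "INFO Finished signage point 3/64: ...",
    "2023-07-12T15:04:05.416 farmer farmer_server: "
    "DEBUG <- new_signage_point from peer ...",
]
for ts, delay in sp_delays(log):
    print(f"{ts.time()}  node->farmer delay {delay:.2f}s")  # 5.23s here
```

Running that over a full log and alerting on deltas above, say, one second makes the stalls easy to spot without eyeballing timestamps.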
Version
1.8.2rc6.dev115
What platform are you using?
Linux
What ui mode are you using?
CLI
Relevant log output