bji opened 7 months ago
I see now where the re-forwarding of packets is prevented. In `LatestUnprocessedVotes::get_and_insert_forwardable_packets`, only votes for which `is_forwarded()` returns false are included.
However, the flow of the loop is definitely wasteful; it does an awful lot of work re-evaluating already-forwarded packets in a tight loop. I do not know the degree to which this contributes to the high CPU usage.
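For illustration only, here is a minimal sketch of the pattern being described, using hypothetical simplified types rather than the actual agave structures: the `is_forwarded()` flag guarantees a packet is forwarded only once, but every held packet is still visited on each pass.

```rust
// Simplified, hypothetical types; not the actual agave structures.
struct VotePacket {
    forwarded: bool,
    payload: Vec<u8>,
}

impl VotePacket {
    fn is_forwarded(&self) -> bool {
        self.forwarded
    }
}

/// Scans the whole queue on every call; already-forwarded packets can never
/// be selected again, so revisiting them is pure overhead.
fn get_and_mark_forwardable(queue: &mut [VotePacket]) -> Vec<Vec<u8>> {
    queue
        .iter_mut()
        .filter(|p| !p.is_forwarded())
        .map(|p| {
            p.forwarded = true; // ensures the packet is forwarded only once
            p.payload.clone()
        })
        .collect()
}

fn main() {
    let mut queue = vec![
        VotePacket { forwarded: false, payload: vec![1] },
        VotePacket { forwarded: true, payload: vec![2] }, // held, already forwarded
    ];
    // The first pass forwards one packet; the second forwards none, yet both
    // passes still walk the entire queue.
    assert_eq!(get_and_mark_forwardable(&mut queue).len(), 1);
    assert_eq!(get_and_mark_forwardable(&mut queue).len(), 0);
}
```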
Yeah, looking at the vote threads, they do seem wasteful in their usage. I think we could probably make a few simple changes here to reduce CPU usage in the short term: `PacketReceiver::get_receive_timeout` should be adjusted so the gossip thread will sleep for a much longer period (a rough sketch of the idea is below). Long-term, I think we may be able to consolidate the two threads into a single thread, and also remove the holding period before gossip votes are released to the thread.
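A minimal sketch of that idea, assuming hypothetical parameters and illustrative values rather than the real `PacketReceiver::get_receive_timeout` signature: the tight, zero-timeout loop is reserved for when buffered work is actually about to become actionable.

```rust
use std::time::Duration;

// Hypothetical sketch (parameter names and concrete values are illustrative,
// not the real agave signature): only stay in a zero-timeout loop when the
// buffered packets will shortly be actionable.
fn get_receive_timeout(holding_packets: bool, near_leader_slots: bool) -> Duration {
    if near_leader_slots && holding_packets {
        // Leader slots are imminent; wake up immediately to drain the buffer.
        Duration::from_millis(0)
    } else {
        // Otherwise block on the receiver for much longer instead of spinning.
        Duration::from_millis(100)
    }
}

fn main() {
    assert_eq!(get_receive_timeout(true, false), Duration::from_millis(100));
    assert_eq!(get_receive_timeout(true, true), Duration::from_millis(0));
}
```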
FWIW - I think this issue is tied to the old scheduler (default in v1.17 and v1.18); the new scheduler (default in master) looks to have much lower usage when the validator is not near its leader slots.
Above statement is incorrect - see https://github.com/anza-xyz/agave/issues/636#issuecomment-2048195818
New scheduler doesn't change the vote threads though, so I'd not expect new scheduler to have any effect here.
I went back and revisited the data that made me make this claim, and it looks like the time window I sampled was "abnormal" in that it had lower utilization on these threads. The output below shows the thread name (`comm`) and cumulative scheduled time divided by cumulative process time (`pcpu`).
Tip-of-master with the default (new scheduler):

```
$ ps -p 3753527 -T -o comm,pcpu | grep BanknStg
solBanknStgTx00 99.6
solBanknStgTx01 99.3
```
Tip-of-master with the old scheduler (`--block-production-method thread-local-multi-iterator`):

```
$ ps -p 953856 -T -o comm,pcpu | grep Bankn
solBanknStgTx00 99.6
solBanknStgTx01 99.4
solBanknStgTx02 0.4
solBanknStgTx03 0.4
solBanknStgTx04 0.4
solBanknStgTx05 0.5
```
So yeah, my mistake, and agreed with you there.
While investigating performance issues during high packet ingestion rates, I noticed that on all validators I have access to (including two on mainnet and one on testnet; versions 1.17.28, 1.17.28, and 1.18.9 respectively), the threads `solBanknStgTx00` and `solBanknStgTx01` each appear to use very nearly 100% CPU continuously. Here's a simple process listing:
By my calculation, the other banking threads use approximately 2.5% of the CPU of those first two threads.
`solBanknStgTx00` is the gossip vote ingestion thread, and `solBanknStgTx01` is the TPU vote ingestion thread. It is not entirely clear what causes this high CPU usage, although code analysis would seem to indicate that there is nothing preventing these threads from looping continuously as long as there are unprocessed packets in their packet storage queues (a rough sketch of the loop shape follows).
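To illustrate the concern, here is a rough, hypothetical sketch of that loop shape (not the actual agave code): a non-empty holding buffer collapses the receive timeout to zero, so the thread never blocks and spins at full speed.

```rust
use std::{thread, time::Duration};

// Illustrative only, not the actual agave code: the loop shape being
// described, where a non-empty holding buffer means the receive timeout is
// zero and the next iteration starts immediately.
fn vote_thread_loop(mut held_packets: Vec<u8>, mut should_exit: impl FnMut() -> bool) {
    loop {
        if should_exit() {
            break;
        }
        process_or_hold(&mut held_packets);

        // With packets still held, the timeout is zero and the loop never
        // actually sleeps.
        let timeout = if held_packets.is_empty() {
            Duration::from_millis(100)
        } else {
            Duration::ZERO
        };
        thread::sleep(timeout);
    }
}

fn process_or_hold(_held: &mut Vec<u8>) {
    // Placeholder: per the description above, packets arriving shortly before
    // the leader slots are held rather than drained, so the buffer stays
    // non-empty and the zero timeout persists.
}

fn main() {
    let mut spins = 0;
    vote_thread_loop(vec![1, 2, 3], || {
        spins += 1;
        spins > 5 // stop after a few iterations for the demo
    });
}
```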
In addition, on reading the code, it looks to me like any packet that arrives in the 20 slots leading up to the validator's leader slots will invoke the ForwardAndHold logic, and at that point I cannot see any way for the packet to leave the queue until the Consume phase is reached 20 slots later. In the meantime, it looks to me like the TPU vote thread will continuously re-evaluate the packets it has forwarded and held, repeating the forwarding process over and over. I cannot see how this doesn't lead to a spew of duplicate forwards out of the leader for all packets received in the 20 slots leading up to its leader slots (a simplified sketch of that decision is below).
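A simplified, hypothetical sketch of that decision as I read it; the variant names and the 20-slot window are taken from the discussion above, and the actual logic may well differ.

```rust
// Hypothetical, simplified sketch of the decision being described: packets
// arriving shortly before our leader slots are forwarded but also held, and
// only the Consume phase finally drains them.
#[derive(Debug, PartialEq)]
enum BufferedPacketsDecision {
    Consume,
    Forward,
    ForwardAndHold,
    Hold,
}

const HOLD_WINDOW_SLOTS: u64 = 20; // window cited in the issue text

fn make_decision(
    current_slot: u64,
    next_leader_slot: Option<u64>,
    am_leader: bool,
) -> BufferedPacketsDecision {
    if am_leader {
        return BufferedPacketsDecision::Consume;
    }
    match next_leader_slot {
        // Within the window before our leader slots: forward a copy but keep
        // the packet so it can still be consumed once we become leader.
        Some(leader) if leader.saturating_sub(current_slot) <= HOLD_WINDOW_SLOTS => {
            BufferedPacketsDecision::ForwardAndHold
        }
        // Far from our leader slots: forward and drop.
        Some(_) => BufferedPacketsDecision::Forward,
        // No known upcoming leader slot: just hold.
        None => BufferedPacketsDecision::Hold,
    }
}

fn main() {
    assert_eq!(make_decision(100, Some(110), false), BufferedPacketsDecision::ForwardAndHold);
    assert_eq!(make_decision(100, Some(500), false), BufferedPacketsDecision::Forward);
    assert_eq!(make_decision(110, Some(110), true), BufferedPacketsDecision::Consume);
}
```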
I do not fully understand the code and may be wrong, but the high CPU usage of these threads combined with my observations suggests that some deeper analysis of the code involved is warranted.