anza-xyz / agave

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://www.anza.xyz/
Apache License 2.0

SWQoS: Enhance the resiliency of peering between the peered RPC and the validator during validator node failover #1808

Open JulI0-concerto opened 3 months ago

JulI0-concerto commented 3 months ago

Problem

The solana-validator argument --rpc-send-transaction-tpu-peer is not ideal for validators that use node failover, as described in the Solana documentation (Validator Failover).

After a failover to a secondary node, the peered RPC node may no longer send transactions reliably, because the IP address it is configured with will be treated as that of an unstaked node.

Proposed Solution

Rather than using a fixed HOST:PORT, it would be more effective to identify the peer by the validator's identity. The IP and port can then be determined dynamically from gossip.

bji commented 3 months ago

Probably best to interpret the argument parameter as HOST:PORT if it contains a colon, and as an identity if it does not. This is backwards compatible with the existing argument structure, so it won't affect anyone who uses HOST:PORT, and it would also allow them to continue doing so if they want to.
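
A minimal sketch of that parsing rule, using only the Rust standard library. The names TpuPeer, parse_tpu_peer and looks_like_identity are hypothetical and are not taken from the Agave codebase. The identity check is only a rough base58 heuristic (32 to 44 characters); a real implementation would presumably decode the string with the codebase's own pubkey parser and then look the identity up in gossip, which is not shown here.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Hypothetical result type for this sketch (not an Agave type).
#[derive(Debug, PartialEq)]
enum TpuPeer {
    /// A fixed address, as the argument is interpreted today.
    Address(SocketAddr),
    /// A validator identity whose address would be resolved via gossip.
    Identity(String),
}

/// Rough heuristic (an assumption of this sketch): a base58 string of
/// 32 to 44 characters. Base58 excludes the characters 0, O, I and l.
fn looks_like_identity(s: &str) -> bool {
    (32..=44).contains(&s.len())
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() && !"0OIl".contains(c))
}

/// Interpret the argument as HOST:PORT if it contains a colon,
/// and as a validator identity otherwise.
fn parse_tpu_peer(arg: &str) -> Result<TpuPeer, String> {
    if arg.contains(':') {
        // Resolve HOST:PORT; IP literals are parsed without a DNS lookup.
        arg.to_socket_addrs()
            .map_err(|e| format!("invalid HOST:PORT {arg:?}: {e}"))?
            .next()
            .map(TpuPeer::Address)
            .ok_or_else(|| format!("{arg:?} did not resolve to any address"))
    } else if looks_like_identity(arg) {
        Ok(TpuPeer::Identity(arg.to_string()))
    } else {
        Err(format!("{arg:?} is neither HOST:PORT nor a plausible identity"))
    }
}
```

Because every existing HOST:PORT value contains a colon and no base58 identity does, the two forms cannot be confused, which is what makes the scheme backwards compatible.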

JulI0-concerto commented 22 hours ago

@bji I'll be closing this PR. After reviewing the code again and conducting some tests, I've found that the --rpc-send-transaction-tpu-peer option can be used multiple times. This means you can specify two nodes, and SWQoS peering will function correctly with both.