Open matthewdarwin opened 2 years ago
Here's an updated draft for Nation's node architecture article. It may be beneficial to combine information from both articles, as the Sw/eden article is much more human-readable, while the Nation article outlines the specific recommendations and config options.
September 1, 2022 by EOS Nation
Antelope blockchains share a powerful codebase that allows for high levels of throughput. However, large transaction loads can overwhelm nodes if the node architecture is improperly configured.
Our recommendations for this architecture coalesce around the idea of isolating public-facing nodes from peer nodes, and optimizing configuration parameters for each node's use case.
In short:
- Sentry nodes process transactions to reduce the workload on the block producer node, which also increases reliability and speed.
- Block-only nodes push blocks through the peering network as fast as possible and ignore transactions.
- Transaction-only nodes process and sort transactions to reduce the load on producers and to increase speed and reliability for users.
- Push API nodes increase speed, capacity, and reliability so that transactions reach producers as fast as possible.
- Caching requests adds a lot of speed and reliability and decreases the load on nodes, so fewer nodes can do more and users get their information faster.
We have explored this topic before in a blog post, and since that post, key block producers on Antelope chains have battle-tested and refined the outlined architecture. Today we want to update this information with a technical look at the details of the architecture.
Troubleshooting
In late August 2022, the WAX blockchain began experiencing larger than normal transaction queues. Upon further inspection, it appeared that a large number of failed transactions was overwhelming nodes, causing them to generate blocks with very few or no transactions. This caused many transactions to expire and led to microforks. The block producers tried many mitigation measures to help diagnose the issue, experimenting with deferred transactions, subjective billing, bypassing sentry nodes, and various configuration options.
One effective step the block producers found concerned the architecture of a block producer and the nodes connected to it.
When many nodes connect to the block producer node, transactions arriving over multiple connections prevent the signing node from working efficiently. The solution is to reduce the number of connections to the signing node. Ultimately, that means having only 2 connections.
Many block producers have already adopted this node architecture. However, others might still have multiple connections to the public P2P cluster, multiple connections to API nodes, and other various connections. All of these connections need to be replaced by only 2 connections.
A few more adjustments can further reduce the remaining dropped blocks. To add context, the Sw/eden Block Producer published an article outlining the issues on WAX and how to configure settings to ensure blocks and transactions arrive on time.
Block Production node:
The block production node, or block signing node, should be protected from public API and P2P connections, and should only have two node types connecting to the rest of the chain: the Sentry nodes (for transactions + blocks) and the Blocks Peer nodes (for blocks only).
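As a rough way to audit the two-connection rule, the following Python sketch counts the outbound peers declared in a node's config.ini. It assumes outbound connections are declared with the repeatable `p2p-peer-address` option and that each of the two node types is reached through a single address. The function names and the host names in the usage note are hypothetical and are not part of any Antelope tooling.

```python
def parse_config(text):
    """Parse 'key = value' lines; repeated keys accumulate into a list."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        options.setdefault(key, []).append(value)
    return options


def check_bp_peers(text, limit=2):
    """Return (ok, peers): ok is True when at most `limit` peers are declared."""
    peers = parse_config(text).get("p2p-peer-address", [])
    return len(peers) <= limit, peers
```

For example, a config containing `p2p-peer-address = sentry.example:9876` and `p2p-peer-address = blocks.example:9876` passes the check, while adding a third peer line makes it fail. The check only covers connections the node initiates; inbound connections would need to be restricted separately.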
The block producer node should implement the following configuration options:
wasm-runtime = eos-vm-jit
chain-state-db-size-mb = 32768
reversible-blocks-db-size-mb = 2048
disable-subjective-api-billing = false
disable-subjective-p2p-billing = false
database-map-mode = heap
http-max-response-time-ms = 300
http-validate-host = false
p2p-max-nodes-per-host = 2
agent-name = INSERT NAME OF BP HERE
max-clients = 0
net-threads = 2
verbose-http-errors = true
abi-serializer-max-time-ms = 2000
cpu-effort-percent = 40
last-block-cpu-effort-percent = 20
max-transaction-time = 35
subjective-cpu-leeway-us = 36000
producer-name = INSERT ACCOUNT OF PRODUCER
signature-provider = INSERT PRODUCER KEY
actor-blacklist = INSERT MULTIPLE LINES HERE
plugin = eosio::http_plugin
plugin = eosio::chain_api_plugin
plugin = eosio::net_api_plugin
plugin = eosio::producer_api_plugin
plugin = eosio::db_size_api_plugin
plugin = eosio::producer_plugin
![NodeArchitecture1](https://user-images.githubusercontent.com/36178664/188028565-f614631b-fe6f-4993-b49e-dd26adc43fad.png)
**First Connection to the BP node: Blocks Only**
The 'blocks only' P2P network quickly delivers blocks between block producers. Allowing block producers to pass blocks directly between each other while ignoring transactions means a reduced processing load – and when blocks arrive on time, microforks don't happen.
So a newly produced block has to go from producer 1 -\> block peer node 1 -\> block peer node 2 -\> producer 2.
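The hop sequence above can be illustrated with a minimal Python sketch that models the blocks-only network as a graph and finds the shortest route between two producers. The topology dictionary and the function name are hypothetical and simply mirror the four nodes named in the text; a real network would have more peers and redundant links.

```python
from collections import deque

# Hypothetical blocks-only topology mirroring the path described in the text.
TOPOLOGY = {
    "producer 1": ["block peer node 1"],
    "block peer node 1": ["producer 1", "block peer node 2"],
    "block peer node 2": ["block peer node 1", "producer 2"],
    "producer 2": ["block peer node 2"],
}


def block_path(topology, source, target):
    """Breadth-first search: return the shortest hop list, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in topology.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

Calling `block_path(TOPOLOGY, "producer 1", "producer 2")` returns the four-node path from the text. Every extra hop adds latency, which is one reason to keep this path short and free of transaction processing.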
The block peer node should have the following architecture:
- Have 2 block peer nodes for redundancy.
- Connect block peer nodes to other block producers or other trusted entities.
- Additionally, connect block peer nodes to BP nodes.
- A block peer node should implement the following configuration options:
wasm-runtime = eos-vm-jit
eos-vm-oc-enable = true
chain-state-db-size-mb = 32768
reversible-blocks-db-size-mb = 2048
http-max-response-time-ms = 300
read-mode = head
database-map-mode = heap
p2p-accept-transactions = false
http-validate-host = false
p2p-max-nodes-per-host = 2
agent-name = INSERT NAME OF BP HERE
max-clients = 0
net-threads = 5
verbose-http-errors = true
abi-serializer-max-time-ms = 2000
plugin = eosio::http_plugin
plugin = eosio::chain_api_plugin
plugin = eosio::net_api_plugin
plugin = eosio::producer_api_plugin
plugin = eosio::db_size_api_plugin
**Second Connection to the BP node: Transactions only**
This new node architecture includes a transactions network. Note: the transactions network also includes blocks.
All incoming transactions, no matter where they are coming from, must consolidate into a single "transaction peer" node that connects to the signing node. This transaction peer node must handle as many transactions as possible, but also must not overwhelm the signing node with too many transactions so as to allow the signing node to continue producing blocks efficiently.
The transaction peer node should have the following architecture:
- Have 2 transaction peer nodes for redundancy.
- Connect transaction peer nodes to other block producers or other trusted entities.
- Additionally, connect transaction peer nodes to BP nodes.
- A transaction peer node should implement the following configuration options:
wasm-runtime = eos-vm-jit
chain-state-db-size-mb = 32768
reversible-blocks-db-size-mb = 2048
http-max-response-time-ms = 300
disable-subjective-api-billing = false
disable-subjective-p2p-billing = false
read-mode = head
database-map-mode = heap
http-validate-host = false
p2p-max-nodes-per-host = 2
agent-name = INSERT NAME OF BP HERE
max-clients = 0
net-threads = 5
verbose-http-errors = true
abi-serializer-max-time-ms = 2000
plugin = eosio::http_plugin
plugin = eosio::chain_api_plugin
plugin = eosio::net_api_plugin
plugin = eosio::producer_api_plugin
plugin = eosio::db_size_api_plugin
**API Nodes:**
API nodes serve responses when given a request.
You will notice that the optimized compiler (EOS VM OC) is not enabled and that p2p-accept-transactions = false. The node therefore avoids processing large numbers of P2P transactions, and RPC requests for transactions are billed at the same value that will be used on the BP node (assuming the CPU hardware is the same). Set enable-account-queries = true to enable lookup of account information, which is important on public API nodes.
As a reminder, the Producer API and Net API must not be exposed to the public. Use a reverse proxy to expose the /v1/chain/... APIs, but keep the others private.
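The reverse-proxy rule (expose `/v1/chain/...`, keep everything else private) can be sketched as a small allowlist check in Python. This is only an illustration of the rule, not a replacement for a real reverse proxy. The function name is hypothetical, and the sketch assumes the proxy sees the raw request target.

```python
import posixpath
from urllib.parse import unquote, urlsplit

PUBLIC_PREFIX = "/v1/chain/"


def is_public_path(request_target):
    """Return True only for request paths that stay under /v1/chain/."""
    path = unquote(urlsplit(request_target).path)
    # Collapse "." and ".." segments so traversal cannot escape the prefix.
    normalized = posixpath.normpath(path)
    return normalized.startswith(PUBLIC_PREFIX)
```

With this check, `/v1/chain/get_info` is forwarded, while `/v1/producer/pause`, `/v1/net/connections`, and a traversal attempt such as `/v1/chain/../producer/pause` are rejected. Normalizing the path before matching is a deliberate choice, because a plain prefix test on the raw string would let the traversal case through.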
The API node should have the following architecture:
- Connect the API node to both the "block relay" and "transaction sentry" nodes.
- An API node should implement the following configuration options:
wasm-runtime = eos-vm-jit
chain-state-db-size-mb = 32768
reversible-blocks-db-size-mb = 2048
http-max-response-time-ms = 300
read-mode = head
database-map-mode = heap
p2p-accept-transactions = false
disable-api-persisted-trx = true
http-validate-host = false
p2p-max-nodes-per-host = 2
agent-name = INSERT NAME OF BP HERE
max-clients = 0
net-threads = 5
http-threads = 8
verbose-http-errors = true
abi-serializer-max-time-ms = 2000
http-server-address = 0.0.0.0:8888
enable-account-queries = true
plugin = eosio::http_plugin
plugin = eosio::chain_api_plugin
plugin = eosio::net_api_plugin
plugin = eosio::producer_api_plugin
plugin = eosio::db_size_api_plugin
**Conclusion**
Figuring out the problem and coming up with a solution was a collaborative effort between the Top 21 block producers. This took a tremendous amount of work and coordination on the part of many teams located all around the world. It's normal that some teams were more involved than others, but all the involved teams contributed what they could.
As a final note, not all block producers have exactly the same configuration. Node topology, CPU speed, transaction load, and EOSIO versions vary, as different block producers and node operators have different needs and requirements.
This summary should be helpful for other block producers on the WAX chain, as well as block producers on other Antelope networks who might want to adopt similar designs to prevent overloading.
From Anders:
I've been breaking down how our peering setup is done... I tried to make it as simple and easy to understand as I could, but it was not a super easy task.
It's also added to this article that I wrote to try to help folks grasp a bit of what is happening.
https://waxsweden.org/why-did-some-of-your-transactions-vanish/