AntelopeIO / leap

C++ implementation of the Antelope protocol

Add documentation on how to run a resilient BP architecture #90

Open matthewdarwin opened 2 years ago

matthewdarwin commented 2 years ago

From Anders:

I've been breaking down how our peering setup is done. I tried to make it as simple and easy to understand as I could, but it was not an easy task.

It's also added to this article that I wrote to help folks grasp a bit of what is happening.

https://waxsweden.org/why-did-some-of-your-transactions-vanish/

UrsaPolarisRecords commented 2 years ago

Here's an updated draft for Nation's node architecture article. It may be beneficial to combine information from both articles, as the Sw/eden article is much more human-readable, while the Nation article outlines the specific recommendations and config options.

Antelope Node Architecture: Improving Reliability

September 1, 2022 by EOS Nation

Antelope blockchains share a powerful codebase that allows for high levels of throughput. However, large transaction loads can overwhelm nodes if the node architecture is improperly configured.

Our recommendations for this architecture coalesce around the idea of isolating public-facing nodes from peer nodes, and optimizing configuration parameters for each node's use case.

in short:

We have explored this topic before in a blog post. Since that post, key block producers on Antelope chains have battle-tested and refined the outlined architecture. Today we want to update this information with a technical look at the details of the architecture.

Troubleshooting

In late August 2022, the WAX blockchain began experiencing larger than normal transaction queues. Upon further inspection, it appeared that a large number of failed transactions would overwhelm nodes, causing them to generate blocks with very few or no transactions. This caused many transactions to expire and led to microforks. The block producers tried many mitigations to help diagnose the issue, experimenting with deferred transactions, subjective billing, bypassing sentry nodes, and various configuration options.

One effective step these block producers found concerns the architecture of a block producer and the nodes connected to it.

With a large number of connections to the block producer node, the myriad of transactions hitting the signing node from multiple connections prevents the node from working efficiently. The solution is to reduce the number of connections to the signing node. Ultimately, that means having only 2 connections.

Many block producers have already adopted this node architecture. However, others might still have multiple connections to the public P2P cluster, multiple connections to API nodes, and other various connections. All of these connections need to be replaced by only 2 connections.
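As a rough way to audit this on an existing signing node, the sketch below counts the outbound peers declared in a nodeos-style config.ini. It assumes peers are declared with repeated `p2p-peer-address` lines; the function names and host names are hypothetical, and inbound connections are not covered.

```python
def peer_addresses(config_text):
    """Return every p2p-peer-address value from a nodeos-style config.ini."""
    peers = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "p2p-peer-address":
            peers.append(value)
    return peers


def check_signing_node(config_text, expected=2):
    """True when the config lists exactly `expected` outbound peers."""
    return len(peer_addresses(config_text)) == expected


# Hypothetical signing-node excerpt: one blocks-only peer, one transaction peer.
SIGNING_NODE = """
p2p-peer-address = blocks-peer.internal.example:9876
p2p-peer-address = tx-peer.internal.example:9876
max-clients = 0
"""

print(peer_addresses(SIGNING_NODE))
print(check_signing_node(SIGNING_NODE))
```

A check like this could run before a config is deployed, so that an extra peer line added during troubleshooting does not quietly stay in place.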

A few more adjustments can further reduce the remaining dropped blocks. To add context, the Sw/eden Block Producer published an article outlining the issues on WAX and how to configure settings to ensure blocks and transactions arrive on time.

Block Production node:

The block production node, or block signing node, should be protected from public API and P2P connections, and should only have two node types connecting to the rest of the chain: the Sentry nodes (for transactions + blocks) and the Blocks Peer nodes (for blocks only).

The block producer node should use the following configuration options:

chain-state-db-size-mb = 32768

reversible-blocks-db-size-mb = 2048

disable-subjective-api-billing = false

disable-subjective-p2p-billing = false

database-map-mode = heap

http-max-response-time-ms = 300

http-validate-host = false

p2p-max-nodes-per-host = 2

agent-name = INSERT NAME OF BP HERE

max-clients = 0

net-threads = 2

verbose-http-errors = true

abi-serializer-max-time-ms = 2000

cpu-effort-percent = 40

last-block-cpu-effort-percent = 20

max-transaction-time = 35

subjective-cpu-leeway-us = 36000

producer-name = INSERT ACCOUNT OF PRODUCER

signature-provider = INSERT PRODUCER KEY

actor-blacklist = INSERT MULTIPLE LINES HERE

plugin = eosio::http_plugin

plugin = eosio::chain_api_plugin

plugin = eosio::net_api_plugin

plugin = eosio::producer_api_plugin

plugin = eosio::db_size_api_plugin

plugin = eosio::producer_plugin

![NodeArchitecture1](https://user-images.githubusercontent.com/36178664/188028565-f614631b-fe6f-4993-b49e-dd26adc43fad.png)
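To make the CPU-related options in the block producer configuration more concrete, here is a small arithmetic sketch. It assumes a 500 ms block interval, which the article does not state, and it reads `cpu-effort-percent` as the share of that interval spent producing a block. The function name is hypothetical, and the result is only a rough upper bound.

```python
BLOCK_INTERVAL_MS = 500  # assumed Antelope block interval, not taken from the article


def production_budget_ms(cpu_effort_percent, block_interval_ms=BLOCK_INTERVAL_MS):
    """Time per block spent producing, given an effort percentage."""
    return block_interval_ms * cpu_effort_percent / 100


regular_ms = production_budget_ms(40)  # cpu-effort-percent = 40
last_ms = production_budget_ms(20)     # last-block-cpu-effort-percent = 20
max_trx_ms = 35                        # max-transaction-time = 35

print(regular_ms, last_ms)
# Worst case: how many maximum-length transactions fit in each budget.
print(int(regular_ms // max_trx_ms), int(last_ms // max_trx_ms))
```

Under these assumptions the signing node spends at most 200 ms on a regular block and 100 ms on the last block of its round, leaving the rest of the interval for the block to propagate.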

**First Connection to the BP node: Blocks Only**

The 'blocks only' P2P network quickly delivers blocks between block producers. Allowing block producers to pass blocks directly between each other while ignoring transactions means a reduced processing load – and when blocks arrive on time, microforks don't happen.

So a newly produced block travels from producer 1 -> block peer node 1 -> block peer node 2 -> producer 2.
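The relay path described above can be modelled as a small graph, which makes it easy to count hops or to test what happens when a link is added or removed. This is an illustrative sketch only: the node names and the `block_path` helper are hypothetical, and the links simply mirror the path in the text.

```python
from collections import deque

# Hypothetical node names; links follow the relay path described in the text.
LINKS = {
    "producer-1": ["block-peer-1"],
    "block-peer-1": ["producer-1", "block-peer-2"],
    "block-peer-2": ["block-peer-1", "producer-2"],
    "producer-2": ["block-peer-2"],
}


def block_path(links, source, target):
    """Breadth-first search for the shortest relay path between two nodes."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in links.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no route between the two nodes


path = block_path(LINKS, "producer-1", "producer-2")
print(" -> ".join(path))
print(len(path) - 1, "hops")
```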

The block peer node should have the following architecture:

- Have 2 block peer nodes for redundancy.
- Connect block peer nodes to other block producers or other trusted entities.
- Additionally, connect block peer nodes to BP nodes.
- A block peer node should implement the following configuration options:

wasm-runtime = eos-vm-jit

eos-vm-oc-enable = true

chain-state-db-size-mb = 32768

reversible-blocks-db-size-mb = 2048

http-max-response-time-ms = 300

read-mode = head

database-map-mode = heap

p2p-accept-transactions = false

http-validate-host = false

p2p-max-nodes-per-host = 2

agent-name = INSERT NAME OF BP HERE

max-clients = 0

net-threads = 5

verbose-http-errors = true

abi-serializer-max-time-ms = 2000

plugin = eosio::http_plugin

plugin = eosio::chain_api_plugin

plugin = eosio::net_api_plugin

plugin = eosio::producer_api_plugin

plugin = eosio::db_size_api_plugin

**Second Connection to the BP node: Transactions only**

This new node architecture includes a transactions network. Note: the transactions network also includes blocks.

All incoming transactions, no matter where they are coming from, must consolidate into a single "transaction peer" node that connects to the signing node. This transaction peer node must handle as many transactions as possible, but also must not overwhelm the signing node with too many transactions so as to allow the signing node to continue producing blocks efficiently.

The transaction peer node should have the following architecture:

- Have 2 transaction peer nodes for redundancy.
- Connect transaction peer nodes to other block producers or other trusted entities.
- Additionally, connect transaction peer nodes to BP nodes.
- A transaction peer node should implement the following configuration options:

wasm-runtime = eos-vm-jit

chain-state-db-size-mb = 32768

reversible-blocks-db-size-mb = 2048

http-max-response-time-ms = 300

disable-subjective-api-billing = false

disable-subjective-p2p-billing = false

read-mode = head

database-map-mode = heap

http-validate-host = false

p2p-max-nodes-per-host = 2

agent-name = INSERT NAME OF BP HERE

max-clients = 0

net-threads = 5

verbose-http-errors = true

abi-serializer-max-time-ms = 2000

plugin = eosio::http_plugin

plugin = eosio::chain_api_plugin

plugin = eosio::net_api_plugin

plugin = eosio::producer_api_plugin

plugin = eosio::db_size_api_plugin
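Because the block peer and transaction peer configurations are nearly identical, it can help to diff them mechanically. The sketch below is one way to do that. The helper names are hypothetical, and the two excerpts are abbreviated from the listings above (the subjective billing and plugin lines are left out), so the output covers only the options shown.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict (last value wins)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        options[key] = value
    return options


def diff_configs(a, b):
    """Map each differing option to its (a, b) values; None means unset."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated excerpts of the two node types.
BLOCK_PEER = """
wasm-runtime = eos-vm-jit
eos-vm-oc-enable = true
read-mode = head
p2p-accept-transactions = false
net-threads = 5
"""

TRANSACTION_PEER = """
wasm-runtime = eos-vm-jit
read-mode = head
net-threads = 5
"""

for option, (block_value, trx_value) in diff_configs(
    parse_config(BLOCK_PEER), parse_config(TRANSACTION_PEER)
).items():
    print(option, block_value, trx_value)
```

For these excerpts the diff reports `eos-vm-oc-enable` and `p2p-accept-transactions`, which are set on the block peer and absent from the transaction peer.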


**API Nodes:**

API nodes serve responses when given a request.

You will notice that the EOS VM optimized compiler (OC) is not enabled and that p2p-accept-transactions = false. This keeps the node from processing large volumes of P2P transactions, and it means RPC transaction requests are billed at the same value that will be used on the BP node (assuming the CPU hardware is the same). Set enable-account-queries = true to enable lookup of account information, which is important on public API nodes.

As a reminder, the Producer API and Net API must not be exposed to the public. Use a reverse proxy to expose the /v1/chain/... APIs, but keep the others private.
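The allow-list behaviour such a reverse proxy would enforce can be sketched as a path filter. It is written in Python purely for illustration; a real deployment would express this in the proxy's own configuration. Only the /v1/chain/ prefix comes from the text. The function name is hypothetical and the request paths are sample inputs.

```python
import posixpath

PUBLIC_PREFIXES = ("/v1/chain/",)  # the only prefix forwarded to the node


def is_public(path):
    """Return True only for request paths the proxy may forward."""
    clean = path.split("?", 1)[0]      # ignore any query string
    clean = posixpath.normpath(clean)  # collapse '..' and duplicate slashes
    return clean.startswith(PUBLIC_PREFIXES)


for request in (
    "/v1/chain/get_info",
    "/v1/producer/pause",
    "/v1/net/connections",
    "/v1/chain/../producer/pause",
):
    print(request, is_public(request))
```

Normalising the path before matching means a request such as `/v1/chain/../producer/pause` is rejected instead of slipping through on its prefix.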

The API node should have the following architecture:

- Connect the API node to both the "block relay" and "transaction sentry" nodes.
- An API node should implement the following configuration options:

wasm-runtime = eos-vm-jit

chain-state-db-size-mb = 32768

reversible-blocks-db-size-mb = 2048

http-max-response-time-ms = 300

read-mode = head

database-map-mode = heap

p2p-accept-transactions = false

disable-api-persisted-trx = true

http-validate-host = false

p2p-max-nodes-per-host = 2

agent-name = INSERT NAME OF BP HERE

max-clients = 0

net-threads = 5

http-threads = 8

verbose-http-errors = true

abi-serializer-max-time-ms = 2000

http-server-address = 0.0.0.0:8888

enable-account-queries = true

plugin = eosio::http_plugin

plugin = eosio::chain_api_plugin

plugin = eosio::net_api_plugin

plugin = eosio::producer_api_plugin

plugin = eosio::db_size_api_plugin


**Conclusion**

Figuring out the problem and coming up with a solution was a collaborative effort between the Top 21 block producers. This took a tremendous amount of work and coordination on the part of many teams located all around the world. It's normal that some teams were more involved than others, but all the involved teams contributed what they could.

As a final note, not all block producers have the exact same configuration. Node topology, CPU speed, transaction load and EOSIO versions vary, as different block producers and node operators have different needs and requirements.

This summary should be helpful for other block producers on WAX, as well as block producers on other Antelope networks who might want to come up with similar designs to prevent overloading.