At first glance I'm concerned about performance. IMO it will be important to get really early data on 1) read and write speed and 2) a small chain sync job, to make sure that performance is acceptable. One anecdote behind this is that fil-infra is looking into moving from snapshotting over the lotus API to writing snapshots directly to disk for performance reasons. I don't see why good performance isn't possible in principle, but there may be a few hidden roadblocks.
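To make that concrete, here is a minimal sketch of the kind of early measurement I mean, against the go-datastore interface the networked store would have to implement. The store wiring, key names, counts, and value sizes are illustrative only:

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"

	ds "github.com/ipfs/go-datastore"
	dssync "github.com/ipfs/go-datastore/sync"
)

// measureThroughput writes and then reads back n random values of size
// bytes each, reporting MB/s per phase. Pointing store at the proposed
// networked badger client vs. local badger gives the comparison.
func measureThroughput(store ds.Datastore, n, size int) error {
	ctx := context.Background()
	buf := make([]byte, size)

	start := time.Now()
	for i := 0; i < n; i++ {
		rand.Read(buf) // random payloads so compression doesn't skew results
		if err := store.Put(ctx, ds.NewKey(fmt.Sprintf("bench/%d", i)), buf); err != nil {
			return err
		}
	}
	writeRate := float64(n*size) / 1e6 / time.Since(start).Seconds()

	start = time.Now()
	for i := 0; i < n; i++ {
		if _, err := store.Get(ctx, ds.NewKey(fmt.Sprintf("bench/%d", i))); err != nil {
			return err
		}
	}
	readRate := float64(n*size) / 1e6 / time.Since(start).Seconds()

	fmt.Printf("write: %.1f MB/s, read: %.1f MB/s\n", writeRate, readRate)
	return nil
}

func main() {
	// In-memory stand-in; swap in the remote client to measure it.
	store := dssync.MutexWrap(ds.NewMapDatastore())
	if err := measureThroughput(store, 10000, 512); err != nil {
		panic(err)
	}
}
```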
One obvious and nice benefit of the separation is removing compute contention between splitstore (or other possible disk management approaches) and chain sync, which is currently hypothesized to be a problem. This separation further opens up the solution space of disk management approaches. For example, instead of a moving GC that swaps datastores on the same machine, we could stream keys to a new badger server and then swap the RPC calls over; a rough sketch follows.
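A sketch of that copy, assuming the standard go-datastore Query interface (the function name is made up, and a real migration would batch writes and survive restarts):

```go
package main

import (
	"context"

	ds "github.com/ipfs/go-datastore"
	"github.com/ipfs/go-datastore/query"
)

// copyLive streams every key/value pair from old into fresh. Because
// IPLD blocks are immutable, this can run while old keeps serving
// traffic; once it completes, RPC calls are pointed at fresh and old
// is discarded, giving a GC without an in-place moving collection.
func copyLive(ctx context.Context, old, fresh ds.Datastore) error {
	res, err := old.Query(ctx, query.Query{})
	if err != nil {
		return err
	}
	defer res.Close()

	for r := range res.Next() {
		if r.Error != nil {
			return r.Error
		}
		if err := fresh.Put(ctx, ds.NewKey(r.Key), r.Value); err != nil {
			return err
		}
	}
	return nil
}
```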
Something you've probably already thought of but worth bringing up: when dealing with IPLD data, all k,v pairs are immutable. This presents at least a theoretical opportunity for parallel writes. Our use of badger is almost entirely IPLD, and it could be the case that you separate out only the IPLD data into the remote badger storage (leaving the /meta blockstore directly in lotus). Then the main contention problem from multiple writers is bloating of badger, which is potentially not a problem we even want to solve given a well-configured splitstore running moving GC, or potentially something you can solve well enough with a layer of smart caching above badger itself that filters out duplicates (see the sketch below).
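To make the caching idea concrete, here is a minimal sketch of a wrapper that drops duplicate Puts, assuming hashicorp's golang-lru for the seen-set; it is only sound because IPLD key/value pairs never change:

```go
package main

import (
	"context"

	lru "github.com/hashicorp/golang-lru/v2"
	ds "github.com/ipfs/go-datastore"
)

// dedupDatastore drops Puts for keys it has recently seen. This is only
// sound because IPLD key/value pairs are immutable: re-writing a key can
// never change its value, so a duplicate Put is pure overhead.
type dedupDatastore struct {
	ds.Datastore
	seen *lru.Cache[string, struct{}]
}

func newDedupDatastore(inner ds.Datastore, cacheSize int) (*dedupDatastore, error) {
	c, err := lru.New[string, struct{}](cacheSize)
	if err != nil {
		return nil, err
	}
	return &dedupDatastore{Datastore: inner, seen: c}, nil
}

func (d *dedupDatastore) Put(ctx context.Context, key ds.Key, value []byte) error {
	if _, ok := d.seen.Get(key.String()); ok {
		return nil // duplicate write: badger never sees it
	}
	if err := d.Datastore.Put(ctx, key, value); err != nil {
		return err
	}
	d.seen.Add(key.String(), struct{}{})
	return nil
}
```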
A little more on performance. My worries might be mostly snapshot related, which would be consistent with your existing estimates, since data throughput is probably a lot higher on those jobs. It's worth thinking through all of the read/write intensive jobs when thinking through performance. I can think of:
The main issue isn't the duplicated data, it's having every node maintain sync while also serving requests. Unfortunately, sharing a common blockstore will likely introduce more bottlenecks (the shared store) than it solves.
Instead, we need a leader/follower architecture where multiple lotus nodes can "follow" a leader node. It should be relatively easy to replace the lotus sync service with one that follows another node via ChainNotify, as sketched below.
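A minimal sketch of such a follower loop against the public lotus Go API; the endpoint and auth header are placeholders, and actual tipset application on apply/revert is elided:

```go
package main

import (
	"context"
	"fmt"
	"net/http"

	"github.com/filecoin-project/lotus/api/client"
)

// followLeader subscribes to the leader's head changes instead of running
// the full sync service. leaderAddr is a placeholder for the leader's RPC
// endpoint (e.g. "ws://leader:1234/rpc/v1"); header carries an auth token
// with read permissions.
func followLeader(ctx context.Context, leaderAddr string, header http.Header) error {
	node, closer, err := client.NewFullNodeRPCV1(ctx, leaderAddr, header)
	if err != nil {
		return err
	}
	defer closer()

	changes, err := node.ChainNotify(ctx)
	if err != nil {
		return err
	}
	for batch := range changes {
		for _, hc := range batch {
			switch hc.Type {
			case "current": // initial head at subscription time
				fmt.Println("starting at height", hc.Val.Height())
			case "apply": // leader advanced: adopt the new tipset here
				fmt.Println("apply", hc.Val.Height())
			case "revert": // leader reorged: roll back the local view here
				fmt.Println("revert", hc.Val.Height())
			}
		}
	}
	return nil
}
```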
Additionally, we need to share state/indices:
In terms of safety:
The tricky part will be the "head" tipset. That won't get executed by the "leader" until the next epoch. We have two options:
This is half-way between a "light" node and a "full" node:
This has been deprecated in favor of https://github.com/filecoin-project/lotus/issues/10630.
What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.
An API provider is using Lotus, the Filecoin node implementation, and is looking to scale up their infrastructure. Currently, a single Lotus node manages the Badger datastore and communicates with other Lotus nodes, which can become a bottleneck as the system grows.
Describe the solution you'd like
Proposed Solution
The proposal aims to split up a Lotus node into:
To achieve this, we will fork and modify the go-ds-badger repository and rename it to `go-ds-badger-client`. This will change the datastore into a TCP client that sends JSON commands to a new resource, the `go-ds-badger-server`, which will run on a separate machine and process requests. We anticipate being able to handle the system requirements, as a Lotus node is responsible for approximately 5 MB/s of disk writes.

The primary goal of this plan is to have many RPC nodes share a data resource in order to scale up various types of RPC requests.
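The wire format is not pinned down in this proposal yet; purely as a sketch, one plausible shape for the `go-ds-badger-client` side is newline-delimited JSON over TCP, implementing a subset of the go-datastore interface (all names and the message layout here are assumptions):

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"net"

	ds "github.com/ipfs/go-datastore"
)

// request/response show one possible framing: newline-delimited JSON,
// one message per operation. []byte fields are base64-encoded by
// encoding/json automatically.
type request struct {
	Op    string `json:"op"` // "get", "put", "has", "delete"
	Key   string `json:"key"`
	Value []byte `json:"value,omitempty"`
}

type response struct {
	Value []byte `json:"value,omitempty"`
	Err   string `json:"err,omitempty"`
}

// remoteDatastore implements part of the ds.Datastore interface over a
// single TCP connection. A production client would pool connections,
// pipeline requests, and serialize concurrent callers.
type remoteDatastore struct {
	conn net.Conn
	enc  *json.Encoder
	dec  *json.Decoder
}

func dialBadgerServer(addr string) (*remoteDatastore, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	return &remoteDatastore{conn: conn, enc: json.NewEncoder(conn), dec: json.NewDecoder(conn)}, nil
}

func (r *remoteDatastore) roundTrip(req request) (*response, error) {
	if err := r.enc.Encode(req); err != nil {
		return nil, err
	}
	var resp response
	if err := r.dec.Decode(&resp); err != nil {
		return nil, err
	}
	if resp.Err == ds.ErrNotFound.Error() {
		return nil, ds.ErrNotFound
	}
	if resp.Err != "" {
		return nil, errors.New(resp.Err)
	}
	return &resp, nil
}

func (r *remoteDatastore) Get(ctx context.Context, key ds.Key) ([]byte, error) {
	resp, err := r.roundTrip(request{Op: "get", Key: key.String()})
	if err != nil {
		return nil, err
	}
	return resp.Value, nil
}

func (r *remoteDatastore) Put(ctx context.Context, key ds.Key, value []byte) error {
	_, err := r.roundTrip(request{Op: "put", Key: key.String(), Value: value})
	return err
}
```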
Uncertainties
There are some uncertainties about how writes will work in this new configuration:
Chain syncing
RPC Requests
In the context of Lotus nodes, some RPC API requests are read-only, which means they do not require modifying the underlying datastore. Examples of such read-only requests include querying the blockchain state, retrieving transaction information, or checking account balances. Scaling up read-only requests is relatively straightforward, as the datastore remains unchanged, allowing for the deployment of multiple Lotus API nodes to handle increased read request loads.
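As an illustration, the calls below are all read-only FullNode API methods that any such API node could serve; the endpoint and address are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"net/http"

	"github.com/filecoin-project/go-address"
	"github.com/filecoin-project/lotus/api/client"
	"github.com/filecoin-project/lotus/chain/types"
)

// readOnlyQueries issues requests that never touch the write path, so any
// API node sharing the datastore can answer them. addr and who stand in
// for a node endpoint (e.g. "ws://api-node:1234/rpc/v1") and an on-chain
// address.
func readOnlyQueries(ctx context.Context, addr string, header http.Header, who address.Address) error {
	node, closer, err := client.NewFullNodeRPCV1(ctx, addr, header)
	if err != nil {
		return err
	}
	defer closer()

	head, err := node.ChainHead(ctx) // chain state query
	if err != nil {
		return err
	}
	fmt.Println("head height:", head.Height())

	act, err := node.StateGetActor(ctx, who, head.Key()) // actor state query
	if err != nil {
		return err
	}
	fmt.Println("nonce:", act.Nonce)

	bal, err := node.WalletBalance(ctx, who) // balance check
	if err != nil {
		return err
	}
	fmt.Println("balance:", types.FIL(bal))
	return nil
}
```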
Despite these uncertainties, we remain optimistic that our proposed solution will effectively address the scalability challenges and pave the way for a more robust infrastructure for RPC hosting providers.
ASCII Diagram Before:
After:
Conclusion
In conclusion, by implementing the proposed solution to separate the Badger datastore from the Lotus nodes and introducing multiple Lotus API nodes, we expect to significantly improve the scalability of our Lotus node infrastructure. This approach will allow us to efficiently scale up read-only RPC requests while maintaining performance and consistency. However, we acknowledge that addressing write operations, such as sending transactions and chain syncing, will require a more nuanced solution to ensure datastore synchronization and maintain system integrity. As we move forward, we remain optimistic about finding a suitable approach for handling write operations and appreciate any feedback or suggestions that can help address these challenges.