dirkmc commented 1 year ago

To scale boost we need to scale both storage and retrieval. The retrieval scaling work is covered in the Piece Directory milestone. This issue is to discuss scaling storage.

Architectural Changes

Configuration

Some configuration needs to be shared between boost nodes (e.g. the wallet used for publishing a deal). Ideally lotus would use the same config-sharing mechanism, and probably booster-bitswap, booster-http, etc. would too.

UI

The UI needs to be updated to show aggregated information across all boost instances:

Storage Deals page should show deals made with all boost instances and indicate which miner the deal was made with
Deal Proposals page should show deals across all boost instances
Sealing Pipeline should show information from all miners
Deal Transfers page should show transfers across all boost instances

All of these pages should allow breaking out the information by boost instance and miner (where applicable).

Boost process management

It should be possible to add links between groups of Boost nodes and groups of miners. Boost nodes would make deals with those miners in round-robin fashion.
When a boost node goes offline, other nodes should be able to take over its deals.
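
As a rough illustration of the round-robin idea above, here is a minimal Go sketch. `MinerGroup`, `NewMinerGroup` and `Next` are hypothetical names, not part of Boost, and the counter lives in process memory; with several Boost nodes it would presumably have to live in shared state instead.

```go
package main

import (
	"errors"
	"sync"
)

// MinerGroup is a hypothetical type: the set of miners linked to a group
// of Boost nodes. Miner IDs are plain strings here (e.g. "f01000").
type MinerGroup struct {
	mu     sync.Mutex
	miners []string
	next   int // index of the miner that receives the next deal
}

// NewMinerGroup copies the given miner IDs into a new group.
func NewMinerGroup(miners ...string) *MinerGroup {
	return &MinerGroup{miners: append([]string(nil), miners...)}
}

// Next returns the miner that should receive the next deal and advances
// the cursor, wrapping around at the end of the list.
func (g *MinerGroup) Next() (string, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if len(g.miners) == 0 {
		return "", errors.New("miner group is empty")
	}
	m := g.miners[g.next]
	g.next = (g.next + 1) % len(g.miners)
	return m, nil
}
```

Calling `Next` three times on a group of two miners returns the first miner, the second, then the first again.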

Storage Provider execution

The Storage Provider is composed of several subsystems that need changes for scalability.

1. Add miner ID parameter to APIs used by boost

SectorsStatus(ctx context.Context, sid abi.SectorNumber, showOnChainInfo bool) (api.SectorInfo, error)
AddPiece(ctx context.Context, size abi.UnpaddedPieceSize, r io.Reader, d api.PieceDealInfo) (abi.SectorNumber, abi.PaddedPieceSize, error)
ComputeDataCid(ctx context.Context, pieceSize abi.UnpaddedPieceSize, pieceData storage.Data) (abi.PieceInfo, error)

2. Fund Manager

The Fund Manager

checks that the SP has enough funds to accept a deal
"tags" funds for a deal when the deal is accepted
releases the funds when the deal is published

We should move Fund Manager state and config into shared state between boost instances. Note that the wallets themselves are on chain so they are already in shared state.
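
To make the check / tag / release cycle concrete, here is a small Go sketch. `FundLedger` is a hypothetical in-memory stand-in for the shared state discussed above, and amounts are simplified to `uint64` (real FIL amounts need big integers).

```go
package main

import (
	"errors"
	"sync"
)

var ErrInsufficientFunds = errors.New("insufficient untagged funds")

// FundLedger is a hypothetical stand-in for shared Fund Manager state.
type FundLedger struct {
	mu      sync.Mutex
	balance uint64            // wallet balance; in practice read from chain
	tagged  map[string]uint64 // deal ID -> amount tagged for that deal
}

func NewFundLedger(balance uint64) *FundLedger {
	return &FundLedger{balance: balance, tagged: make(map[string]uint64)}
}

// taggedTotal sums all tagged amounts. Callers must hold l.mu.
func (l *FundLedger) taggedTotal() uint64 {
	var total uint64
	for _, a := range l.tagged {
		total += a
	}
	return total
}

// Tag checks that enough untagged funds remain and, if so, reserves
// amount for dealID.
func (l *FundLedger) Tag(dealID string, amount uint64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, exists := l.tagged[dealID]; exists {
		return errors.New("deal already tagged")
	}
	if amount > l.balance-l.taggedTotal() {
		return ErrInsufficientFunds
	}
	l.tagged[dealID] = amount
	return nil
}

// Release removes the tag for dealID, e.g. once the deal is published.
// The balance is left untouched in this sketch.
func (l *FundLedger) Release(dealID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.tagged, dealID)
}

// Available returns the funds not tagged for any deal.
func (l *FundLedger) Available() uint64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.balance - l.taggedTotal()
}
```

With several Boost instances, the mutex would have to be replaced by whatever atomicity the shared store offers (for example a transaction), so that two nodes cannot tag the same funds.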

3. Storage Manager

The Storage Manager

checks that the SP has enough available storage space to download a deal
"tags" space for a deal when the deal is accepted
releases the space when the deal is handed off to the sealing subsystem

The Storage Manager is specific to a boost node, so we may not need to change anything here.

4. Deal Publisher

The Deal Publisher keeps track of deals that are queued for publish, and publishes them in a batch after the wait period expires (default 1 hour) or once the maximum number of deals per batch is reached (default 8).

To scale the Deal Publisher:

Publish queue should be in shared state between boost nodes
When timer expires:
- One boost node should pick up queued items
- Should mark items as processing
- Should write results to queue on complete
- There should be a fail-over mechanism if the node goes down

5. Storage Ask

The storage ask (pricing) information should be moved to shared state.

Open Questions

Should each boost node have its own libp2p address? Or should we use a load balancer?
How should boost processes be managed?
What configuration mechanisms do other filecoin implementations use?

Related Issues

https://github.com/filecoin-project/boost/issues/464

LexLuthr commented 1 year ago

As the deal publisher is ephemeral in nature, the algorithm that decides which node publishes a deal has to account for split-brain scenarios.

There are clear advantages to each Boost node using a unique libp2p address. But the miner address lives on chain, so this change might not be straightforward. Moreover, we will need to consider the impact on storage-deals over graphsync as well. It might also require considerable time and effort.

LaurenSpiegel commented 1 year ago

Questions --

Configuration -- does Venus use the same configs as lotus? We should keep Venus in mind when designing.
UI -- is each boost instance maintaining state of its deals? Shouldn't each instance be ephemeral?
Process management -- what are we going to use for this? How will SPs monitor and maintain the instances?
What size are we trying to achieve? From 1 to x boost nodes?

Once this is fleshed out a bit more, we should have a few larger and smaller SPs weigh in.

dirkmc commented 1 year ago

we will need to consider the impact on storage-deals over graphsync as well

By the time this work is complete, storage deal protocol v1.1 will probably be deprecated, so we may not need to worry too much about graphsync for storage. If not, we will need to think about solutions for graphsync 👍

is each boost instance maintaining state of its deals

Currently the deal state is kept in a sqlite database that can only be accessed from the same machine. The intention is to move deal state to a place that can be shared between instances (eg couchbase / mysql etc).

Process management

Management and maintenance of the instances would be through the same web UI. Management of the processes themselves we should think about 👍

What size are we trying to achieve

Ideally it should scale to as many boost nodes as SPs want to add. With remote commp the boost node doesn't use a lot of resources so I would imagine a few dozen is probably as many as an SP would need.

I added a couple of the questions from your comments to the Open Questions section in the description.

willscott commented 1 year ago

Should each boost node have its own libp2p address? Or should we use a load balancer?

I would imagine the SP would prefer a load balancer so that it has control on routing inbound deals to available nodes / preventing accidental DoS of individual boost nodes.

How should boost processes be managed?

I would probably lean towards an un-opinionated golang binary with a config file and a web port for communication, so that different SPs can deploy it using whatever management setup they're using, whether it's containers or ansible or other. This is a pretty weak opinion though - I don't feel like I have a great view into standardization of operator environments.

What configuration mechanisms do other filecoin implementations use?

Venus
- has a docker compose template
- venus-cluster is their scalable sealer. It uses a set of binaries that are manually copied to worker nodes
Forest
- offers a docker image as the primary way to operate a node
- does not have a multi-node mining / sealing story that I can see

does Venus use the same configs as lotus?

no

brendalee commented 1 year ago

Piknik flagged that with ongoing scaling efforts in Lotus as well, it would be good for the two teams to coordinate (will chat with @jennijuju on this for best ways to do this). With scaling in both Lotus and Boost, there's more operational overhead and increased complexity which they'll need to consider.

filecoin-project / boost

Boost Storage Scaling #925