filecoin-project / boost

Boost is a tool for Filecoin storage providers to manage data storage and retrievals on Filecoin.

Boost Storage Scaling #925

Open dirkmc opened 1 year ago

dirkmc commented 1 year ago

To scale boost we need to scale both storage and retrieval. The retrieval scaling work is covered in the Piece Directory milestone. This issue is to discuss scaling storage.

Architectural Changes

Configuration

Some configuration needs to be shared between boost nodes (e.g. the wallet used for publishing a deal). Ideally lotus would use the same config-sharing mechanism, and probably booster-bitswap, booster-http, etc. as well.

UI

The UI needs to be updated to show aggregated information across all boost instances:

All of these pages should allow breaking out the information by boost instance and miner (where applicable).

Boost process management

Storage Provider execution

The Storage Provider is composed of several subsystems that need changes for scalability.

1. Add miner ID parameter to APIs used by boost

2. Fund Manager

The Fund Manager

We should move Fund Manager state and config into shared state between boost instances. Note that the wallets themselves are on chain so they are already in shared state.

3. Storage Manager

The Storage Manager

The Storage Manager is specific to a boost node, so we may not need to change anything here.

4. Deal Publisher

The Deal Publisher keeps track of deals that are queued for publish, and publishes them in a batch after the wait period expires (default 1 hour) or once the maximum number of deals per batch is reached (default 8).

To scale the Deal Publisher:

5. Storage Ask

The storage ask (pricing) information should be moved to shared state.

Open Questions

  1. Should each boost node have its own libp2p address? Or should we use a load balancer?
  2. How should boost processes be managed?
  3. What configuration mechanisms do other filecoin implementations use?

Related Issues

LexLuthr commented 1 year ago
  1. As the deal publisher is ephemeral in nature, we have to consider the split-brain case in the decision-making algorithm that chooses which instance will publish the deal.

There are clear advantages to each Boost instance using a unique libp2p address. But the miner address lives on chain, so this change might not be straightforward. Moreover, we will need to consider the impact on storage-deals over graphsync as well. It might also require considerable time and effort.

LaurenSpiegel commented 1 year ago

Questions --

  1. Configuration -- does Venus use the same configs as lotus? We should keep Venus in mind when designing.
  2. UI -- is each boost instance maintaining state of its deals? Shouldn't each instance be ephemeral?
  3. Process management -- what are we going to use for this? How will SPs monitor and maintain the instances?
  4. What size are we trying to achieve? From 1 to x boost nodes?

Once this is fleshed out a bit more we should have a few larger and smaller SPs weigh in.

dirkmc commented 1 year ago

we will need to consider the impact on storage-deals over graphsync as well

By the time this work is complete storage deal protocol v1.1 will probably be deprecated so we may not need to worry too much about graphsync for storage. If not we will need to think about solutions for graphsync 👍

is each boost instance maintaining state of its deals

Currently the deal state is kept in a sqlite database that can only be accessed from the same machine. The intention is to move deal state to a place that can be shared between instances (e.g. couchbase, mysql, etc.).
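One way to picture this move is to put deal state behind an interface so that the backing store can be swapped for a shared one. The sketch below is only an illustration under that assumption: `Deal`, `DealStore` and `memStore` are hypothetical names, and the in-memory map merely stands in for whatever shared database ends up being chosen.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Deal is a hypothetical, minimal deal record.
type Deal struct {
	ID       string
	MinerID  string
	Instance string // boost instance that accepted the deal
	State    string
}

// DealStore is a hypothetical interface that every boost instance would use
// instead of talking to a local sqlite file directly.
type DealStore interface {
	Put(d Deal)
	Get(id string) (Deal, bool)
	ListByInstance(instance string) []Deal
}

// memStore is an in-memory stand-in for a shared database.
type memStore struct {
	mu    sync.Mutex
	deals map[string]Deal
}

func newMemStore() *memStore {
	return &memStore{deals: make(map[string]Deal)}
}

// Put inserts or overwrites a deal by ID.
func (s *memStore) Put(d Deal) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.deals[d.ID] = d
}

// Get returns the deal with the given ID, if present.
func (s *memStore) Get(id string) (Deal, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	d, ok := s.deals[id]
	return d, ok
}

// ListByInstance returns the deals accepted by one instance, sorted by ID.
func (s *memStore) ListByInstance(instance string) []Deal {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []Deal
	for _, d := range s.deals {
		if d.Instance == instance {
			out = append(out, d)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	var store DealStore = newMemStore()
	store.Put(Deal{ID: "d1", MinerID: "f01000", Instance: "boost-a", State: "accepted"})
	store.Put(Deal{ID: "d2", MinerID: "f01000", Instance: "boost-b", State: "publishing"})

	// Any instance (or the web UI) can read deals written by another instance.
	if d, ok := store.Get("d2"); ok {
		fmt.Printf("%s on %s is %s\n", d.ID, d.Instance, d.State)
	}
	fmt.Println("deals on boost-a:", len(store.ListByInstance("boost-a")))
}
```

With deal state behind an interface like this, the instances themselves hold no durable state, which is what would allow them to be treated as ephemeral.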

Process management

Management and maintenance of the instances would be through the same web UI. Management of the processes themselves we should think about 👍

What size are we trying to achieve

Ideally it should scale to as many boost nodes as SPs want to add. With remote commp the boost node doesn't use a lot of resources so I would imagine a few dozen is probably as many as an SP would need.

I added a couple of the questions from your comments to the Open Questions section in the description.

willscott commented 1 year ago

Should each boost node have its own libp2p address? Or should we use a load balancer?

I would imagine the SP would prefer a load balancer so that it has control over routing inbound deals to available nodes and can prevent accidental DoS of individual boost nodes.

How should boost processes be managed?

I would probably lean towards an un-opinionated golang binary with a config file and a web port for communication, so that different SPs can deploy it using whatever management setup they're using, whether it's containers or ansible or something else. This is a pretty weak opinion though. I don't feel like I have a great view into standardization of operator environments.

What configuration mechanisms do other filecoin implementations use?

does Venus use the same configs as lotus?

no

brendalee commented 1 year ago

Piknik flagged that with ongoing scaling efforts in Lotus as well, it would be good for the two teams to coordinate (will chat with @jennijuju on this for best ways to do this). With scaling in both Lotus and Boost, there's more operational overhead and increased complexity which they'll need to consider.