filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

Storage / Retrieval Deals With Partial Content #7227

Open hannahhoward opened 3 years ago

hannahhoward commented 3 years ago


What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.

Let's say I want to store a large existing IPLD dataset larger than a sector on Filecoin. Currently, we face several obstacles:

  1. Right now, from a storage standpoint, the only way to store anything but a whole DAG is an offline deal.
  2. From a retrieval standpoint, we can retrieve a partial DAG by expressing a selector other than "give me the entire DAG". But this has several problems for our large dataset:
    1. We can't do this at the CLI level currently because we lack a command-line syntax for selectors.
    2. Even if we could, the syntax for selectors is limited at the moment: we lack a "give me the whole DAG except the part below this CID, because I know it's in another piece" selector.
    3. Even if we had more powerful selectors, selectors require the retrieval client to know a priori which selector fetches the part of the DAG contained in a single sector.

Let's consider what we'd like to be possible:

  1. The person storing should be able to break up their very large DAG in arbitrary ways into a set of partial DAGs
  2. The person retrieving should be able to just start at the root, make a retrieval, see what they get back, and then plan to make retrievals from there.
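
The retrieval flow in item 2 could look roughly like the sketch below. It is a toy model only: `Block`, `Provider`, `fetch`, and `retrieveAll` are hypothetical names, CIDs are plain strings, and the lookup `p[want]` stands in for whatever content discovery a real client would use. None of this is Lotus or Graphsync API.

```go
package main

import "fmt"

// Block is a toy DAG node: an ID standing in for a CID, plus child links.
type Block struct {
	ID    string
	Links []string
}

// Provider is a hypothetical storage provider holding one partial DAG.
type Provider map[string]Block

// fetch walks the DAG from root using only the blocks this provider holds.
// It returns the blocks it could serve and the links it could not follow.
func (p Provider) fetch(root string) (served []Block, missing []string) {
	stack := []string{root}
	seen := map[string]bool{}
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		b, ok := p[id]
		if !ok {
			missing = append(missing, id)
			continue
		}
		served = append(served, b)
		// Push children in reverse so they are visited left to right.
		for i := len(b.Links) - 1; i >= 0; i-- {
			stack = append(stack, b.Links[i])
		}
	}
	return served, missing
}

// retrieveAll starts at root and re-plans after every partial response:
// each link a provider could not serve becomes the root of a later retrieval.
func retrieveAll(root string, providers []Provider) (got map[string]Block, rounds int, unresolved []string) {
	got = map[string]Block{}
	pending := []string{root}
	for len(pending) > 0 {
		want := pending[0]
		pending = pending[1:]
		if _, ok := got[want]; ok {
			continue
		}
		found := false
		for _, p := range providers {
			// Assumption: peeking at the provider's store replaces discovery.
			if _, has := p[want]; !has {
				continue
			}
			served, missing := p.fetch(want)
			rounds++
			for _, b := range served {
				got[b.ID] = b
			}
			pending = append(pending, missing...)
			found = true
			break
		}
		if !found {
			unresolved = append(unresolved, want)
		}
	}
	return got, rounds, unresolved
}

func main() {
	p1 := Provider{
		"root": {ID: "root", Links: []string{"a", "b"}},
		"a":    {ID: "a", Links: []string{"a1", "a2"}},
		"a1":   {ID: "a1"},
		"a2":   {ID: "a2"},
	}
	p2 := Provider{
		"b":  {ID: "b", Links: []string{"b1"}},
		"b1": {ID: "b1"},
	}
	got, rounds, unresolved := retrieveAll("root", []Provider{p1, p2})
	fmt.Println(len(got), rounds, unresolved)
}
```

The point of the sketch is that the client needs no selector knowledge up front: the list of links that were not served is enough to plan the next retrieval.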

We also already have alternate storage clients like Estuary whose proposed deals are failing because they try to send partial DAG data to miners.

Describe the solution you'd like

Fortunately, our underlying transport protocol for data transfer, Graphsync, can serve requests where the peer sending the data only has part of the DAG expressed by the requested CID+Selector. The Graphsync responder knows how to communicate to the requestor what it served and what it didn't, and the requestor knows how to process this information and still verify the response.

Currently, the go-data-transfer library fails all transfers where the entire request root + IPLD selector is not served.

I propose that we allow a data transfer to complete successfully even when it serves only a partial response.
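
As a rough illustration of the proposed change, the sketch below decides a transfer's final state from per-link metadata of the kind a responder could report. All names here (`LinkMeta`, `Outcome`, `finish`, `ErrIncomplete`, the `allowPartial` flag) are hypothetical and are not go-data-transfer or Graphsync API; the strict branch models the current behaviour described above, the permissive branch the proposal.

```go
package main

import (
	"errors"
	"fmt"
)

// LinkMeta records, for one link in the traversal, whether the responder
// actually sent the block. Hypothetical type for this sketch.
type LinkMeta struct {
	Link    string
	Present bool
}

// Outcome is the final state of a transfer in this toy model.
type Outcome int

const (
	Complete Outcome = iota // every requested block was served
	Partial                 // some blocks missing, transfer still succeeds
	Failed                  // some blocks missing, transfer rejected
)

// ErrIncomplete is returned when partial responses are not allowed.
var ErrIncomplete = errors.New("responder did not serve the full selector")

// finish decides how a transfer ends and reports the links left unserved.
// With allowPartial false it mirrors the current fail-on-anything-missing
// behaviour; with allowPartial true it mirrors the proposal.
func finish(meta []LinkMeta, allowPartial bool) (Outcome, []string, error) {
	var missing []string
	for _, m := range meta {
		if !m.Present {
			missing = append(missing, m.Link)
		}
	}
	switch {
	case len(missing) == 0:
		return Complete, nil, nil
	case allowPartial:
		return Partial, missing, nil
	default:
		return Failed, missing, ErrIncomplete
	}
}

func main() {
	meta := []LinkMeta{{"root", true}, {"a", true}, {"b", false}}

	outcome, missing, err := finish(meta, false)
	fmt.Println(outcome, missing, err)

	outcome, missing, err = finish(meta, true)
	fmt.Println(outcome, missing, err)
}
```

Returning the list of missing links alongside a successful outcome is what would let a caller such as Lotus decide whether the partial result is acceptable for a given deal.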

My proposed bubbling up to Lotus is as follows:

Describe alternatives you've considered

See above. While selectors are potentially a path forward, they have several limitations, and the path to achieving a desirable result through them is long.


jennijuju commented 3 years ago

Cc @whyrusleeping @dirkmc @aarshkshah1992 @raulk for review

hannahhoward commented 3 years ago

I want to point out why you want this AS WELL AS very good selectors.

We already have a StopAt selector in latest versions of go-ipld-prime: https://github.com/ipld/go-ipld-prime/pull/214

However, particularly in the retrieval case, the client may not know how to assemble this selector ahead of time. If I make a deal for a complex DAG with several missing pieces, for a client to retrieve this with a selector they need to know ahead of time what pieces are missing. This is pretty tricky to communicate -- or it adds overhead to discovery mechanisms.

It seems ideal to still be able to serve a "not quite complete retrieval" as a fallback.

jsign commented 3 years ago

We're interested in this feature. It would make packing bigger-than-a-sector DAGs into sector-sized deals much simpler, since we wouldn't have to deal with "complete-subdags" constraints. So, just pack the maximum number of blocks possible and let the retriever know that they should retrieve X deals to get the complete thing.

If doing partial retrievals makes sense for the client, then let that be an "application" constraint to consider while packing things into deals, but not a mandatory one.
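
The "pack the maximum number of blocks" idea could be sketched as a simple greedy fill over blocks in traversal order. This is a toy model under loud assumptions: `SizedBlock` and `pack` are hypothetical names, sizes are abstract units, and real deals would also have to account for CAR framing and piece padding, which are ignored here.

```go
package main

import "fmt"

// SizedBlock is a toy block: an ID standing in for a CID, and a size.
type SizedBlock struct {
	ID   string
	Size int
}

// pack fills deals greedily in traversal order, starting a new deal whenever
// the next block would overflow the capacity. Deal boundaries fall wherever
// the capacity runs out, not on sub-DAG boundaries.
func pack(blocks []SizedBlock, capacity int) ([][]SizedBlock, error) {
	var deals [][]SizedBlock
	var cur []SizedBlock
	used := 0
	for _, b := range blocks {
		if b.Size > capacity {
			return nil, fmt.Errorf("block %s (%d) exceeds deal capacity %d", b.ID, b.Size, capacity)
		}
		if used+b.Size > capacity {
			deals = append(deals, cur)
			cur = nil
			used = 0
		}
		cur = append(cur, b)
		used += b.Size
	}
	if len(cur) > 0 {
		deals = append(deals, cur)
	}
	return deals, nil
}

func main() {
	blocks := []SizedBlock{{"root", 4}, {"a", 3}, {"a1", 2}, {"b", 5}, {"b1", 1}}
	deals, err := pack(blocks, 8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, d := range deals {
		fmt.Printf("deal %d: %v\n", i+1, d)
	}
}
```

Because no deal has to contain a complete sub-DAG, this only works if retrieval tolerates partial responses, which is the dependency on the feature requested in this issue.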