filecoin-project / lassie

A minimal universal retrieval client library for IPFS and Filecoin
Other
109 stars 17 forks source link

Support IPIP-412 `dups=y` mode -- i.e. send duplicates in our responses #239

Closed hannahhoward closed 1 year ago

hannahhoward commented 1 year ago

Goals

After speaking with @aschmahmann, it sounds like there are a lot of good reasons to actually include dups in streaming CARs. Specifically, a light client can skip holding the old blocks if you guarantee dups are sent again.

Still confirming if we will need this, but feels compelling

Implementation

Sadly, this is not what we designed around.

However, I think running the car verifier with a couple changes, in a separate thread, we can "splice in" the dups where needed into the final response.

Ultimately, I'm starting to think this makes more sense in general -- rather than a "streaming blockstore" we should be headed towards a streaming traversal. This will probably add a bit of complexity, but I think there's a future PR where we can clean up based on this.

rvagg commented 1 year ago

I agree with the premise here and would actually use it for | operations, particularly if I get to be able to do something like car extract --expect-duplicates to avoid buffering on stdin reads.

Probably not terribly hard to achieve in our code, in fact, in looking this up I've noticed that I accidentally dropped off the car.AllowDuplicatePuts(false) from the output writer (I'm sure it was originally on there, it's there in 4 places for all our testing), so we may already be doing this in some retrievals anyway! https://github.com/filecoin-project/lassie/blob/23df0260bdc90b26bd26c8a396d67f468c5ae217/pkg/storage/deferredcarwriter.go#L122

The trickiest bit will be dealing with graphsync retrievals where we may have to run a second traversal ourselves as the data comes in to fill in blanks. Or change graphsync to be wasteful?

rvagg commented 1 year ago

Actually it may not even be a problem with graphsync since we control the loader it uses and I don't think it does any SkipMe funkyness with duplicates itself. So maybe this wouldn't be hard at all. Should probably construct some tests to exercise this either way.

willscott commented 1 year ago

We should have a group conversation if we need this as part of the critical path to shipping Rhea - I'd really like to limit scope and while we should absolutely support this, I'm worried this broader set of "support multiple client modes" spec finalization is broader than the core of what we need for Rhea

hannahhoward commented 1 year ago

@willscott I'm not sure whether the gateway folks are able to verify without duplicates based on the current design. Or @aschmahmann will need to fill in.

willscott commented 1 year ago
rvagg commented 1 year ago

I think this is done, anything left on this @hannahhoward?