Closed hannahhoward closed 1 year ago
I agree with the premise here and would actually use it for |
operations, particularly if I get to be able to do something like car extract --expect-duplicates
to avoid buffering on stdin reads.
Probably not terribly hard to achieve in our code, in fact, in looking this up I've noticed that I accidentally dropped off the car.AllowDuplicatePuts(false)
from the output writer (I'm sure it was originally on there, it's there in 4 places for all our testing), so we may already be doing this in some retrievals anyway! https://github.com/filecoin-project/lassie/blob/23df0260bdc90b26bd26c8a396d67f468c5ae217/pkg/storage/deferredcarwriter.go#L122
The trickiest bit will be dealing with graphsync retrievals where we may have to run a second traversal ourselves as the data comes in to fill in blanks. Or change graphsync to be wasteful?
Actually it may not even be a problem with graphsync since we control the loader it uses and I don't think it does any SkipMe
funkyness with duplicates itself. So maybe this wouldn't be hard at all. Should probably construct some tests to exercise this either way.
We should have a group conversation if we need this as part of the critical path to shipping Rhea - I'd really like to limit scope and while we should absolutely support this, I'm worried this broader set of "support multiple client modes" spec finalization is broader than the core of what we need for Rhea
@willscott I'm not sure whether the gateway folks are able to verify without duplicates based on the current design. Or @aschmahmann will need to fill in.
bifrost-gateway
is reading the car into a block store, and then confirming a traversal over that blockstore.I think this is done, anything left on this @hannahhoward?
Goals
After speaking with @aschmahmann, it sounds like there are a lot of good reasons to actually include dups in streaming CARs. Specifically, a light client can skip holding the old blocks if you guarantee dups are sent again.
Still confirming if we will need this, but feels compelling
Implementation
Sadly, this is not what we designed around.
However, I think running the car verifier with a couple changes, in a separate thread, we can "splice in" the dups where needed into the final response.
Ultimately, I'm starting to think this makes more sense in general -- rather than a "streaming blockstore" we should be headed towards a streaming traversal. This will probably add a bit of complexity, but I think there's a future PR where we can clean up based on this.