filecoin-project / dagstore

a sharded store to hold large IPLD graphs efficiently, packaged as location-transparent attachable CAR files, with mechanical sympathy

Add support for deals using a data segment index #154

Open willscott opened 1 year ago

willscott commented 1 year ago

A deal may, rather than being a single CAR, be a series of concatenated CAR files with a segment index at the end containing an inclusion proof. When such an index is found, we will allow reading multiple CARs and treating the valid CIDs found across all of them as the contents of the overall deal.
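
To make the idea concrete, here is a minimal sketch of flattening per-segment block locations into one deal-wide lookup. Everything in it is hypothetical: `Segment`, `MergeSegments`, and the string CIDs are illustrative stand-ins, not types from dagstore or go-data-segment, and the bounds check only stands in for real validation (it does not verify an inclusion proof).

```go
package main

import (
	"errors"
	"fmt"
)

// Segment describes one CAR file inside a concatenated deal payload.
// Hypothetical type: not part of dagstore or go-data-segment.
type Segment struct {
	Offset uint64            // byte offset of the segment within the deal
	Size   uint64            // length of the segment in bytes
	Blocks map[string]uint64 // CID (as a string) -> offset relative to the segment start
}

// MergeSegments flattens per-segment block locations into a single
// deal-wide map of CID -> absolute offset. Entries pointing outside their
// segment are skipped (a placeholder for real validation). If a CID appears
// in several segments, the first segment in slice order wins.
func MergeSegments(segments []Segment) (map[string]uint64, error) {
	if len(segments) == 0 {
		return nil, errors.New("no segments")
	}
	merged := make(map[string]uint64)
	for _, seg := range segments {
		for cid, rel := range seg.Blocks {
			if rel >= seg.Size {
				continue // entry does not fit inside its segment
			}
			if _, seen := merged[cid]; seen {
				continue // keep the location from the earlier segment
			}
			merged[cid] = seg.Offset + rel
		}
	}
	return merged, nil
}

func main() {
	idx, err := MergeSegments([]Segment{
		{Offset: 0, Size: 100, Blocks: map[string]uint64{"cid-a": 10}},
		{Offset: 100, Size: 50, Blocks: map[string]uint64{"cid-b": 5}},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(idx["cid-a"], idx["cid-b"])
}
```

A real implementation would work with actual CID types and would only accept entries whose segment passes proof verification.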

ref: https://github.com/filecoin-project/boost/issues/1258

willscott commented 1 year ago

filing https://github.com/filecoin-project/go-data-segment/issues/9 to see if we can get a fixture to use for validation

Wondertan commented 1 year ago

This PR brings Filecoin implementation details into Dagstore, along with multiple new dependencies. Is there a way to implement this on the layer above, using Dagstore's API, rather than polluting its internals?

willscott commented 1 year ago

That's a good point, though carv2 indexes are in a way an equally opinionated existing equivalent. I'll see what it would look like for the 'get an index for this shard' to be fully handled by the caller

raulk commented 1 year ago

I think there's an implicit decision being taken here, which is to model segments as indexed entries over a single shard (= storage deal in the Filecoin context), instead of as dedicated shards. Establishing a shard for every segment was not possible in the past since there was no standard way of delineating logical units within a Filecoin deal. With PoDSI, this is now possible. Making shard == segment could let us keep the dagstore Filecoin-agnostic, but I suspect it is not viable without additional complexity. @willscott could you walk us through the tradeoffs here?

willscott commented 1 year ago

> Making shard == segment could let us keep the dagstore Filecoin-agnostic, but I suspect is not viable without additional complexity. @willscott could you walk us through the tradeoffs here?

shard == segment

shard == deal

raulk commented 1 year ago

Thanks! I think the case for shard == segment isn't strong today, but it may strengthen over time as selective unsealing becomes available, PoDSI gains traction, and segment-based access patterns become a bottleneck and the next frontier for optimization. Agree with the implicit decision made here, but +1 to @Wondertan's request to invert the control of the index population, so that the API caller can feed the segments.

willscott commented 1 year ago

cc @Kubuxu