RangerMauve / js-ipfs-unixfs-offset-chunker

A custom chunker for UnixFS which uses offsets to split a stream of data into chunks. Useful for fine-grained deduplication.
MIT License

Custom dag building in addition to chunking? #2

Open ikreymer opened 2 years ago

ikreymer commented 2 years ago

I wonder if it would also make sense to build a custom dag alongside the custom offset-based chunking? For example, given the offsets [4M, 7M] and a 9M file, the dag would consist of 3 intermediate roots grouping the ranges 0-4M, 4M-7M, and 7M-9M, each made up of chunks at the default block size (e.g. 256k).

                                                            root-3
              root-0                                        root-1                                       root-2
 [0 - 256k), [256k - 512k) ... [Ak - 4M)     [4M - 4M+256k), [4M+256k - 4M+512k) ... [Bk - 7M)     [7M - 7M+256k) ... [Ck - 9M)

The idea would be that you'd have a balanced dag from the top-level root (root-3) down to the intermediate roots root-{0,1,2}, and from there you'd read the data sequentially, fetching all of the blocks under that intermediate root.
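
To make the grouping concrete, here's a rough sketch of how the byte ranges could be planned. It only computes ranges; it doesn't build UnixFS nodes or touch the importer. `planOffsetDag` and its parameters are hypothetical names, not part of this repo, and I'm assuming "4M" and "256k" mean binary units (MiB / KiB).

```javascript
const KiB = 1024
const MiB = 1024 * KiB

// Hypothetical helper (not an existing API): split [0, fileSize) at `offsets`
// into groups, then split each group into leaves of at most `blockSize` bytes.
function planOffsetDag (fileSize, offsets, blockSize = 256 * KiB) {
  // Keep only offsets strictly inside the file, de-duplicated and sorted.
  const cuts = [...new Set(offsets)]
    .filter((o) => o > 0 && o < fileSize)
    .sort((a, b) => a - b)
  const bounds = [0, ...cuts, fileSize]

  const groups = []
  for (let i = 0; i + 1 < bounds.length; i++) {
    const start = bounds[i]
    const end = bounds[i + 1]
    const leaves = []
    // The last leaf in a group may be shorter than blockSize, so that
    // no leaf ever crosses an offset boundary.
    for (let pos = start; pos < end; pos += blockSize) {
      leaves.push({ start: pos, end: Math.min(pos + blockSize, end) })
    }
    groups.push({ start, end, leaves })
  }
  return groups
}

// The example from above: a 9M file with offsets [4M, 7M].
const groups = planOffsetDag(9 * MiB, [4 * MiB, 7 * MiB])
console.log(groups.map((g) => g.leaves.length)) // [ 16, 12, 8 ]
```

Each entry in `groups` would correspond to one intermediate root (root-0, root-1, root-2), with its `leaves` as the blocks underneath it.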

I suppose the subtree from root-{0,1,2} could also be a trickle dag instead of a balanced dag?

Or, is all of this unnecessarily complex and not needed?

The goal would be to optimize for loading pre-defined ranges sequentially out of a much larger file. E.g., with a 1TB file, we know ahead of time that we want to read a 45MB range, from 250MB to 295MB, in sequential order.
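
As a sketch of what a reader would do with such a layout: given the offsets used at import time, find which intermediate roots overlap the requested range and fetch only those subtrees. `groupsForRange` is a hypothetical helper, the offsets [250M, 295M] are my assumption for this example (the file would need to have been chunked with them), and I'm using binary units here.

```javascript
const MiB = 1024 * 1024
const TiB = 1024 * 1024 * MiB

// Hypothetical helper (not an existing API): return the offset-delimited
// groups of [0, fileSize) that overlap the half-open range [start, end).
function groupsForRange (fileSize, offsets, start, end) {
  const cuts = offsets
    .filter((o) => o > 0 && o < fileSize)
    .sort((a, b) => a - b)
  const bounds = [0, ...cuts, fileSize]

  const hits = []
  for (let i = 0; i + 1 < bounds.length; i++) {
    const groupStart = bounds[i]
    const groupEnd = bounds[i + 1]
    // Two half-open ranges overlap when each one starts before the other ends.
    if (groupStart < end && groupEnd > start) {
      hits.push({ index: i, start: groupStart, end: groupEnd })
    }
  }
  return hits
}

// Assumed offsets: the file was split at 250M and 295M when it was imported.
const hits = groupsForRange(1 * TiB, [250 * MiB, 295 * MiB], 250 * MiB, 295 * MiB)
console.log(hits.map((h) => h.index)) // [ 1 ]
```

When the requested range lines up with the offsets, it maps to exactly one intermediate root, so everything under that root can be fetched in order without touching the rest of the file.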