dat-ecosystem / dat

:floppy_disk: peer-to-peer sharing & live synchronization of files via command line
https://dat.foundation
BSD 3-Clause "New" or "Revised" License

rolling hashes/blob chunking #142

Closed: max-mapper closed this issue 8 years ago

max-mapper commented 10 years ago

right now we do the naive thing and store each blob as a single file. we should add a blob-list abstraction (e.g. what camlistore and bup do) so we can use a rolling hash (like adler32 or rabin polynomials, whichever can be made faster in js) to invalidate only the chunks that change between versions of a blob
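To illustrate the idea, here is a minimal Python sketch of content-defined chunking feeding a blob list. It is not dat's implementation: the rolling hash is a plain byte sum over a sliding window (a stand-in for Adler-32 or Rabin polynomials), and `WINDOW`, `MASK`, `chunk` and `blob_list` are hypothetical names and values chosen only for the example.

```python
import hashlib
import random

WINDOW = 16          # bytes covered by the rolling window (assumed value)
MASK = (1 << 6) - 1  # cut when the low 6 bits are zero, roughly 1 position in 64


def chunk(data: bytes) -> list:
    """Split data at content-defined boundaries using a rolling byte sum."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        # Only cut once the chunk spans a full window, then test the hash.
        if i + 1 - start >= WINDOW and (rolling & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


def blob_list(data: bytes) -> list:
    """Describe a blob as an ordered list of chunk digests."""
    return [hashlib.sha256(c).hexdigest() for c in chunk(data)]


# Two versions of a blob: v2 has a few bytes inserted in the middle of v1.
rng = random.Random(0)
v1 = bytes(rng.randrange(256) for _ in range(4096))
v2 = v1[:2048] + b"a few inserted bytes" + v1[2048:]

old = set(blob_list(v1))
changed = [h for h in blob_list(v2) if h not in old]
print(f"{len(changed)} of {len(blob_list(v2))} chunks need to be stored for v2")
```

Because boundaries depend on the bytes in the window rather than on absolute offsets, an insertion only disturbs the chunks around the edit. Fixed-size chunking would shift every later boundary and invalidate everything after the insertion point.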

jbenet commented 10 years ago

For Rabin fingerprinting: make sure to look at the LBFS paper, and the LBFS and Ori implementations. They both use Rabin fingerprints, but make sure to address some of the issues in the model (min/max chunk sizes, etc.). I'll be doing this in IPFS too, so I'll post back here when I do.
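The min/max size issue can be sketched as follows. This is a hedged illustration, not the LBFS or Ori code: it uses a Rabin-Karp style polynomial rolling hash over the integers modulo a prime rather than a true Rabin fingerprint over GF(2), and `bounded_chunks` and all constants are hypothetical values picked for the example.

```python
BASE = 257                # polynomial base (assumed)
MOD = (1 << 61) - 1       # large prime modulus (assumed)
WINDOW = 48               # bytes covered by the rolling window (assumed)
MIN_SIZE = 256            # never cut before this many bytes (assumed)
MAX_SIZE = 4096           # always cut at this many bytes (assumed)
MASK = (1 << 10) - 1      # cut when the low 10 bits of the hash are zero


def bounded_chunks(data: bytes) -> list:
    """Content-defined chunking with enforced minimum and maximum sizes."""
    out_factor = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest window byte
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Remove the byte that is about to leave the window.
            h = (h - data[i - WINDOW] * out_factor) % MOD
        h = (h * BASE + byte) % MOD
        size = i + 1 - start
        if size < MIN_SIZE:
            continue  # too small: ignore hash matches
        if (h & MASK) == 0 or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # the final chunk may be shorter than MIN_SIZE
    return chunks


# A run of zero bytes hashes to zero at every position, so without MIN_SIZE
# the chunker would cut constantly; here it cuts exactly every MIN_SIZE bytes.
zero_chunks = bounded_chunks(bytes(10000))
print(len(zero_chunks), max(len(c) for c in zero_chunks))
```

Without the lower bound, data that matches the boundary condition everywhere degenerates into tiny chunks; without the upper bound, data that never matches produces one unbounded chunk. The trade-off is that a forced cut at `MAX_SIZE` is offset-dependent, so deduplication is weaker across such regions.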

max-mapper commented 10 years ago

scala implementation: https://code.google.com/p/fs-c/

jbenet commented 10 years ago

cc @whyrusleeping in case new links pop up here :)

okdistribute commented 9 years ago

@mafintosh what's the status on this one?

mafintosh commented 9 years ago

@karissa we haven't begun to dive into this yet and it's not a beta feature, but we should keep this one open as we definitely want to look into it later

okdistribute commented 9 years ago

k

okdistribute commented 9 years ago

@mafintosh will this happen with 1.0 with the fuse bindings?

mafintosh commented 9 years ago

@karissa yea i think so \o/

jbenet commented 9 years ago

we have an implementation of rabin chunking, which we got from a friend, up over at https://github.com/whyrusleeping/chunker

okdistribute commented 8 years ago

A Rabin chunker is now being used in master