datdotorg / website

website
https://datdot.org
Apache License 2.0

encoding compatible hypercores #24

Open serapath opened 4 years ago

serapath commented 4 years ago

@todo


RangerMauve commented 4 years ago

Working on this now

RangerMauve commented 4 years ago

Here's some notes from looking at how the replication in hypercore works from the protocol down to the storage.

When we get data from a peer, the code here gets invoked. This is the point where we could intercept the data and encode it / add it to the encoded hypercore.

This in turn calls _putBuffer on the unencoded hypercore. This bit of code is super complicated to understand. 😅😅😅 Most of it seems to be checking whether there's data that's missing, which I guess means data that we don't already have locally. Then it invokes _verifyAndWrite, which checks that the data is valid, and then invokes _write to actually write the data. It seems we can hook into this with the _onwrite hook, though it doesn't seem to do anything. Maybe this is where encoding could be hooked in?
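To make the call chain described above easier to picture, here is a minimal sketch of a verify, hook, then write pipeline. None of it is hypercore code: `createWriter`, `verify`, `write`, and the shape of the `onwrite` callback are hypothetical stand-ins, and the base64 step only marks where an encoder could run.

```javascript
// Hypothetical model of the write path; this is NOT hypercore's implementation.
function createWriter ({ verify, onwrite, write }) {
  return function put (index, data, cb) {
    // Reject data that fails verification before anything is stored
    if (!verify(index, data)) return cb(new Error('Invalid data'))
    // No hook configured: go straight to the write step
    if (!onwrite) return write(index, data, cb)
    // The hook runs after verification and before the write
    onwrite(index, data, (err) => {
      if (err) return cb(err)
      write(index, data, cb)
    })
  }
}

const written = new Map()
const encodedCopy = new Map()

const put = createWriter({
  // Toy check (assumption): treat empty blocks as invalid
  verify: (index, data) => data.length > 0,
  // Stand-in for the encoding step: keep a base64 copy of each block
  onwrite (index, data, cb) {
    encodedCopy.set(index, data.toString('base64'))
    cb(null)
  },
  write (index, data, cb) {
    written.set(index, data)
    cb(null)
  }
})
```

Calling `put(0, Buffer.from('hi'), cb)` would store the raw block in `written` and its encoded copy in `encodedCopy`, while a block that fails `verify` reaches neither.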

Eventually the data for the hypercore is written using the storage instance in the putData method. That will then calculate the offset within the file that it should write the data at and finally invoke write on the random-access-* instance for the data.

A custom random-access-storage thing would work, but we'd need to create the reverse of the dataOffset method where we get the data index given an offset. This might be kinda hard and I'm not sure how the method would become available within the random-access-* instance.
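As a rough illustration of the reverse mapping mentioned above, the sketch below assumes the length of every block is already known and keeps a prefix-sum table of them. The helpers `buildOffsets`, `offsetOf`, and `indexAt` are hypothetical names, and this does not mirror how hypercore's dataOffset computes offsets internally.

```javascript
// Hypothetical helpers, not part of hypercore or random-access-storage.

// offsets[i] is the byte offset where block i starts;
// the final entry is the total byte length of all blocks.
function buildOffsets (lengths) {
  const offsets = [0]
  for (const len of lengths) offsets.push(offsets[offsets.length - 1] + len)
  return offsets
}

// Forward direction: block index -> byte offset
function offsetOf (offsets, index) {
  return offsets[index]
}

// Reverse direction: byte offset -> block index plus position inside that block.
// Binary search for the last block whose start is <= offset.
function indexAt (offsets, offset) {
  const total = offsets[offsets.length - 1]
  if (offset < 0 || offset >= total) return null
  let lo = 0
  let hi = offsets.length - 2 // index of the last block
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2)
    if (offsets[mid] <= offset) lo = mid
    else hi = mid - 1
  }
  return { index: lo, relative: offset - offsets[lo] }
}

// Three blocks of 5, 3, and 10 bytes
const offsets = buildOffsets([5, 3, 10])
```

The open question from the comment still applies: a random-access-* instance only sees offsets and buffers, so a table like this would have to be shared with it somehow.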

Another option would be to subclass Storage and provide custom putData and getData methods which would proxy to the compressed hypercore. We would need a PR to hypercore to be able to pass in a storage instance instead of creating a new one each time, or we could do a gross hack and monkey-patch the methods in the storage instance of the hypercore after it's been initialized.
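A minimal sketch of the monkey-patching variant could look like the following. It runs against a fake storage object rather than a real hypercore. The signatures `putData(index, data, nodes, cb)` and `getData(index, cb)` are assumptions, `createFakeStorage` and `patchStorage` are hypothetical names, and the hex encoding is only a placeholder for the real encoder.

```javascript
// Stand-in for a hypercore storage instance (hypothetical, for illustration only).
// The method signatures below are assumptions, not confirmed hypercore API.
function createFakeStorage () {
  const blocks = new Map()
  return {
    putData (index, data, nodes, cb) {
      blocks.set(index, data)
      cb(null)
    },
    getData (index, cb) {
      if (!blocks.has(index)) return cb(new Error('Block not found'))
      cb(null, blocks.get(index))
    }
  }
}

// Replace putData/getData on an existing instance so they proxy elsewhere.
// Returns a function that restores the original methods.
function patchStorage (storage, { putData, getData }) {
  const original = { putData: storage.putData, getData: storage.getData }
  storage.putData = (index, data, nodes, cb) => putData(index, data, cb)
  storage.getData = (index, cb) => getData(index, cb)
  return function restore () {
    storage.putData = original.putData
    storage.getData = original.getData
  }
}

// Proxy reads and writes to an "encoded" store (toy hex encoding as a placeholder)
const encoded = new Map()
const storage = createFakeStorage()
const restore = patchStorage(storage, {
  putData (index, data, cb) {
    encoded.set(index, data.toString('hex'))
    cb(null)
  },
  getData (index, cb) {
    if (!encoded.has(index)) return cb(new Error('Block not found'))
    cb(null, Buffer.from(encoded.get(index), 'hex'))
  }
})
```

Returning a `restore` function keeps the hack reversible, which helps if the patched instance is shared with code that expects the original behaviour.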

RangerMauve commented 4 years ago

Here's a HackMD with some diagrams talking about how the communication between the compressed / uncompressed hypercores could work. https://hackmd.io/uNQsTqDORmOaUD9-48X13w

RangerMauve commented 4 years ago

Pausing for today

RangerMauve commented 4 years ago

Wrote up ideas here: https://hackmd.io/6Wyij7_uTbGxfOSSlJOgZQ

RangerMauve commented 4 years ago

Boom: https://github.com/RangerMauve/intercept-hypercore-storage

This lets us intercept storage events and stuff.

Next I'll work on encoders storing encoding data on hosts in a hypertrie, then I'll work on having the host set up a hypercore which will be intercepted to serve data stored in the hypertrie.