dat-ecosystem-archive / DEPs

Dat Enhancement Proposals. Contains all specs for the Dat protocol, including drafts. [ DEPRECATED - see https://github.com/hypercore-protocol/hypercore-proposals for similar functionality. More info on active projects and modules at https://dat-ecosystem.org/ ]
https://dat-ecosystem.github.io/DEPs

Proposal: Extension Message to notify about hypercore keys #38

Open RangerMauve opened 6 years ago

RangerMauve commented 6 years ago

Recently, there's been work around hypercore-protocol-proxy which is making use of hypercore-protocol to replicate feeds from a gateway and for multicasting data.

It works great for feeds that gateways already know about, but the protocol is limited in that you can't start replicating keys that both peers don't already know about.

One use-case that I'm really interested in is public gateways that allow peers to connect to them and replicate any key with the gateway automatically connecting to the discovery-swarm to do replication.

I propose a hypercore extension message that will notify the other party about a "related" key.

This will be used by clients to notify the gateway about a key before attempting to send the "feed" message.

I think we'll need to bikeshed a bunch about the actual details, but I think it would look something like:

```proto
message RelatedFeed {
  required RelatedFeedAction action = 1;
  optional bytes key = 2;
}
```


- The client would connect to the gateway using websockets, but it should work with any duplex stream
- The client will have a high-level API, `relateFeed(key, cb)`, for relating feeds by their key.
- The steps for getting a feed are:
  1. Send a RelatedFeed `REQUEST` action for the key to the gateway
  2. Wait for a RelatedFeed event back from the gateway for the same key
  3. If the action is `REFUSE` call the CB with an error
  4. If the action is `READY` call the CB without an error
  5. The client should then send a "feed" event using whatever means. (probably by replicating the hypercore? Not sure how this part should work, actually)
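The steps above could be sketched roughly like this. Everything here is an assumption for illustration: the action constants, the message shape, and the `transport` object (with `send`/`onMessage`) are placeholders, not a real hypercore-protocol API.

```javascript
// Sketch of the client-side relateFeed flow described above.
// The transport and the numeric action values are assumptions.
const REQUEST = 0
const READY = 1
const REFUSE = 2

function makeClient (transport) {
  const pending = new Map() // hex-encoded key -> callback

  // Step 2-4: wait for a RelatedFeed event back for the same key.
  transport.onMessage((msg) => {
    const hex = msg.key.toString('hex')
    const cb = pending.get(hex)
    if (!cb) return
    pending.delete(hex)
    if (msg.action === REFUSE) cb(new Error('gateway refused key ' + hex))
    else if (msg.action === READY) cb(null) // safe to send the "feed" message now
  })

  return {
    // Step 1: send a RelatedFeed REQUEST action for the key.
    relateFeed (key, cb) {
      pending.set(key.toString('hex'), cb)
      transport.send({ action: REQUEST, key })
    }
  }
}
```

After the callback fires without an error, the client would proceed with step 5 and start replicating the feed over the same stream.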

I think that having a standard DEP for this will make it easier to have people deploy gateways that can be reused between applications.
max-mapper commented 6 years ago

@mafintosh and I had talked a while back about supporting a zero knowledge proxy that would fulfill a similar case as Tor relays or TURN servers do... meaning you could run one to donate your bandwidth but you as the operator would not know what dat keys are being shared, as the entire connection would be e2e encrypted and metadata is obscured from you. Is this use case in scope here?

RangerMauve commented 6 years ago

> a zero knowledge proxy that would fulfill a similar case as Tor relays or TURN servers do

We were talking about similar stuff on IRC, actually.

The stuff I'm proposing here would be making use of the existing hypercore-protocol, so it would need to have access to the feed contents. This would still work with the second approach to zero-knowledge.

There's two approaches for gateways:

For what you described with zero-knowledge relays, @fsteff and I have been working on discovery-swarm-stream. It acts as a proxy for discovery-swarm, so it only knows about discovery keys and doesn't learn the actual keys or contents. It can only do the same level of MITM as an ISP could.

The other approach is to encrypt the contents of the hypercore and use the existing replication protocol with peers, making use of a hypercore-proxy. Basically, you give something like hashbase the URL for your archive to replicate its contents, but the actual content of the hypercores is encrypted, so it can't do anything with it. Then, peers fetching the data will have the encryption key as part of the Dat URL, so they will have full read access. Thus you can use this proposal with the existing hypercore-protocol, but prevent gateways from knowing what's really in the dats. This of course only works for encrypted dat archives, and the first approach is more "simple".

pfrazee commented 6 years ago

If I were to run this proxy service, I'd probably want authentication so that I could meter the bandwidth usage. You have any thoughts on that?

RangerMauve commented 6 years ago

I think that the authentication and the such could be tacked on at the transport level.

If the standard for these services makes use of websockets (which it really should), it'd be trivial to add BasicAuth on top of them.

If you have a service with metering, you could have people connect using `clientName("ws://sometoken@gateway.example.com")`

I like basic auth because it's the least effort for developers to implement if they have an existing WS implementation. They can pass in a URL and not worry about setting headers or anything else.
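Pulling the token back out on the gateway side is one-liner territory with the standard WHATWG `URL` parser, which handles the userinfo part of `ws://` URLs (the token and hostname here are placeholders):

```javascript
// Sketch: extract a bearer token from the connection URL so the gateway
// can meter usage before doing any hypercore handshaking.
function tokenFromUrl (wsUrl) {
  const { username } = new URL(wsUrl)
  return username || null
}
```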

This also decouples the generation of any sort of token from the use of the token. This lets services define whatever authentication and token generation they want.

Having websockets would also make it easy to load-balance between services using something like nginx. Websockets are super important here, too, in order to support the web (which is going to be a big use case for gateways).
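For reference, the nginx side of load-balancing websocket gateways only needs the standard HTTP upgrade headers; the upstream hostnames and port here are placeholders:

```nginx
upstream gateways {
    server gateway1.internal:8080;
    server gateway2.internal:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://gateways;
        # Required for the WebSocket upgrade handshake to pass through.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```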

Another reason it'd be useful to have it in the WS URL is that we don't need to modify the hypercore protocol further. A service can determine whether they want to accept a connection without having to bother handshaking or any other processing.

bnewbold commented 6 years ago

Would it be reasonable to (ab)use the pinning service API for this? https://www.datprotocol.com/deps/0003-http-pinning-service-api/

To reduce the number of pinning API messages sent, a "client" could use pinning to establish a first hypercore session with the proxy, then announce additional feed public keys via that "write" mechanism.

RangerMauve commented 6 years ago

> Would it be reasonable to (ab)use the pinning service API for this?

I think that having the information be part of the hypercore-protocol would be more useful for supporting different environments. Having something out-of-band in addition to the hypercore-protocol seems a little messy and wouldn't work for cases where you don't have an HTTP pinning service handy.

On a different note, could somebody point me to anything describing how I'd use an extension message with hyperdrive so I could give this a go?

pfrazee commented 6 years ago

@RangerMauve https://github.com/beakerbrowser/dat-ephemeral-ext-msg might be a good reference

RangerMauve commented 5 years ago

By the way, the reason I didn't progress on this is that I was getting skeptical of giving knowledge of public keys to gateways.

That's why I took the approach of proxying entire connections through the gateway instead of hypercore-protocol in discovery-swarm-stream. This is mostly a concern for public gateways that you don't necessarily want to trust with your data, so it might not be relevant to other use cases.

RangerMauve commented 5 years ago

I'm gonna try to do something regarding pinning next week. Might have time to look at this.

RangerMauve commented 5 years ago

So now that pinning is ready, I'd like to revisit this.

Ideally I'd like to have the following properties:

Here's how I'm thinking of approaching it:

Any comments on gaps in this plan or any better ideas on how to achieve this?

CC @pfrazee @mafintosh @noffle @tinchoz49

RangerMauve commented 5 years ago

My main motivation for this is to standardize storage further so that storage providers can be more generic, so that application developers can do crazy things with hypercores, and so that users will always be able to choose their storage providers without worrying about who supports which data structure.

fsteff commented 5 years ago

A possibly huge problem here is spam. If an app loads parts of the received feeds, this could easily turn into a DoS attack, especially for services like hashbase (!).

RangerMauve commented 5 years ago

Could you elaborate on that?

I'd imagine you'd have the same bandwidth / storage quotas that you'd see in a large hyperdrive.

Plus, the scenario you describe could still happen with a regular multiwriter hyperdrive (once they're out); it's just that Hashbase would need to parse the contents of the hyperdrive rather than blindly replicating hypercores without looking at what's inside.

okdistribute commented 4 years ago

I think https://github.com/kappa-db/multifeed is doing this, and there are more and more applications using this standard.