dat-ecosystem-archive / datproject-discussions

a repo for discussions and other non-code organizing stuff [ DEPRECATED - More info on active projects and modules at https://dat-ecosystem.org/ ]

Concept: DAT over HTTPs #87

Open martinheidegger opened 5 years ago

martinheidegger commented 5 years ago

@dkastl went to a conference at a university campus and had to sign a paper stating that "by accessing this network you will not use any p2p software such as WinMX" (we had to google what that software is).

Of course, DAT didn't run on the campus. To my knowledge UDP hole-punching is not yet implemented in DAT, and even if it were, using it would practically be p2p file sharing: a violation of his contract.

Now: This means nobody at the university campus can use DAT, and we have been wondering what reasons the university could have to prevent this.

What came to mind were the following theories:

Now, DAT is by no means different from other solutions like Dropbox or Google Drive, but with none of them am I asked to open a port or the like, because they use HTTP APIs and similar mechanisms as fallbacks.

There has been a request for a DAT gateway to serve the data and work around this particular problem.

However, I would like to propose a different approach: specify an HTTPS protocol to download AND share data through a DAT-2-HTTP bridge server.

I imagine the download process to work analogously to the DAT protocol over TCP:

<gateway>/<dat-discovery-key>/login

will return a challenge to the client if another peer is in the network; otherwise it will return 404, as no peer is available.

Next, the client uses the public dat key to "login" to a swarm. The server asks another peer to verify it; upon verification, the server stores the public dat key for a few hours together with a session-key that is used for future transactions.

With the session-key it's possible to download the content.

<gateway>/<dat-discovery-key>/get/content
<gateway>/<dat-discovery-key>/get/feed

will return the data, of course with support for ranges (but only at the block size of hypercore). The server will then download the requested range on demand from the p2p network and deliver it to the client (maybe even caching the data on the server).

This "get" server-architecture should allow for fairly easy clustering.
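Since ranges are only supported at hypercore block granularity, the gateway would have to snap a requested byte range outward to block boundaries. A hypothetical helper for that (the fixed `blockSize` is a simplifying assumption; hypercore blocks are not actually fixed-size, so a real gateway would consult the feed's index instead):

```javascript
// Map an inclusive HTTP byte range onto hypercore block indices and
// the (larger) byte range the gateway actually serves.
function rangeToBlocks(start, end, blockSize) {
  const firstBlock = Math.floor(start / blockSize);
  const lastBlock = Math.floor(end / blockSize);
  return {
    firstBlock,
    lastBlock,
    servedStart: firstBlock * blockSize,           // snapped down
    servedEnd: (lastBlock + 1) * blockSize - 1,    // snapped up
  };
}
```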

And what about distributing a DAT? The server could of course provide the cached data, but the client could also call:

<gateway>/<dat-discovery-key>/wants

to get a list of ranges that might be wanted by any peer in the network, which the client could then use to push some ranges to:

<gateway>/<dat-discovery-key>/push

Again, the server wouldn't need to cache the data; it would just see which peers would like that section and transfer it to them.
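The /wants//push pair could be sketched like this. Again purely hypothetical: the data structure mapping peers to the block indices they are missing, and both function names, are assumptions for illustration.

```javascript
// GET <gateway>/<dat-discovery-key>/wants
// Aggregate the blocks any connected peer is missing, so an uploading
// client knows what is worth pushing.
function collectWants(peerWants) {
  const wanted = new Set();
  for (const blocks of Object.values(peerWants)) {
    for (const b of blocks) wanted.add(b);
  }
  return [...wanted].sort((a, b) => a - b);
}

// POST <gateway>/<dat-discovery-key>/push
// The gateway relays a pushed block straight to the peers that want
// it, without caching it itself.
function routePush(peerWants, blockIndex) {
  return Object.keys(peerWants).filter((peer) =>
    peerWants[peer].includes(blockIndex)
  );
}
```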

With this sort of bridge we could have people participating in the DAT network without really being on it?! It would solve our problem.

I mentioned in the title that this issue is focusing on HTTPS. I initially thought HTTP/2 might be a good idea, but the network infrastructure for that probably isn't ready either.

bnewbold commented 5 years ago

Some quick notes:

In my opinion, general-purpose VPNs are a better and more modular solution to network-specific restrictions than having every protocol implement work-arounds or reduce itself to the lowest-common-denominator of supported features. I acknowledge that VPNs have serious accessibility and user-experience issues, but they do work for a non-negligible fraction of users, seem to be considered "too big to fail/block" by most network operators, and I'm optimistic about next-generation implementations like WireGuard.

martinheidegger commented 5 years ago

Dat/hypercore should work fine over TCP, in addition to uTP (which runs on top of UDP)

I honestly forgot that DAT works over TCP. This obviously invalidates my theory that "UDP packets can not be filtered as easily as TCP packets." That being said: for those organizations, UDP probably needs to be disabled in the settings by default.

The bigger problem is probably that non-standard TCP ports are blocked inside these organizations: 80 and 443 are well known, and traffic flowing through those ports is generally accepted.

Sounds like HTTP/3 will be based on QUIC, which is a UDP protocol... so maybe there will be pressure on those networks that block UDP to stop doing so in a few years.

HTTP/2 is just about done now and has probably not yet made its way to those universities. I assume it will take a while until the firewalls support it - no idea how many years it will take for HTTP/3 to get there.

As a cross-reference (which you might already know), the Dat SLEEP on-disk format was explicitly designed to enable easy read access via HTTP range requests (SLEEP/REST haha!). This doesn't help with stateless gateways (which don't hold the SLEEP content on disk) or bi-directional transfer.

Yes, sadly the on-disk format is not a specification of how the HTTP service should work, though I would say it is a good base for implementing one.

In my opinion, general purpose VPNs are a better and more modular solution to network-specific restrictions.

VPNs are not an acceptable solution in those networks either - this is a contract and control issue more than it is a technical issue.

martinheidegger commented 5 years ago

Dat/hypercore should work fine over TCP, in addition to uTP (which runs on top of UDP)

I have just been reading/working on these particular implementations. And while it's true that the hypercore transport works over TCP, the DHT communication works over uTP (one more random port):

The KRPC protocol is a simple RPC mechanism consisting of bencoded dictionaries sent over UDP.

(source)
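To make the quoted point concrete, here is a minimal sketch of producing such a bencoded KRPC "ping" query. This is an illustrative toy encoder, not a complete bencode implementation (it handles only strings, numbers, and nested objects; the 20-byte node id is a placeholder).

```javascript
// Encode a value into bencode: integers as i<n>e, strings as
// <len>:<str>, dictionaries as d...e with lexicographically sorted keys.
function bencode(value) {
  if (typeof value === 'number') return `i${value}e`;
  if (typeof value === 'string') return `${value.length}:${value}`;
  const keys = Object.keys(value).sort();
  const body = keys.map((k) => bencode(k) + bencode(value[k])).join('');
  return `d${body}e`;
}

// A KRPC ping query: this whole string would be sent as one UDP packet.
const ping = bencode({
  t: 'aa',                              // transaction id
  y: 'q',                               // message type: query
  q: 'ping',                            // query name
  a: { id: 'abcdefghij0123456789' },    // placeholder 20-byte node id
});
```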

martinheidegger commented 5 years ago

I got an additional comment about this:

p2p might be prohibited in networks because companies do not want communication between computers inside the intranet. If p2p networks are established, there is a chance of peers being manipulated.