dat-ecosystem-archive / datproject-discussions

a repo for discussions and other non-code organizing stuff [ DEPRECATED - More info on active projects and modules at https://dat-ecosystem.org/ ]
65 stars 6 forks

Doesn't DAT Need Official HTTPS Gateway? #86

Open ilyaigpetrov opened 6 years ago

ilyaigpetrov commented 6 years ago

IPFS has an HTTPS gateway used as `https://ipfs.io/ipfs/<hash>`. The existence of such a gateway gives the following benefits:

  1. Search engine crawlers may access these pages.
  2. Users may access ipfs/dat pages without ipfs installed.
  3. If access to the pages is blocked by censors then: 1) pages should still appear in search results. 2) users may still install ipfs and ipfs-companion browser extension to access these pages.

@RangerMauve has implemented an http(s) gateway based on pfrazee/dat-gateway that works in a way similar to the ipfs one (repo, demo). It even redirects each page to its own subdomain to isolate cookies and provide better security.
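For illustration, the key-in-subdomain mapping such a gateway performs might look roughly like this (the gateway hostname below is a placeholder, not a real service, and the real gateway's routing may differ):

```javascript
// Sketch: map a dat:// URL onto a subdomain-per-archive gateway URL, so each
// archive gets its own origin (isolating cookies and localStorage).
// 'dat.example-gateway.org' is a hypothetical host, not an actual gateway.
function toGatewayUrl (datUrl, gatewayHost = 'dat.example-gateway.org') {
  const m = /^dat:\/\/([0-9a-f]{64})(\/.*)?$/i.exec(datUrl)
  if (!m) throw new Error('expected dat://<64-char-hex-key>[/path]')
  const [, key, path = '/'] = m
  // Putting the key in the subdomain gives every archive a distinct origin
  return `https://${key.toLowerCase()}.${gatewayHost}${path}`
}
```

Because browsers treat each subdomain as a separate origin, one archive's scripts cannot read another archive's stored data, which is the security property being described.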

Concerning datbase.org — I couldn't get it to serve me an HTML page the right way (without the header, and maybe with absolute CSS paths, which is less critical).

I kindly ask the DAT team to take a look at the work of @RangerMauve and similar efforts, and to provide users with an official DAT HTTPS gateway which we may use to build future censorship-resistant websites.

RangerMauve commented 6 years ago

One thing I'd also like to note is that I'm working on having dat-gateway automatically inject a DatArchive polyfill so that sites that make use of it can work without any extensions.

joehand commented 6 years ago

Hey! It's been really great to see the work @RangerMauve and others are doing on gateways and related projects. You are right about the benefits of having a gateway. While an official gateway can be helpful, it could also lead to copyright & legal issues beyond the resources of our nonprofit. We'd like to keep the scope of the Dat Project focused on the core technology to ensure sustainability in the long term, and spending resources otherwise may detract from that.

It's really great to see the community efforts around this and we'll continue to support them however we can.

Concerning datbase.org...

Dat Base was intended as a registry, not necessarily a gateway, and I'm not sure we could cover all uses without doing subdomains.

RangerMauve commented 5 years ago

Hi, I'd like to hijack this issue to talk about gateways and such.

There was a meeting with some people from the Dat community to talk about gateways and getting Dat working in the browser on Wednesday Feb 27. Here are the meeting notes (courtesy of @substack)

```markdown
# dat in browsers notes 2019-02-27

# on the call

* diana: working on dat gateway
* franz: working on archipel
* mauve: working on webrtc/websockets/discovery-swarm-stream
* substack: working on peermaps

# franz: working on archipel

* archipel is an electron app using an rpc layer over websockets
* hyperstack module: hyperdrive and hypergraph, but probably extensible
* same apis for using dat apis on the browser or the server
* goal: allow the same backend code to run in a webworker in the browser

# webrtc/gateway/signaling

* mauve: got webrtc working on signalhub
  * prioritize webrtc connections by setting a delay to start using websockets
  * establishes a websocket connection too after some seconds
  * signalhub is great for a single swarm; swarm per key in a browser is very limiting
  * discovery-swarm-stream: rpc over websockets for the discovery-swarm api
  * mux several streams through websockets
  * with dat gateway, the dat key gets leaked to the gateway: privacy issue
  * applications shouldn't necessarily share data with 3rd party gateways
  * does discovery-swarm-stream make sense? act as a signaling server over the same protocol?
* franz: many use-cases for browser/server storage/processing splits
  * easy-to-use rpc api for easier frontend apps
  * browser vs server has different trade-offs for storage/network etc
* substack: peermaps is browser-first p2p for maps
  * using iframes to handle cross-domain maps?
* mauve: gozala is working on lunet
  * using service workers paired with iframes to serve ipfs, or will proxy to a local ipfs daemon
* substack: how do custom extensions work with the gateway/signaling/rpc work?
* mauve: gateway probably wouldn't work with custom extensions, but custom extensions over discovery-swarm-stream should work.
* franz: with hyperstack, the backend has the different implementations; how to make it run in the browser? looking at lunet, different replication possibilities for gateways etc
* mauve: do we really need shared storage across origins? what about iframe post messages? if you have different apps with different data, do you necessarily need to coordinate on different origins? seems complicated
* substack: running tools like lunet makes sense for seeding from the browser and managing connections from a single place/interface
```
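The "mux several streams through websockets" point from the notes can be illustrated with a toy channel multiplexer: many logical channels share one transport by tagging each message with a channel id. This is only an illustration of the idea, not the actual discovery-swarm-stream wire format:

```javascript
// Toy multiplexer: each channel tags its messages with an id so many logical
// streams can share one underlying connection (e.g. a single websocket).
class Mux {
  constructor (send) {
    this.send = send            // writes a frame to the underlying transport
    this.handlers = new Map()   // channel id -> onMessage callback
  }
  channel (id, onMessage) {
    this.handlers.set(id, onMessage)
    return { write: data => this.send(JSON.stringify({ id, data })) }
  }
  // Feed frames arriving from the transport back to the right channel
  receive (frame) {
    const { id, data } = JSON.parse(frame)
    const handler = this.handlers.get(id)
    if (handler) handler(data)
  }
}
```

Wiring two instances back-to-back (each one's `send` feeding the other's `receive`) simulates the two ends of a websocket, with per-hyperdrive replication streams as the channels.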

We also spoke about this on Thursday the 28th at the dat comm-comm call.

The gist of it is that it'd be nice to find a standard way of doing dat stuff in the browser.

The main pieces that I see (feel free to add more) are:

I looked at some of this stuff a while ago when I was working on dat-polyfill, and again recently when working on dat-js.

I propose meeting on Wednesday, February the 6th at 20:00 GMT to discuss this stuff and maybe start putting parts together. We could do an audio-only call at https://talky.io/dat-in-browsers.

Personally, I'd like to work on combining the signalhub / discovery-swarm-stream code so that we could support replicating multiple hyperdrives through both WebRTC and proxying to the discovery-swarm all with a single websocket connection. (Also integrating hyperswarm once that stabilizes)

Does that time work for you all? Is there a better date or time? Any other items that I could add to the list of stuff to talk about?

CC @garbados @substack @karissa @frando @dpaez @tinchoz49 @gozala

RangerMauve commented 5 years ago

Also, I'm making a calendar invite. Email me at ranger@mauve.moe if you'd like to be added to the calendar.

ghost commented 5 years ago

I think a good way of accomplishing this is to publish a set of capabilities to the peer table when joining a swarm. So for example if I am a browser I can communicate as a websocket-client and webrtc-peer. And if I'm an electron app I can communicate as a websocket-client, websocket-server (if I have a public IP or can hole-punch), tcp client, tcp server (if I have a public IP or can hole-punch), udp etc. Peers could also publish their connection preferences to the table. Then to make a peer connection, clients can consult this table along with their own heuristics to make the best connection possible according to some mutually acceptable preferences.

These kind of hybrid swarms would be very useful for merging what would otherwise be fairly separate networks based on transport protocols.
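A minimal sketch of that selection step, assuming capabilities are exchanged as preference-ordered lists. The capability names mirror the comment above; the pairing table and function are illustrative, not a real Dat peer-table format:

```javascript
// Each peer publishes the transports it can speak; to connect, intersect the
// two lists and take the first transport (by our preference order) for which
// the other side offers the complementary role.
function pickTransport (mine, theirs) {
  const theirSet = new Set(theirs)
  // A websocket-client can only talk to a websocket-server (and vice versa),
  // so model complementary pairs explicitly; webrtc is symmetric.
  const complements = {
    'webrtc': 'webrtc',
    'websocket-client': 'websocket-server',
    'websocket-server': 'websocket-client',
    'tcp-client': 'tcp-server',
    'tcp-server': 'tcp-client'
  }
  for (const cap of mine) {
    if (theirSet.has(complements[cap])) return cap
  }
  return null // no mutually usable transport
}
```

A real implementation would also weigh the published preferences of the remote peer, not just the local ordering, as the comment suggests.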

Apologies if this is already the plan, although if so I guess this comment will help to disambiguate.

RangerMauve commented 5 years ago

Yeah, that's a great idea! One thing I was thinking of is that it'd be cool if these gateway servers published their existence to the discovery swarm under a known key. Then you could connect to one, discover more through it, and potentially save them for later.

okdistribute commented 5 years ago

@pvh o/

RangerMauve commented 5 years ago

Re: the capabilities. We should discuss (on the call) how to do this stuff without reimplementing libp2p. 😅

Gozala commented 5 years ago

> Re: the capabilities. We should discuss (on the call) how to do this stuff without reimplementing libp2p. 😅

Is there reason to not collaborate on libp2p itself?

Reading this thread I was feeling: oh, that's exactly the goal of libp2p, which also happens to have a Rust implementation, so you could in theory wasm it.

pvh commented 5 years ago

I think for the same reason it's healthy to have both KDE and Gnome, or Mozilla and Firefox, or Linux and FreeBSD, it's not wise to create a monoculture.

The dat community and the IPFS community have different ethos, technical goals, funding models, development methodology and values. I think both should inspire and be inspired by each other and drive one another to improve but it doesn't make much sense to me for the dat community to adopt the IPFS codebase.

pvh commented 5 years ago

As for web gateways, if you've been following the work Ink & Switch has been doing, we've been discussing something morally along the lines of the DatArchive injection (though I think we envision a quite different actual implementation) to extend our system to non-Electron computers like iPhones and browsers.

Roughly, because a first-order goal for us is to support totally offline usage, we've discussed bridging hypermerge repositories over a websocket gateway, but also wrapping all of that magic in a PWA that keeps the data in localStorage (or something) for improved durability.

RangerMauve commented 5 years ago

@pvh Would you be interested in attending the call?

Also ping @sammacbeth. He's using gateway stuff in https://github.com/cliqz-oss/dat-webext

Gozala commented 5 years ago

> I think for the same reason it's healthy to have both KDE and Gnome, or Mozilla and Firefox, or Linux and FreeBSD, it's not wise to create a monoculture.
>
> The dat community and the IPFS community have different ethos, technical goals, funding models, development methodology and values. I think both should inspire and be inspired by each other and drive one another to improve but it doesn't make much sense to me for the dat community to adopt the IPFS codebase.

I think there are a few caveats here that are worth considering:

Please note that this does not imply that:

I apologize for derailing this conversation; it just saddens me that instead of making greater progress towards decentralization, communities across the board choose to keep reinventing the same wheel with slight technical differences. It could be that coordination across groups would have higher overhead than the value to be gained, but that's rarely the argument being made.

pvh commented 5 years ago

@Gozala you're right, we should discuss this in one of the many other channels we share :) @RangerMauve i'd be interested in joining the conversation, though mostly to listen since we haven't done too much here yet.

RangerMauve commented 5 years ago

@pvh Cool, feel free to join in, and send me an email if you'd like to be added to the calendar event.

cblgh commented 5 years ago

@RangerMauve i'd be interested in joining too. i'll mostly listen in and maybe fold in ideas that come to mind as the call progresses. I sent you an email :^)

ghost commented 5 years ago

@gozala I've discussed this elsewhere but I think this:

> The dat community and the IPFS community have different [...] development methodology and values

is a huge reason why there isn't more interop. I look at something like this code example and I see a wall of configuration that is written in an unfamiliar style and appears to have no practical purpose. It sets up a huge amount of boilerplate and then... you have a Node object? It doesn't explain what anything is for. I mostly see walls of text, tables, badges, org charts, and nothing means anything to me.

Compare this to something like webrtc-swarm. You set up the module with 2 pieces of information and then you can listen for 'peer' events which give you a bidirectional stream. The module doesn't overload you with a manifesto first, it gets out of your way. I can easily see whether a module like webrtc-swarm will solve my problem or not and it doesn't try to solve all the world's problems.

The other development methodology for api design, technical communication, and setting scope leaves me unmotivated to even figure out if the given module will be suitable for what I'm trying to do. I also have no idea what libp2p is doing without reading a book's worth of content, but I can approximately guess how a module like webrtc-swarm works by glancing at its interface. Creating a mental model for the layers that sit below what you're working on is very important to design around the correct set of trade-offs, performance considerations, and failure cases. I also worry with tools that are too configurable about the tendency for those abstractions to leak upward in ways that push against encapsulation.

RangerMauve commented 5 years ago

Yeah, I like what libp2p are trying to do, but I don't think this would be the best place to try to integrate it with Dat. I think it'd be better to talk about that somewhere relating to the work in hyperswarm since that's where all the new networking stuff in Dat is going on.

My goal of bringing it up was to figure out a scope that we should focus on and avoid over-engineering.

If down the line there's more adoption of libp2p in the Dat ecosystem, then that will definitely affect the browser, but I'd rather start somewhere small so we can help people experiment with web applications that use Dat.

RangerMauve commented 5 years ago

Ping! The call should be starting in a minute or so. :D

pvh commented 5 years ago

https://github.com/inkandswitch/discovery-cloud-client

RangerMauve commented 5 years ago

Thank you all for coming out to the call! I found it really helpful to learn about your different experiences with this stuff and the use cases that you're aiming for.

Here are the notes I took during the meeting, feel free to add comments on the post for anything I missed:

## Participants

- Diego from Geut, made a dat blog post using hyperdb for multiwriter
- cblgh, Alex, working on Cabal, interested in getting Cabal on the web
- Martin, tinchoz49, also Geut, discovery-swarm-webrtc
- Gozala, Irakli, working on libdweb, prototyped lunet, an IPFS node in the browser using ServiceWorkers / iframes
- Kaotikus, Scot, Bits.coop, interested in learning more about the protocol
- pvh, Peter, automerge project, using Dat and WebRTC for distributed applications, interested in getting non-Electron things working
- substack (no audio)

## Notes

- discovery-swarm-stream is useful
- discovery-swarm-cloud creates a local swarm on a server
- random-access-idb is really slow, we need a new approach
  - There's a proprietary IndexedDB extension in Firefox that could help
  - IPFS doesn't have the same performance issues as dat; it uses IDB for block storage
  - Might be useful to store individual ranges in IDB instead of files
- webrtc-swarm has issues with reconnecting after going offline for a while; Diego and Martin are looking to do a PR if they find a fix
- webrtc performance sucks with multiple connections; it's led to pvh giving up on using it. Chrome might be working on improving this internally, and @gozala says Mozilla is trying to optimize it
- WebRTC doesn't work in workers, which means processing is done on the main thread, which doesn't scale well. Gozala has used hacks for pumping data into workers
- Latency when putting things into workers is bad, probably a result of IPC latency according to Chrome developers
- pvh: Dat in browsers is a fallback; ideally focus on Electron apps first
- Diego: There's a lot of potential and a lot of unknowns in browsers
- Gozala: We shouldn't compete with different P2P protocols, we should all compete against the web together. Browsers are useful because links are enough to share content
- pvh: Shouldn't push browser nodes too much because the experience is more complicated and not P2P
- gozala: We should entice developers to use the tech so that browsers will eventually support P2P APIs
- martin: Could this discovery stuff go into hyperswarm? It's hard to track all the modules in the ecosystem
- cblgh: What is the exact issue with DataChannel performance? Maybe contact feross for details about WebRTC in the browser
- gozala: It'd be nice to have one app that handles all P2P protocols so that users wouldn't be asked to install it all the time

Here are some action items from the meeting:

I'm going to get started on the discovery-swarm-stream stuff mid next week, with the goal of getting it integrated with dat-js and having someone test it outside of hyperdrive replication.

RangerMauve commented 5 years ago

Re: random-access-storage. Would WebSQL perform better than IDB? @pfrazee You're using sqlite for storing dat data, any opinions regarding using it as a backend for hyperdrive?

tinchoz49 commented 5 years ago

Storage

We experimented with different random-access-* in the browser:

random-access-idb

It was the first idea but we had problems reading and writing a lot of blocks from hypercore.

random-access-key-value with level-js

It works great at the beginning but then starts to slow down once you have >=50 blocks.

```js
import raf from 'random-access-chrome-file'
import randomAccessKeyValue from 'random-access-key-value'
import leveljs from 'level-js'
import levelup from 'levelup'

const db = levelup(leveljs('dbname'))
const storage = file => randomAccessKeyValue(db, file)
```

random-access-chrome-file

It works really well and we didn't find any performance issues, but we don't want to support only Chrome.

Network

We are using webrtc through discovery-swarm-webrtc and hitting various issues, like unexpected disconnections.

We are trying to stabilize the connection :disappointed: and one of the issues we found is related to signalhubws.

If the ws client loses the connection it doesn't try to reconnect, so we made a fork of signalhubws that uses sockette to fix these kinds of issues: https://github.com/geut/signalhubws

We will probably open a PR to the original project and discuss the changes there.
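The reconnect behaviour a wrapper like sockette adds can be approximated with an exponential-backoff retry loop. The following is a generic sketch with arbitrary tuning values, not sockette's actual implementation:

```javascript
// Delay before reconnect attempt n: doubles each time, capped, so a flaky
// signalling server isn't hammered. Base/cap values are arbitrary choices.
function backoffDelay (attempt, base = 500, cap = 30000) {
  return Math.min(cap, base * 2 ** attempt)
}

// Sketch of a self-reconnecting wrapper around a connect() function that
// resolves with a connection emitting 'close'. Hypothetical API for
// illustration only.
function keepConnected (connect, onOpen, attempt = 0) {
  connect().then(conn => {
    onOpen(conn)
    // Reset the backoff after a successful connection
    conn.on('close', () => keepConnected(connect, onOpen, 0))
  }).catch(() => {
    setTimeout(() => keepConnected(connect, onOpen, attempt + 1),
      backoffDelay(attempt))
  })
}
```

Capping the delay matters for signalling servers: without it, a client that was offline overnight could wait minutes before rejoining the swarm.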

pfrazee commented 5 years ago

@tinchoz49 Appreciate you sharing that research. I'm fairly sure that Chrome is pushing for their files APIs to become standard. It might be a good bet in the long run.

@RangerMauve It's worth taking a look at, to be sure.

RangerMauve commented 5 years ago

@tinchoz49 Have you tried out discovery-swarm-stream yet?

okdistribute commented 5 years ago

The chrome files api is really nice and fast, I also recommend using it despite its dependency on Chrome. I hope it becomes standard! We are using it with a map tile downloader for mapeo.

okdistribute commented 5 years ago

My (somewhat limited) take on web sql is that it might make sense for metadata lookup, but could be heavy for file storage with lots of blocks.

pvh commented 5 years ago

Ah yeah, I believe we funded random-access-chrome-file. Glad to hear it works in Chrome and doesn't require a Chrome App specific API. :)

tinchoz49 commented 5 years ago

> @tinchoz49 Have you tried out discovery-swarm-stream yet?

It's our first priority for tomorrow.

> I'm fairly sure that Chrome is pushing for their files APIs to become standard. It might be a good bet in the long run.

> The chrome files api is really nice and fast, I also recommend using it despite its dependency on Chrome. I hope it becomes standard! We are using it with a map tile downloader for mapeo.

That is really interesting, thanks for sharing your experience. Right now we are building a demo for the next edcon and we need browser storage persistence working by that day. So I'm going to talk with the team tomorrow about using random-access-chrome-file.

sammacbeth commented 5 years ago

Some thoughts from my side:

pfrazee commented 5 years ago

> This leads to one of the core issues - the official Dat networking stack is not web-compatible. Until this client can directly communicate with web peers centralisation will be required to bridge web swarms with node ones. Does the current roadmap for Dat-node consider this issue? AFAIK hyperswarm is similarly tied to requiring direct TCP and UDP socket access.

This is the Web platform's issue, not Dat's. We can't solve it without new Web APIs.

pvh commented 5 years ago

> This leads to one of the core issues - the official Dat networking stack is not web-compatible. Until this client can directly communicate with web peers centralisation will be required to bridge web swarms with node ones. Does the current roadmap for Dat-node consider this issue? AFAIK hyperswarm is similarly tied to requiring direct TCP and UDP socket access.
>
> This is the Web platform's issue, not Dat's. We can't solve it without new Web APIs.

I've spoken with several folks at the Chrome team about this. Chrome Apps have access to raw UDP/TCP sockets (with some bugs/caveats) and they expressed intent to provide similar APIs for the platform in the future. (I confess I am somewhat dubious about this but it's better than nothing.)

I should note that the Chrome networking APIs were frustratingly not-quite-compatible with node ecosystem libraries, which caused significant friction. In particular, we ran into problems where socket configuration had to happen at a different time than in Node (I believe Chrome required up-front configuration while Node didn't support configuration until after a connection was established?) plus several more minor but frustrating bugs, like broken multicast support preventing us from using mDNS successfully.

Also, thanks to @Gozala who has been working to solve these problems on the Mozilla side. It doesn't sound like his effort will result in a new standard at this point but he's certainly helped raise awareness and driven some progress on that front.

tinchoz49 commented 5 years ago

> I have been using random-access-idb-mutable-file in dat-webext now for a while with no issues. I have not tested performance, but it feels subjectively better than random-access-idb.

Maybe we can think about building something like a "random-access-web-file" for dat-js that uses the Chrome file system API in Chrome (or Chromium-based browsers like Brave) and IDBMutableFile in Firefox.
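A sketch of that selection logic, picking a backend in rough order of the performance observed in this thread. The `env` flags below are made-up stand-ins for real feature probes on `window`, and the function itself is illustrative, not an existing module:

```javascript
// Pick a storage backend name based on which browser APIs are available.
// Real detection would probe the actual globals (e.g. the Chrome filesystem
// API, Firefox's IDBMutableFile, indexedDB); here `env` stands in for that.
function chooseStorage (env) {
  if (env.chromeFileSystem) return 'random-access-chrome-file'       // Chrome / Brave
  if (env.idbMutableFile) return 'random-access-idb-mutable-file'    // Firefox
  if (env.indexedDB) return 'random-access-idb'                      // slow, but works everywhere
  return 'random-access-memory'                                      // last resort: no persistence
}
```

The fall-through ordering encodes the thread's findings: the fast, browser-specific file APIs first, plain IDB as the portable-but-slow option, and in-memory storage only when nothing persistent is available.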

Gozala commented 5 years ago

> I have been using random-access-idb-mutable-file in dat-webext now for a while with no issues. I have not tested performance, but it feels subjectively better than random-access-idb.
>
> Maybe we can think to build something like a "random-access-web-file" for dat-js that use the Chrome file system api in Chrome or chromium based like Brave and IDBMutableFile in Firefox.

That is what I was suggesting on the call yesterday. If that works it sounds like an easy win.

In the long term, however, I think the Dat community needs to either:

  1. Persuade browser vendors to provide a storage mechanism that fits the random-access-storage design
  2. Redesign the storage layer to fit the constraints of the storage mechanisms browsers provide

I'm biased towards option 2, mostly because I know how difficult it is to make progress on the browser end. It's not that vendors don't care; the combination of billions of users and a complex, old codebase makes them evaluate things on different merits.

Gozala commented 5 years ago

> Also, thanks to @Gozala who has been working to solve these problems on the Mozilla side. It doesn't sound like his effort will result in a new standard at this point but he's certainly helped raise awareness and driven some progress on that front.

Thank you for the kind words @pvh. I do however want to point out that I don't think adding TCP / UDP / mDNS to the web stack is the desired outcome; if browsers do it, that is a signal that they have given up on the web platform.

I really don't want ads to start discovering local network services through mDNS, or trying to print themselves.

The desired outcome, IMO, looks more like beaker or farm, where the browser allows applications to read / write data into some namespace and takes care of the underlying networking on the user's behalf.

pvh commented 5 years ago

> > Also, thanks to @Gozala who has been working to solve these problems on the Mozilla side. It doesn't sound like his effort will result in a new standard at this point but he's certainly helped raise awareness and driven some progress on that front.
>
> Thank you for the kind words @pvh I do however want to point out that I don't think adding TCP / UDP / MDNS to the web stack is desired outcome, if browsers do it that is signal that they gave up on the web platform.

Can you expand on why you feel this way? I think of this as being similar to the emergence of wasm and fast canvas implementations. Essentially, it enables the unbundling of browser functionality and empowers communities to innovate without relying on browser vendors to do the work.

Essentially I am advocating for browsers to follow the advice of the Atlantis paper and let folks innovate not just on programming & rendering models but also communication & network protocols.

> I really don't want ads start discovering local network services through mdns or trying to print ads

Same! I would expect this kind of behaviour would look a lot like notifications or any other kind of explicit opt-in browser feature.

> I think desired outcome IMO looks more like beaker or farm where browser allows applications to read / write data into some namespace and takes care of the underlying networking on user behalf.

I think we're still years from knowing what the protocols should be, let alone standardizing them across vendors. If we have to wait for the answers to those questions before we can start testing these technologies at non-enthusiast scales I think that's a huge obstacle.

Still, I don't plan to wait around. I'll keep working on Electron apps & gateways and hoping browsers catch up someday.

Gozala commented 5 years ago

> Can you expand on why you feel this way? I think of this as being similar to the emergence of wasm and fast canvas implementations. Essentially, it enables the unbundling of browser functionality and empowers communities to innovate without relying on browser vendors to do the work.

Both wasm and canvas are fully contained in the sandbox; in fact wasm is even more so than JS. Today the major struggle on the browser end is to somehow contain the data leakage fueling the data economy of the web. Adding more low-level IO primitives will only make that problem far worse and more difficult to address.

> Essentially I am advocating for browsers to follow the advice of the Atlantis paper and let folks innovate not just on programming & rendering models but also communication & network protocols.

I have not read the paper; I will, and maybe it will convince me otherwise.

> Same! I would expect this kind of behaviour would look a lot like notifications or any other kind of explicit opt-in browser feature.

Yeah, but numerous studies have shown that prompts do not work; the majority of users will click through whatever prompts you provide just to get through. In fact, notification prompts have been widely abused.

In practice, user prompts only work when they are a rare occurrence, but I'm certain this would be a Pandora's box just like notification prompts. And if some shitty site a user needs to visit doesn't work and insists on a UDP socket, many users would accept the risk and reinforce the behavior.

Another argument: if every single site rolls out its own p2p protocol and its own data storage layer, etc., we end up with silos, just of a different kind. Diversity is good, but I'd argue interop is more important. I really don't want the same mess we have with endless messaging apps, where each of your contacts is on a different one.

> I think we're still years from knowing what the protocols should be, let alone standardizing them across vendors. If we have to wait for the answers to those questions before we can start testing these technologies at non-enthusiast scales I think that's a huge obstacle.

I agree; however, I think there should be a middle ground between testing by opening UDP sockets on arbitrary web sites and waiting for browsers to standardize. I thought extensions might provide a space to do so. More recently I've been thinking that some companion app (a Flash Player of the p2p web) could be a more effective way to do it.

> Still, I don't plan to wait around. I'll keep working on Electron apps & gateways and hoping browsers catch up someday.

I really hope so! Browsers aren't where innovation happens; rather, they standardize the innovation that has already happened.

RangerMauve commented 5 years ago

Hi all, I've just released discovery-swarm-web, which acts as a proxy to discovery-swarm. With this it should be possible to find peers for any hypercore-based application you want.

With regards to storage, I've made random-access-web, which will automatically try to use the Chrome File API or the IDBMutableFile API if they're present, and fall back to idb and random-access-memory.

We're also finishing up a new release of dat-js that uses these two modules. Combined, I think it makes dat-js feel a lot faster.

@allain has been looking into getting Dat running in react-native using nodejs-mobile and regular RN, and part of that exploration made it more obvious why an HTTP API for Dat would be useful. Essentially, there was a local gateway (running dat-gateway for now) which was going to be used to render content in a webview. However, we also wanted to support the DatArchive API for Beaker apps, and it would have been nice to have a standard for doing that. At some point I'd like to implement something and combine it with the local discovery-swarm-stream server that comes with discovery-swarm-web, maybe also paired with a standard pinning service.

@tinchoz49 from @geut is working on a new version of discovery-swarm-webrtc that should have higher reliability and hopefully overall better performance. We're also thinking about how to reduce the overall number of connections to avoid some of the performance issues of WebRTC connections.

pvh commented 5 years ago

Hi! I'll try this tomorrow. Thanks for sharing!