How would browsers discover the webrtc-signaling-mesh?

draeder commented 3 years ago

One potential challenge I've been pondering about this solution is, how will a browser discover a node on the mesh to connect to?

Ideally, the entry point to the mesh would be arbitrary, or based on some group mesh identity key or hash. Since browsers can't talk P2P without WebRTC, and WebRTC in the browser requires a signaling server, this seems to create a paradoxical problem on how it would find nodes to use.

Otherwise, it seems the application the browser is using would need to specify a node, which doesn't seem much different than specifying a signaling server to begin with. If that's the case, wouldn't popular applications (or low quality nodes) cause PoW to become too high for that node, therefore causing the same problems we're seeing now with signaling servers? Or is that solved somehow in the beginning because the mesh is distributed at the outset which essentially 'self-replicates' so new browser nodes share the mesh somehow to other new browser nodes?

Something I've thought about creating is an electron app that has a built in local signaling server used to bootstrap itself onto the WebTorrent network. But that still doesn't solve the browser discovery problem.

When you mentioned PoW in your readme, it made me think of blockchain. Is there some kind of solution for node discovery in blockchain?

Or maybe something with this? torrent-discovery

draeder commented 3 years ago

By the way, I've played around a bit with bittorrent-dht, which seems more specific than the bittorrent package for creating / reading DHT for your use case.

chr15m commented 3 years ago

@draeder this is a classic bootstrapping problem in p2p networks. BitTorrent and Bitcoin have this issue too. You can see a hard-coded bootstrap node in the libtorrent source code and there is some discussion online of how Bitcoin nodes bootstrap their initial set of peers.

I'll do something similar to what they do:

- Clients will maintain a list of non-bootstrap peers they find.
- Clients will ship with an initial list of hard-coded, known-good nodes in the codebase.
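
A minimal sketch of how a client might keep those two lists, purely as an illustration. The class name `PeerList`, its methods, and the `example.com` hosts are hypothetical and are not taken from the webrtc-signaling-mesh codebase.

```javascript
// Hypothetical sketch: none of these names come from webrtc-signaling-mesh.
// Placeholder bootstrap nodes that would be hard-coded into the client.
const BOOTSTRAP_NODES = ['node1.example.com', 'node2.example.com'];

class PeerList {
  constructor(bootstrap, maxLearned = 100) {
    this.bootstrap = [...bootstrap]; // hard-coded known-good nodes
    this.learned = new Set();        // non-bootstrap nodes found at runtime
    this.maxLearned = maxLearned;
  }

  // Remember a node learned from the mesh; bootstrap nodes are tracked separately.
  learn(host) {
    if (this.bootstrap.includes(host)) return;
    if (this.learned.size >= this.maxLearned) return;
    this.learned.add(host);
  }

  // Drop a learned node, for example after it stops responding.
  forget(host) {
    this.learned.delete(host);
  }

  // Learned nodes come first, so the hard-coded nodes are only a fallback.
  candidates() {
    return [...this.learned, ...this.bootstrap];
  }
}
```

Listing learned nodes ahead of the hard-coded ones is one possible design choice: it keeps load off the bootstrap nodes once a client has found alternatives.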

> Otherwise, it seems the application the browser is using would need to specify a node, which doesn't seem much different than specifying a signaling server to begin with.

I think the difference is once you are bootstrapped it's much more robust. If the initial bootstrap nodes go down existing clients will be ok as they will use the other nodes they have learned about. Brand new clients would not be able to connect until a new version of the software was released with a new set of bootstrap nodes (but only if every bootstrap node is taken down). If there are sufficient bootstrap nodes hard-coded, it is unlikely they would all go offline at once. This provides a level of robustness sufficient for BitTorrent and Bitcoin so it should be enough for this too.

> wouldn't popular applications (or low quality nodes) cause PoW to become too high for that node, therefore causing the same problems we're seeing now with signaling servers?

Yes, so if a bootstrap node is too popular then clients will see the high PoW and avoid it. That's the advantage of using PoW in this way. Clients will distribute themselves across available nodes as they avoid the high-PoW signal that means a node is under strain.
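
As a rough sketch of that selection step, assuming each node advertises a numeric PoW difficulty that rises with load. The `difficulty` field and the `pickNode` function are hypothetical names, not part of the project.

```javascript
// Hypothetical: each node object carries the PoW difficulty it currently demands.
// A higher difficulty is treated as a sign that the node is under strain.
function pickNode(nodes) {
  if (nodes.length === 0) return null;
  // Keep whichever node asks for the least work.
  return nodes.reduce((best, node) =>
    node.difficulty < best.difficulty ? node : best
  );
}
```

For example, given nodes demanding difficulties of 20, 12 and 16, this picks the node demanding 12.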

Note that clients do not have to be connected to the same node in order to communicate signaling information as this is passed between any nodes connected to the same hash.

> When you mentioned PoW in your readme, it made me think of blockchain. Is there some kind of solution for node discovery in blockchain?

Yeah maybe, but I think "blockchain" is generally ill-suited as a solution to the problem this is trying to solve. The blockchain trade-off is low efficiency and performance for a high degree of security (auditability, reliable conflict resolution). The problem of browser clients discovering each other doesn't really need these properties in the same way as a financial instrument does.

draeder commented 3 years ago

@chr15m Thanks for the detailed explanation. The DNS seed method that Bitcoin uses is very interesting to me. I could imagine writing a service that joins the swarm, downloads the list of known-good signaling nodes and updates DNS for a domain with those records. That domain can then be queried by web applications for bootstrap nodes. There could also be a way for a node to register its domain as a DNS seed node with the webrtc-signaling-mesh.

draeder commented 3 years ago

@chr15m Related to my last post re: DNS seeding, I did some experimenting.

What I found is that if a node has a domain proxied by Cloudflare, adding TXT records is extremely fast using the Cloudflare API. Those TXT records can be instantaneously found by browsers using DoHjs by setting the resolver IP to 1.1.1.1.

Although using Cloudflare for DNS might be considered a point of centralization/failure, from a bootstrapping perspective it seems worth consideration.

The nice thing about TXT records is that each string in a record can hold any value up to 255 bytes, which is perfect for containing node addresses or encrypted strings that can be decrypted by a public key. And, it's as simple as setting the subdomain of those records to one thing, say "ws.example.com". The number of records allowed per domain is substantial even on the free plan, which is something like 255 total records. Domain root address anonymity can be maintained because TXT record queries do not return anything but the TXT records. So, such a domain could just contain TXT records for the swarm as necessary without revealing its own IP address to queriers (as long as the other records are proxied).
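
To illustrate the 255-byte limit, here is a small sketch that packs node addresses into as few TXT-sized strings as possible. The function name `packAddresses`, the comma separator, and the address format are assumptions made for this example.

```javascript
// Hypothetical helper: greedily pack addresses into strings of at most 255 bytes.
const TXT_STRING_LIMIT = 255;
const byteLength = s => new TextEncoder().encode(s).length;

function packAddresses(addresses, separator = ',') {
  const chunks = [];
  let current = '';
  for (const addr of addresses) {
    if (byteLength(addr) > TXT_STRING_LIMIT) {
      throw new RangeError(`address too long for one TXT string: ${addr}`);
    }
    const candidate = current === '' ? addr : current + separator + addr;
    if (byteLength(candidate) <= TXT_STRING_LIMIT) {
      current = candidate;  // still fits, keep appending
    } else {
      chunks.push(current); // string is full, start a new one
      current = addr;
    }
  }
  if (current !== '') chunks.push(current);
  return chunks;
}
```

Each returned string could then be stored as the content of its own TXT record.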

Server (create record):

    const zoneId = '' // Cloudflare domain zone ID
    const content = 'some content' // TXT record value
    fetch(`https://api.cloudflare.com/client/v4/zones/${zoneId}/dns_records`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Auth-Email': '', // Cloudflare email address
        'X-Auth-Key': '', // Cloudflare auth/global API key
      },
      // JSON.stringify escapes any quotes in `content`, which a hand-built string would not
      body: JSON.stringify({ type: 'TXT', name: 'ws', content, ttl: 120, priority: 10, proxied: false }),
    })
      .then(res => res.json()) // expecting a JSON response
      .then(json => console.log(json))
      .catch(err => console.error(err));

Browser (query records):

    // Assumes dohjs is loaded in the page and exposes the global `doh`
    const resolver = new doh.DohResolver('https://1.1.1.1/dns-query');
    const trackers = []
    resolver.query('ws.example.com', 'TXT')
        .then(response => {
            // collect the contents of every TXT answer
            response.answers.forEach(ans => trackers.push(ans.data.toString()))
            // handle TXT records / trackers
        })
        .catch(err => console.error(err));

I could see passing the PoW token to TXT records a given "DNS Seeder" node has on the network as a way to help signal to browsers the best node to use to bootstrap onto the P2P network based on such lookups for a known seeder domain--maybe a timestamp, too, to expire old records.
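
A sketch of how a browser might consume such records, under an assumed format. The `host=...;pow=...;ts=...` layout, the function names, and the 10-minute expiry are all hypothetical; nothing here is a defined part of webrtc-signaling-mesh.

```javascript
// Hypothetical TXT format: "host=<domain>;pow=<int>;ts=<unix seconds>"
function parseSeedRecord(txt) {
  const fields = Object.fromEntries(
    txt.split(';').map(pair => pair.split('=').map(s => s.trim()))
  );
  const pow = Number.parseInt(fields.pow, 10);
  const ts = Number.parseInt(fields.ts, 10);
  if (!fields.host || Number.isNaN(pow) || Number.isNaN(ts)) return null;
  return { host: fields.host, pow, ts };
}

// Drop malformed and expired records, then sort so the lowest PoW comes first.
function freshSeeds(records, nowSeconds, maxAgeSeconds = 600) {
  return records
    .map(parseSeedRecord)
    .filter(rec => rec !== null && nowSeconds - rec.ts <= maxAgeSeconds)
    .sort((a, b) => a.pow - b.pow);
}
```

The first entry of the result would be the least-loaded seed that is still considered fresh, and the caller passes in the current time so the expiry check stays easy to test.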

Something like this doesn't necessarily need to be limited to Cloudflare, either, I'm sure.

I'm certain these kinds of ideas have been considered already and weaknesses found, but I want to share what I'm learning nevertheless in case it sparks new ideas.

chr15m commented 3 years ago

Interesting idea, thanks for sharing. All webrtc-signaling-mesh nodes will need to have a domain in the DNS as they are web servers. Somebody could use round-robin DNS to point a single domain name (like bootstrap.webrtcsignaling.com or something) at multiple active servers spread across different web hosts. That's one simple way to have a DNS seed that anybody can participate in without permission.

To begin with I am literally going to have a txt file with one domain per line of known-good bootstrap nodes and have that txt file compiled into the client. What you outline above sounds like something that could be useful once the initial concept is proven to further reduce the dependency on any one source of bootstrap nodes. I didn't know about dohjs, thank you.
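
A minimal sketch of reading such a file, assuming one domain per line. Skipping blank lines and `#` comment lines is an extra assumption for this example, and `parseBootstrapList` is a hypothetical name.

```javascript
// Hypothetical parser for a bootstrap list with one domain per line.
// Blank lines and lines starting with '#' are skipped (an assumption, not from the thread).
function parseBootstrapList(text) {
  return text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line !== '' && !line.startsWith('#'));
}
```

The text itself could be inlined into the client bundle at build time, so no network request is needed to read it.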

draeder commented 3 years ago

Cool -- Yeah, I've been doing a lot of research and finding all kinds of neat P2P toys that have been popping up recently. I'm working on my small version of a 'signaling swarm' with some of the ideas I shared. One reason I considered encrypting the TXT record was to help obfuscate the physical IP address of people like myself who want to run a node out of their house, but want to minimize the potential of DDoS attacks. It's not foolproof and anyone with some skills could figure out how to decrypt the records the way I have it working now, but it's at least a barrier. And I'm learning a lot and having fun working on it.

Your work with Bugout and with this project has been a huge inspiration for me and has helped me improve my programming skills quite a bit. So, thank you for that!

chr15m commented 3 years ago

:+1: thanks for sharing your research.

chr15m / webrtc-signaling-mesh

How would browsers discover the webrtc-signaling-mesh? #1