exodusproject / spec

(Discussion of) specification of the protocol.

A P2P / distributed networking c-c-c-ombo! #7

Open cuibonobo opened 8 years ago

cuibonobo commented 8 years ago

This model describes a network that allows for both P2P sharing and distributed client/server sharing. It's meant to address the needs of people who don't wish to pay for a domain name or hosting space indefinitely, while still allowing people to sign up with distributed hosting providers for additional features and reliability.

First, a few terms:

This is the most basic level of interaction. Identities and devices are each assigned a unique identification number, and devices are associated with a particular identity via an aggregate link. This allows an identity to serve as an abstraction layer in order to reach a person on individual devices knowing only the identity ID. Devices can be added to a particular identity via pairing.

The distinction between identities and devices also allows an identity to have access to the data on all of its devices (as long as they are visible on the network) without needing to sync the data between them. In other words, you can browse the data on your PC from your phone, and even transfer a file between them, without needing to sync all of your PC data to your phone and vice versa.

Accessing a file on a different device but within your own identity is an internal share. Accessing a file on a different identity is an external share.
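
As a rough sketch of the identity/device split and the internal vs. external share distinction described above (the class and function names here are made up for illustration and are not part of any agreed spec):

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    device_id: str  # unique ID assigned to this device
    name: str       # human-readable label, e.g. "phone"


@dataclass
class Identity:
    identity_id: str                             # unique ID assigned to this identity
    devices: dict = field(default_factory=dict)  # device_id -> Device

    def pair(self, device: Device) -> None:
        """Associate a device with this identity (the 'pairing' step)."""
        self.devices[device.device_id] = device


def share_kind(requester: Identity, owner: Identity) -> str:
    """Classify an access as an internal or external share."""
    if requester.identity_id == owner.identity_id:
        return "internal"  # another device within the same identity
    return "external"      # a different identity altogether
```

Browsing your PC's files from your phone would be `share_kind(me, me)`, i.e. internal; someone else reading your public feed would be external.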

Identity IDs can be exchanged via in-person exchanges, specialized images (like 'snapcodes'), a 3rd party channel (like SMS), etc. Once a person has your identity ID, they can send messages directly to your devices, subscribe to public feeds, etc.

To allow for routing via identity ID, the node must register the identity, its devices, and their network locations onto a distributed hash table (DHT). The DHT allows identities to be reached even when they change IP addresses, port numbers, etc.
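
To make the registration step a bit more concrete, here is a toy stand-in for the DHT: a single in-memory dictionary rather than a real distributed table. The `ToyDHT` class and its method names are assumptions for illustration only.

```python
class ToyDHT:
    """In-memory stand-in for a DHT: identity ID -> {device ID -> (address, port)}."""

    def __init__(self):
        self._records = {}

    def register(self, identity_id, device_id, address, port):
        """Publish (or refresh) the network location of one device."""
        self._records.setdefault(identity_id, {})[device_id] = (address, port)

    def lookup(self, identity_id):
        """Return the known device locations for an identity (empty if unknown)."""
        return dict(self._records.get(identity_id, {}))
```

When a device changes IP address or port, it simply registers again and later lookups return the new location.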

Note that in the P2P-only layer, messages and transfers only happen synchronously: exchange of data can only happen if both parties are online at the same time and remain online for the duration of a transfer.

Distributed layer

In order to allow for asynchronous communication, we introduce the concept of a relay: an always-on server that has a complete copy of the data on a particular device. Relays will sync up from the device (the next time it is available on the network) when data is added, or sync down to the device when messages/transfers from other identities are received.

In essence, a relay is an always-on stand-in for a particular device that is accessible on the internet.
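
A minimal sketch of the sync-up / sync-down behaviour, assuming a relay mirrors exactly one device and ignoring conflicts, deletions and authentication entirely. `RelayedDevice`, `Relay` and their attributes are hypothetical names.

```python
class RelayedDevice:
    """A device that may or may not be visible on the network."""

    def __init__(self):
        self.files = {}      # file name -> contents
        self.online = False


class Relay:
    """Always-on stand-in that keeps a complete copy of one device's data."""

    def __init__(self, device):
        self.device = device
        self.copy = {}       # the relay's mirror of the device
        self.pending = {}    # data received while the device was offline

    def receive(self, name, data):
        """Accept a message/transfer from another identity on the device's behalf."""
        self.copy[name] = data
        self.pending[name] = data

    def sync(self):
        """Sync up from and down to the device; returns False if it is offline."""
        if not self.device.online:
            return False
        self.copy.update(self.device.files)     # sync up: data added on the device
        self.device.files.update(self.pending)  # sync down: data received meanwhile
        self.pending.clear()
        return True
```

Calling `sync()` while the device is offline does nothing; the next successful call brings both sides to the same set of files.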

A particular server can be a relay for all of the devices related to an identity, or a person can choose to only make certain devices available on a relay. The most likely scenario is having phones and tablets rely on relays, while PCs and servers are un-relayed and thus only accessible while they are online.

The relay concept allows service providers to step into the space with paid or ad-supported offerings. For instance, if a person signs up for an ad-supported relay, the service provider may show ads embedded in a feed or may introduce an ad interstitial before beginning the transfer of a file.

Besides allowing asynchronous communication, relays can also allow 3rd party applications to access an identity's data. For example, a person can give permission to a blogging application to create new 'blog' files and make them accessible on the web.

Relay services themselves are expected to take over DHT-related operations so devices can save on battery power. They can offer additional services like making data accessible via URL.

Since relay data is meant to be a complete synchronization of the data on a particular device, an identity can choose to switch to a different relay seamlessly.

User Experience / Application Layer

The previous sections described low-level operations, but individual people are not required to know any of that in order to use the network. A typical user interaction would be more like this:

arturovm commented 8 years ago

General proposals for minor changes

I love the work we did today (YAY TEAM), but I'd like to clarify a few of the things I proposed, and also suggest some changes to what is written here. I'll also provide an example scenario to illustrate my points and why I believe we should make these small changes.

Regarding aggregate links: The concept does not refer to the links among your devices themselves, but to the logical link that is established from one identity to another. This will become clearer in the example scenario.

Regarding relays: I do not think they should be a stand-in for one particular device, but rather they should reflect the entire pool of data that you generate on all your devices. Thus, syncing to a relay should not be a granular, per-device decision, but rather a mandatory mechanism. To control privacy, we instead have the concept of internal and external shares, marking each resource as accessible only internally, or both internally and externally. I believe this better reflects how we manage our digital lives, and what we expect from a modern personal data store on the Internet (e.g. iCloud, which makes your photos available everywhere).

Of course, to conserve bandwidth and storage space, devices only "push" to the relay, but don't "pull" the whole pool of data from it. Instead, the relay provides just a listing of the resources available, and devices can choose which ones to pull in particular (like the Dropbox app, as @cuibonobo said).
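
Roughly, the "push everything, pull selectively" idea could look like the sketch below. `PoolRelay` and its methods are invented for illustration; a real relay would also need authentication and per-resource share flags.

```python
class PoolRelay:
    """Holds the whole pool of an identity's data, whichever device it came from."""

    def __init__(self):
        self._pool = {}  # resource ID -> contents

    def push(self, resource_id, data):
        """Devices always push what they create."""
        self._pool[resource_id] = data

    def listing(self):
        """Cheap listing of what is available, without transferring contents."""
        return sorted(self._pool)

    def pull(self, resource_id):
        """Fetch a single resource on demand."""
        return self._pool[resource_id]


# Usage: the phone pushes two photos, the tablet only pulls the one it wants.
relay = PoolRelay()
relay.push("photos/beach.jpg", b"beach")
relay.push("photos/sunset.jpg", b"sunset")

tablet_storage = {}
wanted = "photos/sunset.jpg"
if wanted in relay.listing():
    tablet_storage[wanted] = relay.pull(wanted)
```

The tablet ends up storing one photo even though the relay holds both, which is the bandwidth and storage saving the paragraph above is after.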

These subtle changes allow for a great experience for the user: All of their data, available everywhere, regardless of which device it originated from, while conserving storage space and bandwidth where needed (i.e. mobile devices).

Addressing the permissions issue, I think we should do away with the concept of granting permission to relays to make changes. The pairing process should effectively confirm the fact that you own your devices, and since relays are just one more node in the P2P layer, relays are theoretically owned by the user as well. Also, since data is always synced to the relay regardless of its origin device, if one were to write a blog post, it should suffice to post it from one's phone and mark it as an external share. The blog post would then sync to the user's relay and be available everywhere, both externally and internally.

Regarding user identities: These should be UUIDs, generated whenever a user tells the protocol that this is their first time setting Exodus up, and it should be possible to generate them anywhere, not just on a relay. So with this slight change, the User Experience process would look like this:

  1. Download an app from an app store
  2. A device ID is generated (also a UUID, but on a different probability space, unique to the user's collection of devices)
  3. The app presents the user with two buttons (for instance):
    • [This is my first Exodus device]
    • [I'd like to pair this device with my other devices]
  4. If the user taps on the first button, a new UUID is generated for them, which now represents their identity
  5. If the user taps on the second button, this device is associated with their existing identity ID
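
The steps above could be sketched like this. It assumes plain random (version 4) UUIDs for both identity and device IDs, which glosses over the "different probability space" idea in step 2, and the function name `set_up_device` is made up.

```python
import uuid


def set_up_device(first_device, existing_identity_id=None):
    """Return (identity_id, device_id) for a freshly installed app."""
    device_id = str(uuid.uuid4())        # step 2: every install gets a device ID
    if first_device:
        identity_id = str(uuid.uuid4())  # step 4: brand-new identity
    else:
        if existing_identity_id is None:
            raise ValueError("pairing needs the identity ID of an existing device")
        identity_id = existing_identity_id  # step 5: join the existing identity
    return identity_id, device_id
```

Nothing here needs a relay or any network access, which is the point of generating the IDs locally.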

Scenario (diagram included 😛)

Bob has an Android phone, an iPad, and a third-party provider running his relay. Bob takes lots of pictures on his phone, and uses his iPad to write documents and manage his calendar. Bob has three pictures from his last summer vacation on his phone, of which only one is marked as an external share. On his iPad, he's got an important department memo he's been working on, which is an internal share only, and a blog post, which is external.

Alice has an iPhone, but since she's a 1337 h4x0r, she doesn't need no stinking third-party relay. She does everything she needs on her one device. She updates her statuses and she writes her blog posts to plan her uprising against the oppressing regime she's living under. She takes pictures of dirty politicians doing dirty deals, and publishes clandestine voice recordings of her meetings with people of influence.

Bob and Alice meet by chance at a conference showcasing the latest in tinfoil hats, since it's a passion they strangely share. Alice allows Bob to scan her Exodus Code (we need a better name for this), and the process of peering is initiated and completed, after which point, Alice's iPhone has an aggregate link to Bob's devices.

Bob goes home and checks his Exodus feeds. At this point, his iPad tries to contact the last known IP address of Alice's iPhone, but can't reach it, so it queries the all-knowing DHT. Sure enough, the DHT responds with Alice's new IP address, and Bob can then read her latest blog post on Marxism and seizing the means of production. He also sees his friend Dave has posted a picture of his cat stuck in a tiny box. Alice sets her phone to Airplane Mode, and she disappears again, a blip fading back into the shadows in this digital world of ours. "What a fantastic woman," Bob says to himself.

[Diagram: scenario_1]

More on aggregate links

These links are just the logical representation of your link to another person. The protocol, as well as the user experience, for all intents and purposes, will treat this link to multiple devices as a single connection. The protocol doesn't care if you have five devices; it's just one link to one person.

At the implementation level, these links would be lists of devices with priorities. Relays would have the highest priority (expressed as "priority 0"), and other devices would have lower priorities. This allows for the protocol to preferentially try relays first, since those are probably always on and have a complete copy of all of the user's data, but in the case of failure, to try other devices the person owns with lower priority.

Perhaps the internal link table could be a collection of entries that look like this:

"abc123": { // user identity
    "ui89dbasdf": { // device ID, relay
        "address": "195.10.85.2",
        "priority": 0
    },
    "ajdfbkw": { // not a relay
        "address": "102.29.95.1",
        "priority": 1
    }
}
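
For what it's worth, resolving a link from such a table might look like the sketch below. The table mirrors the example entry above; `resolve` and the `is_reachable` callback are hypothetical, and how reachability is actually probed is left open.

```python
LINK_TABLE = {
    "abc123": {  # user identity
        "ui89dbasdf": {"address": "195.10.85.2", "priority": 0},  # relay
        "ajdfbkw": {"address": "102.29.95.1", "priority": 1},     # not a relay
    }
}


def resolve(link_table, identity_id, is_reachable):
    """Return (device_id, address) of the best reachable device, or None.

    Devices are tried in ascending priority order, so relays (priority 0)
    come first and other devices act as fallbacks.
    """
    devices = link_table.get(identity_id, {})
    by_priority = sorted(devices.items(), key=lambda item: item[1]["priority"])
    for device_id, entry in by_priority:
        if is_reachable(entry["address"]):
            return device_id, entry["address"]
    return None
```

From the caller's side this is still one link to one person: you pass in an identity ID and get back whichever device answered first.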

Zatnosk commented 8 years ago

I agree with what @cuibonobo has written here, but I have some privacy concerns regarding the function of relays.

Content Filtering

Relays are a great way to ensure availability of content, but I'd like them to be as flexible as possible. Forcing relays to contain ALL my data from all my devices is not optimal. There might be relays operated by corporations I don't trust with private data, but I'd still want to publish and store articles through them. I propose that relays should be either a private relay or a public relay (and maybe a "protected" relay). This has nothing to do with visibility, only with which resources they store.

Private Relay: (My Backup) A device sends every last resource created on it to such a relay, to serve as a backup and reliable datastore. Any of my other devices can pull files from the original device or from the private relay. It would also back up public resources and probably be able to serve them to the internet.

Public Relay: (My Blog) A device only sends a resource to this relay when it is publicly shared. This relay could then be my website/blog/channel on the internet, and since it knows nothing about my private and personal files, I could easily allow untrusted companies to host it for me.

Protected Relay: (My Sharing Service) I'm not sure if this level is necessary, but it would operate like a public relay, except that both public and limited resources would be sent to this relay. A limited resource is something shared externally with a select group of identities. An alternative option is having only private and protected relays. (The name is not at all inspired by object-oriented programming, says the PHP dev with an education in Java.)
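
To illustrate the three levels, here is a small sketch of which resources a device would send to which kind of relay. The visibility labels ("private", "limited", "public") and the function name are my own shorthand, not settled terminology.

```python
# Which resource visibilities each kind of relay is allowed to store.
ACCEPTED = {
    "private": {"private", "limited", "public"},  # backup: everything
    "protected": {"limited", "public"},           # sharing service
    "public": {"public"},                         # blog / website
}


def resources_for_relay(relay_kind, resources):
    """Return the names of the resources a device would send to this relay.

    `resources` maps a resource name to its visibility label.
    """
    allowed = ACCEPTED[relay_kind]
    return [name for name, visibility in resources.items() if visibility in allowed]


MY_RESOURCES = {
    "diary.txt": "private",
    "party-photos.zip": "limited",  # shared with a select group of identities
    "blog-post.md": "public",
}
```

With these resources, a public relay would only ever see `blog-post.md`, which is what makes it safe to hand to a host I don't trust.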

I prefer the option to have all three levels of relay, with the private relay as the "common default". Alternatively this problem could also be solved through other means than relays, and we could then stick to only having private relays.

Device Privacy

I think it should be possible to only give someone a partial aggregate link, i.e. here's the address of my relay, I'm not telling you what other devices I have - and you can't discover my other devices through the DHT by knowing my relay or my user identity. (This might be given by the DHT, I don't know.) In essence it should be impossible for me to know if Alice (from @ArturoVM's thrilling story) has another device where she posts Doctor Who fanart, if she hasn't told me. But she should still be able to share her Doctor Who fanart with Bob through the same identity as her 1337 h4x0r political posts. Is this a relevant concern or is it trivial from the architecture discussed here?
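
One way to picture a partial aggregate link, reusing the link-table layout @ArturoVM sketched earlier: the owner hands out a filtered copy of their own device list. `partial_link` is a hypothetical helper, and this says nothing about whether the DHT itself would leak the undisclosed devices.

```python
def partial_link(full_link, disclosed):
    """Return a copy of the link containing only the devices the owner discloses.

    `full_link` maps device ID -> {"address": ..., "priority": ...};
    `disclosed` is the set of device IDs the owner is willing to reveal.
    """
    return {
        device_id: dict(entry)
        for device_id, entry in full_link.items()
        if device_id in disclosed
    }


ALICE_DEVICES = {
    "relay-1": {"address": "195.10.85.2", "priority": 0},
    "fanart-tablet": {"address": "102.29.95.1", "priority": 1},
}

# Bob only ever learns about the relay.
bobs_view = partial_link(ALICE_DEVICES, {"relay-1"})
```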

cuibonobo commented 8 years ago

I agree with @Zatnosk in that I don't think a relay should be automatically assigned to every device that belongs to an identity. I think relays should be complete copies so that they are transferable, but also so that they can serve their primary purpose: serving my data if a particular device is offline. Sending my PC data to a relay would probably be more bandwidth and storage than I can pay for.

The discussion brings up a good point that was kind of nagging at me yesterday but I didn't know how to articulate: I should be able to create status posts from my phone or my PC without an obvious difference in UI on the production and consumption sides.

Stepping back a bit, I think that there are 2 distinct use-cases that we are unwittingly merging together:

In short, an identity is distinct from its devices.

In the write-up above I mentioned that P2P-only nodes need to handle DHT operations, but those operations are taken over by a relay if you decide to enable one. I think that this 'master' node also needs to handle identity-specific data. That way you can create a blog post or status update from any device and it is automatically transferred to a kind of magic 'identity bucket'.

What do you think?