amark / gun

An open source cybersecurity protocol for syncing decentralized graph data.
https://gun.eco/docs

Introduction is unclear #597

Open Anchakor opened 6 years ago

Anchakor commented 6 years ago

I've heard GUN mentioned multiple times, but I never understood what it is. This is my experience when looking at the website and navigating from it. Please take it as a constructive criticism of what seems to me as a promising software project.

I've navigated to the main website https://gun.eco/. The page serves more as a directory than a landing page. It was sluggish on mobile, maybe because of the many embedded videos, but even in a desktop browser (FF, Linux) it seems to eat all the CPU of one core. On mobile the sliding widget (starting with "Realtime") doesn't work; on desktop it is also glitchy, sometimes reverting to the previous slide.

I have to say I am not a fan of videos or try-it-yourself demos. If I just want to figure out what something is, I want it in short text (I like Wikipedia). Try-it-yourself demos are good as part of getting started tutorial and videos maybe as a supplement to it.

Still not really knowing what GUN is I've found the links on the bottom of the page. "Getting started" redirected me to the same main page, so that wasn't useful.

"Documentation" seems to have gotten me somewhere more useful. The introduction seems like what I'm looking for, but still crucial information is missing.

Still wondering, I navigate to the GitHub page, which provides more puzzle pieces, but still I don't have straight answers to some important questions:

Is GUN a distributed database, like most dist. databases are (Cassandra etc.)? This is more or less answered - no. Seems to me it is a P2P distributed storage like IPFS, with a DB-like interface.

Is there a single shard (world/graph) for all users, or does each user have its own? If I add "gun.get('random/nuquDvrWd').put({hello: "world"});" will other users be able to access it from "random/nuquDvrWd"? Is there a space which can be private for a user/application? How are conflicts handled? If I "gun.get('mark')", can I add information without Mark being able to prevent me from doing so? How would Mark filter out information that other people (perhaps maliciously) added?

How is GUN distributed? I take it that when you add to the graph, the data is replicated to the other P2P network nodes. How, and what guarantees are there? If I gun.set something and turn off the device, can other devices still read it? Can P2P network nodes be configured for what to replicate and with what limits? Are clients full P2P network nodes, or are there servers which should fulfill that role? Can I host the servers myself? Can I restrict access to a part of a network to just my application, creating a separate private GUN P2P network?

This is just what came off the top of my head. It would be great if a newcomer could get to an introductory text answering these questions in a minimum number of clicks. Keep up the good work!

PS: Feel free to answer the questions here, I still don't have the answers :)

Dletta commented 6 years ago

@Anchakor Ahoi! I will take a shot at answering. All the docs are open for edit by the public, so feel welcome to make some changes. (Green Button with Edit in the top right of the webpage)

Gun saves data in two ways: in the browser, localStorage; in Node, RAD (Radix Storage Engine, which writes to files). Due to browser limitations, not all data is stored on all clients. Persistence at this time is what is left in your localStorage and what the 'super peer' (a Node instance, out of necessity right now) saves. Clients subscribe to the data they need to stay informed on, and the super peer will dispatch that data. Once the data is on the client, the client may serve data to other clients as well.

The graph is shared across all peers in the network. Depending on your data structure you may have multiple worlds available (all worlds can start with a root node, or you could have nodes on separate roots to divide your logic, it's really open that way) Users specifically have their own root object gun.user and through SEA module only the user who is the owner of that root can write to that specific node. All others can read it because they are on a shared public key.

If another user gets "random/nuquDvrWd", the data is requested via the gun protocol and they will receive {hello: "world"}.

At the moment there is no private space, although I believe this can be built on top of gun using SEA and encrypting data with another key for the user only. (Crypto is not my metier)

Conflicts are handled by the conflict resolution algorithm https://gun.eco/docs/Hypothetical-Amnesia-Machine (summary: lexical order when the timestamps are exactly the same, and magic to always converge to the latest state with compensation for clock differences, except way more powerful).
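
A rough sketch of the rule summarized above, in plain JavaScript with no GUN dependency. This is a simplified illustration and not GUN's actual HAM code: the function name `resolve`, the `{ state, value }` record shape, and the `defer`/`keep`/`accept` results are all assumptions made for this example.

```javascript
// Simplified last-write-wins resolution with a lexical tie-break.
// NOT GUN's real implementation; names and record shape are hypothetical.
function resolve(current, incoming, machineTime) {
  // Updates stamped in the future (relative to this machine's clock)
  // are held back until the local clock catches up.
  if (incoming.state > machineTime) return { action: 'defer', record: current };
  // Older than what we already have: keep the current value.
  if (incoming.state < current.state) return { action: 'keep', record: current };
  // Newer than what we have: accept the incoming value.
  if (incoming.state > current.state) return { action: 'accept', record: incoming };
  // Same timestamp: compare serialized values so every peer picks the same winner.
  const a = JSON.stringify(current.value);
  const b = JSON.stringify(incoming.value);
  return b > a
    ? { action: 'accept', record: incoming }
    : { action: 'keep', record: current };
}

const now = 1000;
const stored = { state: 900, value: 'hello' };
console.log(resolve(stored, { state: 950, value: 'world' }, now).action);   // accept
console.log(resolve(stored, { state: 800, value: 'old' }, now).action);     // keep
console.log(resolve(stored, { state: 2000, value: 'future' }, now).action); // defer
console.log(resolve(stored, { state: 900, value: 'zebra' }, now).action);   // accept
```

Because the tie-break only depends on the two values, two peers that see the same pair of writes in opposite order still end up with the same record.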

Mark cannot prevent you from writing to gun.get('mark'), but if you use the SEA module you are prevented from writing to gun.user(mark).get('personal info').put('send me money to attacker.bankaccount'). SEA is really there to only allow people to write where they are supposed to. SEA validates incoming items from the wire (websockets) and rejects them if they are not trusted. (https://gun.eco/docs/SEA and https://gun.eco/docs/Security%2C-Authentication%2C-Authorization and some videos on https://gun.eco/docs/Auth)

Gun uses websockets underneath. In the future Gun will have a system called AXE that will do some smart routing and encrypted transports. (Work is in progress) Whenever a client 'puts' something in a graph, a message is sent to the network and other clients that have subscribed to that data, or super peers will pick it up and update their internal state. The guarantee is that it will be real-time for those who are online at the same time and that even those who are offline, will eventually receive those updates. (eventual consistency)

If you set something and the connection was still open to send the msg out to the other peers, even if you go offline, it will be available (esp through the persistent super peer)

Gun handles replication by having the peer/client subscribe to data they need to stay connected to. Depending how you design your application this may be a huge amount of data (killing 50mb of localStorage) or small (which is completely fine). IndexedDB could be used and there is an adapter you can use for that, I have not needed it so far. (super peers have only the hard disk space as a limit)

Due to browser limitations above, node js is used as a super peer to handle connections and to persist data on hard disks / AWS S3 / IPFS.

The code is fully open source and you can host your own server in minutes (less than 10 lines of code). The restrict part is not something I can answer solidly. My educated guess is, you can hack into the websocket module to only allow trusted IPs to connect, to a gun instance. Or you use SEA user with special key that you only give to the people of that specific network.

I hope that helps. I know @amark and @beachbrake are planning on a documentation overhaul to get things a little clearer/simpler.

(feel free to join the gitter.im chat for amark/gun, people are always super helpful, also ask anything here)

amark commented 6 years ago

Sorry I haven't had time to reply. Thanks @fuchidahiro for jumping in!

I wholeheartedly agree the docs need a ton of improvement, and I am very thankful for these observations and criticism.

I haven't actually gotten to read your reply yet @fuchidahiro but it looks awesome from my skim and maybe we can take both your comments and the questions and synthesize them into the docs?

Dletta commented 6 years ago

@amark, let's do that. I think it would be good to do a FAQ section, maybe, and slap the most common questions (some of which are here) in there?

Anchakor commented 6 years ago

Thanks for the answers, this clears up a lot of it!

I think the most confusing things are that (if I understand it right) there is just one global P2P graph data structure, which is actually flattened JSON (are you familiar with flattened JSON-LD?), and the authentication-authorization story. How are you supposed to build a basic CRUD application if anyone can write to the graph? [1] It seems that if you gun.get something you get a node and its outgoing edges and their values (actually a bit more: JSON of everything, not flattened), and you can secure that with SEA (hopefully any node), but you probably cannot get or secure edges pointing to the node.

[1] Actually I don't understand how come https://gun.eco/docs/Hello-World doesn't read the value input by other users. Or is GUN not just one global P2P graph data structure, but multiple ones based on the server which coordinates clients (browsers), so that there are actually multiple GUN P2P networks?

Dletta commented 6 years ago

@Anchakor I think I get what you are saying.

Let me change my answer from earlier. When you create an instance of gun with var gun = GUN(), you create one graph. Any client or peer connecting to that instance via websockets will write to and read from that instance. But it is not global outside of the instance (as in, there is no global gun network that you could go into and read data from or write data to; it is still restricted to the network within which the app lives). When you make an app and have your gun instance, you will not automatically see other apps' data. But if you create your gun instance and add a peer of another app, like notabug.io, you can then request data from notabug/things/... and get back actual data from that app. You would not be able to write to it, though, because notabug runs a validation code to check that their schema is filled in and that the data comes from a valid source (something you could do too).

Anchakor commented 6 years ago

@Dletta I see, thanks. I was quite confused, Gun docs talk about P2P and in FAQ it says:

Data is kept and shared between peers, across the network

I think usually P2P software has just 1 global network, connecting all clients through DHT or so. Also from FAQ:

Is there a single graph across all users in the network?

The graph is shared across all peers in the network. Depending on data structure you may have multiple graphs available. All graphs can start with one root node or multiple root nodes to divide app logic.

This is very unclear, as an outsider I can't imagine what is meant by the different data structures and root nodes (root nodes are usually a tree concept, not a graph concept).

In any case I think prospective users would be very much interested in ways of securing the network of their applications, so that, for example, a rogue browser extension doesn't wreak havoc in the network. I assume by:

notabug runs a validation code to check that their schema

You mean it uses the SEA module for that, or is that a different security mechanism?

Dletta commented 6 years ago

@Anchakor I changed the explanation in the FAQ, please have a look to see if that is clearer.

The validation is on the 'wire' level, intercepting messages from other gun instances as they come in. https://github.com/zrrrzzt/bullet-catcher is an example of how to do that.
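
To make the idea of wire-level validation concrete, here is a small sketch in plain JavaScript. It does not use GUN or bullet-catcher; the message shape (a `put` field for writes), the `token` field, and the `makeWireFilter` helper are assumptions chosen for illustration only.

```javascript
// Hypothetical wire filter: decides whether an incoming message is forwarded.
// The message shape and the token check are assumptions for this sketch.
function makeWireFilter(isValidToken, forward) {
  return function onIncoming(msg) {
    // Reads (no `put` payload) pass through untouched.
    if (!msg.put) { forward(msg); return true; }
    // Writes must carry a token the application accepts.
    if (isValidToken(msg.token)) { forward(msg); return true; }
    // Otherwise the write is dropped before it reaches the local graph.
    return false;
  };
}

const accepted = [];
const filter = makeWireFilter(
  (token) => token === 'secret-app-token', // stand-in for a real check
  (msg) => accepted.push(msg)
);

filter({ get: { '#': 'words' } });                                   // read: forwarded
filter({ put: { words: { hello: 1 } }, token: 'secret-app-token' }); // valid write: forwarded
filter({ put: { words: { spam: 1 } } });                             // no token: dropped
console.log(accepted.length); // 2
```

The point of filtering at this level is that an untrusted write never touches the peer's stored graph, whatever the sending client believes about its own permissions.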

SEA is for signing, verifying and encryption for transport and private conversations.

Gun is a library, which is able to be used for many different things. At this point it is not a P2P app, but rather a library or framework to create P2P apps. Gun is the protocol that can take graph data and send it from one instance / peer to another.

A root node is the first node you create in your graph or it may be a bunch of nodes, which you then connect to other nodes. (tree structure or pure graph) This is more a choice of how people may structure their data and what they need to query for.

For example, I created a Twitter graph builder that listened to tweets and then created a graph of all the words used in each tweet, at each timestamp and in what place. My root nodes were 'tweetText', 'time', 'words', 'place', which allowed me to just say gun.get('place').map().once(printList) and it would return all places a tweet came from. The graph itself was a completely connected graph, but the "rootNodes" let me query from that specific direction.

That is also how data separation may work in apps. One client may only be interested in the words used in the graph, so they will use gun.get('words').on(doSomething), which will sync to their instance only the items under the 'words' node. The super peer may also have all the times and places connected to those words, and a peer may get them if needed by using .once, but need not retain them in its own graph all the time. Does that explain what I mean by dividing app logic?
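
The root-node idea can be sketched with a tiny in-memory model in plain JavaScript. This is not GUN itself: the `graph` object, the `{ '#': id }` edge shape, and the `children` helper are assumptions for illustration, loosely mirroring what gun.get('place').map().once(...) does in the example above.

```javascript
// Tiny in-memory model: every node lives at the top level under an ID,
// and edges are stored as { '#': targetId } references (an assumed shape).
const graph = {
  place: { a: { '#': 'place/berlin' }, b: { '#': 'place/tokyo' } },
  words: { a: { '#': 'word/hello' } },
  'place/berlin': { name: 'Berlin' },
  'place/tokyo': { name: 'Tokyo' },
  'word/hello': { text: 'hello', seenIn: { '#': 'place/berlin' } },
};

// Follow every edge of one "root" node and return the nodes it points to.
function children(graph, rootId) {
  return Object.values(graph[rootId] || {})
    .filter((v) => v && typeof v === 'object' && '#' in v)
    .map((ref) => graph[ref['#']]);
}

console.log(children(graph, 'place').map((n) => n.name)); // [ 'Berlin', 'Tokyo' ]
console.log(children(graph, 'words').map((n) => n.text)); // [ 'hello' ]
```

The data is one connected graph ('word/hello' links to 'place/berlin'), yet a client that only cares about words can start from the 'words' root and never load the places.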

Anchakor commented 6 years ago

@Dletta OK, I think I understand it now. Let's see, does this description of Gun make sense:

Gun is a library for P2P distributed data storage with graph structure. It allows building P2P storage networks where each peer listens to the network traffic and controls which other peers [1] it synchronizes with, what changes it accepts and persists, and what queries it responds to. Gun uses a conflict resolution algorithm favoring offline-first availability and persistence from the CAP theorem. [2]

[1] Only super-peers can be synchronized to; they must be explicitly specified for non-super-peers.
[2] There are no configuration options for a different CAP arrangement.

If that seems useful, feel free to adjust it and use it :) In any case that seems to have cleared up my questions, thanks!

PS: Maybe also consider adding a docs page comparing it to other projects, like IPFS, Neo4j, Apache Cassandra, PouchDB/CouchDB, LevelGraph

Dletta commented 6 years ago

@Anchakor Thank you. I think this explanation is fine. I believe that in the coming months some of the browser limitations will be lifted through the AXE module that Gun is working on, so I will probably remove note 1. The last sentence is not quite right: the conflict resolution is focused on A and P, but still favors strong eventual consistency, at the price of delayed consistency. But anyone can plug into gun and use their own conflict resolution algorithm (it's all modular), or they could build a schema on top that pushes items in a different way from peer to peer.

Edit: By Mark. (creepy GitHub lets me edit your comments)

mk-pmb commented 6 years ago

I'd like a comparison to PouchDB as well. There are modules that claim they can sync PouchDB via WebRTC so it looks like they could work server-less.

Dletta commented 6 years ago

@mk-pmb Just adding here: WebRTC is not server-less. It still requires a public signaling server to allow two browsers to connect. PouchDB looks interesting. They say it uses IndexedDB underneath, which allows the browser to use a great amount of storage (if the user allows it, it will fill the hard disk). GUN just added a module to use IndexedDB in the browser as well. Pouch doesn't seem to support non-JSON data, but it includes a way to store data blobs. Gun could store data blobs, but it is recommended to use it for the metadata and use IPFS, WebTorrent etc. for the actual larger payloads, since getting big data saved is not a priority.

bkniffler commented 5 years ago

First of all, let me say that this thread and @Dletta's answers were super helpful for understanding how gun works and what it does. I'm all for basing new docs on that info, since the current ones are a bit hard to understand.

I was wondering, how would one create a multi-tenant application with data sharing between tenants? Would I create multiple super nodes (one for each client)? How would I go about sharing data then, especially if the data is encrypted under a user? Or would I be using something like bullet-catcher, but modified to restrict read+write access according to the user's token?

Just to be clear, it would be an application that has confidential data separated for each tenant while some data needs to be shared between tenants for read+write.

flybayer commented 5 years ago

In addition to @bkniffler's request, I'm also wondering how to design a system where each user can only read and write their own data. No data is ever shared between users. What would the graph look like? How to secure all data? (Does every read/write need to be encrypted with SEA?)

Dletta commented 5 years ago

@flybayer Hi, SEA enables write protection for the user's data. But to keep it from being read, you would most likely encrypt all your own data in your own graph with your 'private key', which you can then use to decrypt upon read. Doing that (which could be done in a custom chain method, .securePut, that you would have to write) would prevent anyone from reading any other user's data, even if they figure out that they can access other users with their pubkey.
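
A minimal sketch of the encrypt-before-write idea, using Node's built-in `crypto` module (AES-256-GCM) in place of SEA so that it runs without GUN. The `store` object stands in for the publicly readable graph, and `securePut`/`secureGet` are hypothetical helpers, not GUN or SEA APIs.

```javascript
const nodeCrypto = require('crypto');

// Stand-in for the shared graph: anyone can read what is stored here.
const store = {};

// Encrypt with a key only the owner holds, then write the ciphertext.
function securePut(key, path, value) {
  const iv = nodeCrypto.randomBytes(12); // fresh nonce per write
  const cipher = nodeCrypto.createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([
    cipher.update(JSON.stringify(value), 'utf8'),
    cipher.final(),
  ]);
  store[path] = {
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    data: data.toString('base64'),
  };
}

// Read the ciphertext back and decrypt it; throws if the key is wrong.
function secureGet(key, path) {
  const { iv, tag, data } = store[path];
  const decipher = nodeCrypto.createDecipheriv('aes-256-gcm', key, Buffer.from(iv, 'base64'));
  decipher.setAuthTag(Buffer.from(tag, 'base64'));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(data, 'base64')),
    decipher.final(),
  ]);
  return JSON.parse(plain.toString('utf8'));
}

const ownerKey = nodeCrypto.randomBytes(32); // kept private by the owner
securePut(ownerKey, 'notes/1', { text: 'only I can read this' });
console.log(secureGet(ownerKey, 'notes/1').text); // only I can read this
```

Other peers can still fetch `store['notes/1']`, but without the owner's key they only see ciphertext, which is the property described above.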

Does that help?

mk-pmb commented 5 years ago

Maybe you can run a Tahoe-LAFS on top of Gun.