MatrixAI / Polykey

Polykey Core Library
https://polykey.com
GNU General Public License v3.0

Gestalt Synchronisation with Gossip Protocol #190

Closed CMCDragonkai closed 3 months ago

CMCDragonkai commented 3 years ago

This explains gossip protocols

Within our gestalt, we should be synchronising:

These can be eventually consistent.

CMCDragonkai commented 3 years ago

This eventually provides #185

CMCDragonkai commented 2 years ago

The first library to trial here is automerge, paying particular attention to the issue regarding persistence: https://github.com/automerge/automerge/issues/331.

Remember that our state is actually persistent and not just an in-memory dictionary, so it looks like we will need to hook into our DB changes, or into something at a higher level.

Note that I don't believe automerge is a form of gossiping. It's a point-to-point CRDT, so we will need to see how it works with gossiping in an eventually consistent way. Consider that synchronising all the nodes in a gestalt is a multi-party situation.

Another candidate is https://github.com/yjs/yjs, which I've heard good things about. It's important to trial these out.

CMCDragonkai commented 2 years ago

All the vaults in all the nodes in the same gestalt should be "scannable". That is, if nodes are in the same gestalt, it should be possible to pull vaults from any of them.

Scannability means discoverability. Right now one has to interrogate a particular node to know which vaults it has.

Unlike DHT key-values, the vaults are not stored under a hash key that balances them across all the nodes. Instead they are stored loosely: there may be some vaults on node 1 and some other vaults on node 2. This is because PK nodes are not entirely "homogeneous". We cannot expect all nodes to store all the same vaults.

However, some vaults can be selected for automatic synchronisation, at least point to point, somewhat like configuring a mirroring protocol. In GitLab we can pick an upstream to mirror a repository from, and we can choose to push or pull. The same protocol, which can work across gestalts, should also work within a gestalt.

I'm wondering now whether we will make use of DHTs at all. Perhaps the only thing we would use a DHT for is publicly shareable and trustable key-value data across the entire PK network (across all gestalts). I can only imagine gestalt discovery data being useful in this situation: it would make use of decentralised indexing in order to scale out the indexing/crawling of identity providers to build a "map" of the trust network in the world. Of course one does not care about the full trust network, just the immediate trust network, but if some of this trust network information can be shared, that can improve performance. Maybe we will discover more use cases for distributing trust network crawling in the future.

CMCDragonkai commented 2 years ago

The Backchannel system uses Automerge/CRDTs to synchronise this kind of metadata: https://www.inkandswitch.com/backchannel/#linking-multiple-devices.

Because we are doing more than 1-to-1 synchronisation, I believe we would need to combine CRDTs with a gossip protocol to spread that information across the gestalt.
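As a rough illustration of why CRDTs and gossip fit together, here is a minimal sketch of a state-based last-writer-wins map. The types, key names, and values are all hypothetical and are not the automerge or Polykey API. The point is only that the merge is commutative, associative and idempotent, so it does not matter in which order, or how many times, gossip delivers a peer's state.

```typescript
// Hypothetical last-writer-wins (LWW) map; not the automerge or Polykey API.
type Entry = { value: string; timestamp: number; nodeId: string };
type LwwMap = Map<string, Entry>;

// True if `a` should replace `b`: later timestamp wins, ties are broken by node ID.
function wins(a: Entry, b: Entry): boolean {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp;
  return a.nodeId > b.nodeId;
}

// Merging is commutative, associative and idempotent, so replicas converge
// regardless of the order or number of times a state is received.
function merge(local: LwwMap, remote: LwwMap): LwwMap {
  const result: LwwMap = new Map(local);
  remote.forEach((entry, key) => {
    const existing = result.get(key);
    if (existing === undefined || wins(entry, existing)) {
      result.set(key, entry);
    }
  });
  return result;
}

// Two nodes in the same gestalt with diverging (made-up) metadata.
const nodeA: LwwMap = new Map([
  ["vault1/permission", { value: "pull", timestamp: 1, nodeId: "A" }],
]);
const nodeB: LwwMap = new Map([
  ["vault1/permission", { value: "clone", timestamp: 2, nodeId: "B" }],
  ["vault2/permission", { value: "pull", timestamp: 1, nodeId: "B" }],
]);

// Whichever direction the gossip exchange happens in, both ends agree.
const ab = merge(nodeA, nodeB);
const ba = merge(nodeB, nodeA);
```

Wall-clock timestamps are used here only to keep the sketch short; a real design would have to decide on a clock (for example a logical clock) and on how merged state is persisted to the DB.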

CMCDragonkai commented 2 years ago

We should investigate whether the Hypercore protocol can be used here.

Note that hyperswarm is being researched in #234 as a way to improve our networking layer.

But hypercore, hypertrie and hyperdrive are the underlying P2P data structures that they are attempting to use. Hypercore appears to be an append-only log, similar to how we are doing the sigchain. The same concept could also be useful for the gestalt graph and ACL data structures (#185).
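To make the append-only log idea concrete, here is a minimal sketch of a hash-chained log in which each entry commits to the hash of the previous one. The entry shape and payload strings are hypothetical; this is not the Hypercore format or the Polykey sigchain, and it leaves out signatures entirely.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape; not the Hypercore or Polykey sigchain format.
type LogEntry = { index: number; prevHash: string; payload: string; hash: string };

// Hash over the entry's position, its link to the previous entry, and its payload.
function hashEntry(index: number, prevHash: string, payload: string): string {
  return createHash("sha256")
    .update(JSON.stringify([index, prevHash, payload]))
    .digest("hex");
}

class AppendOnlyLog {
  private entries: LogEntry[] = [];

  // Entries can only be added at the end; each one links to its predecessor.
  append(payload: string): LogEntry {
    const index = this.entries.length;
    const prevHash = index === 0 ? "" : this.entries[index - 1].hash;
    const entry: LogEntry = { index, prevHash, payload, hash: hashEntry(index, prevHash, payload) };
    this.entries.push(entry);
    return entry;
  }

  // Returns a copy of the entries, e.g. to send to a peer.
  toArray(): LogEntry[] {
    return this.entries.map((e) => ({ ...e }));
  }
}

// A receiver can check that the chain is unbroken and no payload was altered.
function verify(entries: LogEntry[]): boolean {
  let prevHash = "";
  for (let i = 0; i < entries.length; i++) {
    const e = entries[i];
    if (e.index !== i || e.prevHash !== prevHash) return false;
    if (e.hash !== hashEntry(e.index, e.prevHash, e.payload)) return false;
    prevHash = e.hash;
  }
  return true;
}

const log = new AppendOnlyLog();
log.append("link node A to identity X"); // made-up payloads
log.append("link node B to node A");

const received = log.toArray();
const tampered = log.toArray();
tampered[0].payload = "link node A to identity Y";
```

Here `verify(received)` succeeds while `verify(tampered)` fails, because the altered payload no longer matches its stored hash.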

CMCDragonkai commented 2 years ago

After reading https://dusk.network/news/kadcast-vs-gossip and https://eprint.iacr.org/2019/876.pdf, I have realised that gossip protocols are unstructured p2p networks whose intention is to distribute messages so that they eventually reach all nodes in the network. They are eventually consistent and somewhat inefficient, in that redundant communications will happen.

Gossip makes sense as a protocol to distribute gestalt graph state among trusted nodes (see https://stackoverflow.com/a/50493510/582917). Gossip is also used in most blockchain networks to distribute blockchain state.
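The redundancy can be seen in a small simulation. This is a sketch of a simple push-style gossip under assumptions of my own (every informed node pushes to `fanout` random peers each round, and a seeded PRNG keeps runs repeatable); none of the names come from the linked articles or from Polykey.

```typescript
// Small seeded PRNG (mulberry32) so that the simulation is repeatable.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

type GossipStats = { rounds: number; informed: number; useful: number; redundant: number };

// Node 0 starts with the message. Each round, every informed node pushes it
// to `fanout` randomly chosen peers. Requires nodeCount >= 2.
function simulatePushGossip(nodeCount: number, fanout: number, seed: number, maxRounds = 1000): GossipStats {
  const random = mulberry32(seed);
  const informed = new Set<number>([0]);
  let useful = 0;
  let redundant = 0;
  let rounds = 0;
  while (informed.size < nodeCount && rounds < maxRounds) {
    rounds++;
    const senders = Array.from(informed); // snapshot: nodes informed this round send next round
    for (const sender of senders) {
      for (let i = 0; i < fanout; i++) {
        // Pick a uniformly random peer other than the sender itself.
        let peer = Math.floor(random() * (nodeCount - 1));
        if (peer >= sender) peer++;
        if (informed.has(peer)) {
          redundant++; // the peer already had the message: wasted send
        } else {
          informed.add(peer);
          useful++;
        }
      }
    }
  }
  return { rounds, informed: informed.size, useful, redundant };
}

const stats = simulatePushGossip(16, 2, 42);
console.log(stats);
```

Comparing `useful` against `redundant` in the printed stats shows how many sends were wasted on nodes that already had the message, which is the inefficiency described above.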

A DHT, on the other hand, such as Kademlia and the architectures before it, provides a "structured" p2p network. This is intended to provide some efficiency, as nodes share the workload of routing and communication.

A combination of gossip and DHT is possible, as in the case of kadcast, which uses Kademlia's structure and attempts to design a broadcast protocol on top of it that is capable of distributing state across the entire network.

BitTorrent has BEP 50 (http://bittorrent.org/beps/bep_0050.html), which creates a sort of subnetwork. You can then gossip/broadcast within that subnetwork, so that state is distributed only across the subnetwork and not the whole network. This looks very similar to the nodes within a gestalt in our gestalt graph.
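For reference, the "structure" in Kademlia comes from its XOR distance metric. The following is a minimal sketch using small numeric IDs for readability (real Kademlia IDs are much longer bit strings); the function names are my own and not taken from any library.

```typescript
// XOR distance between two node IDs. Small 32-bit IDs are an assumption
// made for readability; real node IDs are far longer.
function xorDistance(a: number, b: number): number {
  return (a ^ b) >>> 0;
}

// The k known nodes closest to a target ID under the XOR metric.
function closest(target: number, nodes: number[], k: number): number[] {
  return nodes
    .slice()
    .sort((x, y) => xorDistance(x, target) - xorDistance(y, target))
    .slice(0, k);
}

// Index of the routing-table bucket for `other`, as seen from `self`:
// the position of the highest bit in which the two IDs differ.
function bucketIndex(self: number, other: number): number {
  const d = xorDistance(self, other);
  if (d === 0) throw new Error("a node has no bucket for itself");
  return 31 - Math.clz32(d);
}

const target = 0b1010;
const known = [0b1000, 0b1011, 0b0010, 0b1110];

// Distances to the target are 2, 1, 8 and 4 respectively.
const nearest = closest(target, known, 2); // [0b1011, 0b1000]
const bucket = bucketIndex(target, 0b0010); // 3, since the IDs differ first at bit 3
```

Because every node sorts peers by the same metric, lookups can move strictly closer to the target at each hop, which is where the efficiency over unstructured gossip comes from.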

CMCDragonkai commented 2 years ago

It is important to avoid flooding the network, so that we are not sending useless data across it.
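Two common ways to limit flooding are to drop messages a node has already seen and to cap how many hops a message may travel. The sketch below shows both; the message shape and class are hypothetical and not part of Polykey.

```typescript
// Hypothetical gossip message with a unique ID and a remaining hop budget.
type GossipMessage = { id: string; hopsLeft: number; payload: string };

class GossipRelay {
  private seen = new Set<string>();

  // Returns the message to forward to peers, or undefined if it should not be relayed.
  receive(msg: GossipMessage): GossipMessage | undefined {
    if (this.seen.has(msg.id)) return undefined; // duplicate: drop silently
    this.seen.add(msg.id);
    if (msg.hopsLeft <= 0) return undefined; // accepted locally, but hop budget is spent
    return { ...msg, hopsLeft: msg.hopsLeft - 1 };
  }

  hasSeen(id: string): boolean {
    return this.seen.has(id);
  }
}

const relay = new GossipRelay();
const first = relay.receive({ id: "m1", hopsLeft: 2, payload: "gestalt update" });
const duplicate = relay.receive({ id: "m1", hopsLeft: 2, payload: "gestalt update" });
const exhausted = relay.receive({ id: "m2", hopsLeft: 0, payload: "gestalt update" });
```

The first copy is forwarded with one fewer hop, the duplicate is dropped, and the message with no hops left is recorded but not relayed. The seen set here grows without bound; a real implementation would need to expire old IDs.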

CMCDragonkai commented 3 months ago

Closing in favour of https://github.com/MatrixAI/Polykey/issues/715 because this issue is not up to spec anymore.