Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

Trolley - a high-level persistence and communication layer which allows multiple plugins to work together #6019

Closed kozlovsky closed 2 months ago

kozlovsky commented 3 years ago

By introducing a plugin architecture, Tribler transforms into a platform that allows multiple plugins/applications to communicate and process data in a coordinated way.

Examples of possible plugins are Torrent, Chat, Meme spreader, and Torrent Popularity. Plugins can communicate with other local plugins and with remote nodes.

Internally, Tribler uses IPv8 as the protocol for communication between nodes. However, an IPv8-based API might be too low-level to be used directly by plugins. Also, IPv8 is unsuitable for communication between plugins living within the same node. To solve these problems, we need a communication API that satisfies the requirements below.

API design requirements

  1. it should be high-level and straightforward;
  2. plugins running within the same node must be isolated from each other;
  3. the global state of a specific node must be reproducible, to enable debugging tricky inter-plugin issues.

The TROLley (TRibler Object Layer) design addresses these requirements. For the plugin system, it will serve as both the persistence layer and the communication bus. In Trolley, the Tribler node state is represented as a set of signed data objects belonging to corresponding IPv8 communities. A data object has two kinds of properties: properties common to all Trolley objects (e.g. author public key), and properties defined by a particular IPv8 community (schemaless JSON). A plugin can have a dedicated local database with an arbitrary structure. Still, this plugin-specific database is for local storage only: to propagate data to other nodes, a plugin needs to put it into Trolley.
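
To make the two kinds of properties concrete, here is a minimal Python sketch of what a data object could look like. The class name, the field names, and the serialization format are all assumptions made for illustration; nothing here is an existing Tribler or IPv8 API.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class TrolleyObject:
    """One version of a data object (all names here are hypothetical)."""

    # Properties common to every Trolley object.
    object_id: str            # uuid identifying the object
    author_public_key: str    # hex-encoded public key of the author
    community: str            # IPv8 community the object belongs to
    timestamp: int            # identifies this version of the object
    # Properties defined by the community: schemaless JSON.
    payload: Dict[str, Any] = field(default_factory=dict)

    def canonical_bytes(self) -> bytes:
        """Deterministic serialization that a signature could cover."""
        body = {
            "object_id": self.object_id,
            "author_public_key": self.author_public_key,
            "community": self.community,
            "timestamp": self.timestamp,
            "payload": self.payload,
        }
        return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


obj = TrolleyObject(
    object_id=str(uuid.uuid4()),
    author_public_key="ab" * 32,          # placeholder, not a real key
    community="torrent-popularity",       # hypothetical community name
    timestamp=1_620_000_000,
    payload={"infohash": "00" * 20, "seeders": 12},
)
print(obj.canonical_bytes())
```

Sorting the keys keeps the byte representation stable, which matters if the signature is computed over the serialized form.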

Plugins communication with Trolley

When a plugin creates a new object, Trolley propagates this object to connected nodes belonging to the same community. Creating a new data object is the only way for a plugin to send a message to other nodes. By default, Trolley propagates new data objects to every connected node within the same community. A plugin can specify different kinds of propagation strategies for different types of objects.

A plugin can subscribe to get updates from specific communities. When a node receives a new data object, each plugin subscribed to that community will process it in turn. As a result of processing, a plugin can create new data objects, and Trolley will automatically propagate them to other nodes in the same way. Also, the plugin can put data into its private database. Basically, Trolley enables a distributed pipeline of object processing by plugins.
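
The subscribe-and-process loop can be sketched as a tiny in-process bus. This is only an illustration under assumptions: `LocalBus`, `subscribe`, `receive`, `create`, and the `outbox` list are hypothetical names, objects are plain dicts, and real propagation to remote nodes is replaced by appending to a list.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A handler takes one incoming object and returns the new objects it creates.
Handler = Callable[[dict], List[dict]]


class LocalBus:
    """Hypothetical stand-in for the Trolley dispatch loop on one node."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self.outbox: List[dict] = []  # objects that would be propagated to peers

    def subscribe(self, community: str, handler: Handler) -> None:
        self._subscribers[community].append(handler)

    def create(self, obj: dict) -> None:
        # Creating an object is the only way to send something to other nodes.
        self.outbox.append(obj)

    def receive(self, obj: dict) -> None:
        # Each plugin subscribed to the object's community processes it in turn.
        for handler in self._subscribers.get(obj["community"], []):
            for new_obj in handler(obj):
                self.create(new_obj)


def popularity_plugin(obj: dict) -> List[dict]:
    # Hypothetical plugin: reacts to a torrent object by emitting a vote object.
    return [{"community": "popularity", "infohash": obj["infohash"], "votes": 1}]


bus = LocalBus()
bus.subscribe("torrent", popularity_plugin)
bus.receive({"community": "torrent", "infohash": "abc"})
print(bus.outbox)
```

A plugin that only writes to its private database would simply return an empty list from its handler.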

The order in which multiple plugins process the same data object is not strictly specified. It is possible to specify a set of other plugin names in the plugin configuration that should process incoming data objects only after the current plugin (dependency loops are forbidden).
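
One way to derive a processing order from such a configuration is a topological sort, which also rejects dependency loops. The sketch below uses `graphlib` from the Python standard library (3.9+); the `processing_order` function and the shape of the configuration dict are assumptions, not part of the proposal.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def processing_order(run_after_me: Dict[str, Set[str]]) -> List[str]:
    """Return plugin names in an order that respects the configuration.

    `run_after_me[p]` is the set of plugins that must process an incoming
    object only after plugin `p` (hypothetical configuration format).
    Raises graphlib.CycleError if the configuration contains a loop.
    """
    sorter = TopologicalSorter()
    for plugin, later_plugins in run_after_me.items():
        sorter.add(plugin)
        for later in later_plugins:
            sorter.add(later, plugin)  # `later` depends on `plugin`
    return list(sorter.static_order())


config = {"torrent": {"popularity"}, "popularity": {"chat"}, "chat": set()}
print(processing_order(config))

try:
    processing_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("dependency loop rejected")
```

Plugins with no ordering constraint between them may come out in any relative order, which matches the statement that the order is not strictly specified.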

Authors, objects, and versions

The content inside Trolley is signed. An author is identified by their public key. A data object is identified by a uuid. A data object state has a specific version identified by a timestamp. A particular version of an object is immutable, which is guaranteed by signing it with the author's cryptographic key. Only the author can produce a new version of the object. It is impossible to pass around objects modified by anyone other than the author. Nodes usually store only the latest known version of an object.

Object deletion is processed using a tombstone for the object. It is possible to "undelete" the data object by generating a new version of the object that is not a tombstone.
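
The versioning and tombstone rules can be sketched as a store that keeps only the latest version per object. This is a simplified illustration: `Version`, `ObjectStore`, and `accept` are hypothetical names, a tombstone is modelled as a version with `payload=None`, and signature verification is assumed to have happened before `accept` is called, so only the author identity is compared here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Version:
    object_id: str
    author: str               # author public key (assumed already verified)
    timestamp: int            # identifies the version
    payload: Optional[dict]   # None marks a tombstone (assumption)


class ObjectStore:
    """Keeps only the latest known version of each object."""

    def __init__(self) -> None:
        self._latest: Dict[str, Version] = {}

    def accept(self, version: Version) -> bool:
        current = self._latest.get(version.object_id)
        if current is not None:
            if version.author != current.author:
                return False  # only the author can produce a new version
            if version.timestamp <= current.timestamp:
                return False  # not newer than what we already have
        self._latest[version.object_id] = version
        return True

    def get(self, object_id: str) -> Optional[dict]:
        version = self._latest.get(object_id)
        return None if version is None else version.payload


store = ObjectStore()
store.accept(Version("obj-1", "alice", 1, {"title": "v1"}))
store.accept(Version("obj-1", "alice", 2, None))             # delete (tombstone)
store.accept(Version("obj-1", "alice", 3, {"title": "v3"}))  # undelete
print(store.get("obj-1"))
```

Because the tombstone is itself a version, an older copy of the object arriving late from another node cannot resurrect deleted content.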

Data channels

A data channel object is a collection of items with incrementally increasing ids (so the item's unique id is the data channel's id + item number). A node that has a data channel object does not need to have all of its items. For example, a node can have items [1,2,7,8,9] of a specific data channel.

When a node subscribes to a channel, it starts receiving its items from nearby nodes subscribed to the channel, starting from the most recently created items.

A channel has two numbers locally associated with it on a specific node: (lo, hi). They represent the range of items with the highest numbers that has no holes. In the example above it is (7, 9), as the current node is missing item 6. When Trolley sends channel items to a nearby node that has this channel, the receiving node:

  1. stores received items that were unknown before
  2. updates (lo, hi) range
  3. sends back items that extend the (lo, hi) range of the other node.

This way, nodes can incrementally synchronize channel content starting from the most recent items.
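
The (lo, hi) bookkeeping can be sketched in a few lines of Python. Both function names are hypothetical, and `items_extending` encodes one possible reading of "items that extend the range": anything the local node holds above the other node's `hi` or below its `lo`, most recent first.

```python
from typing import List, Optional, Set, Tuple


def contiguous_top_range(items: Set[int]) -> Optional[Tuple[int, int]]:
    """Return (lo, hi): the hole-free run of items ending at the highest number."""
    if not items:
        return None
    hi = max(items)
    lo = hi
    while lo - 1 in items:
        lo -= 1
    return lo, hi


def items_extending(local_items: Set[int], other_range: Tuple[int, int]) -> List[int]:
    """Items we hold that lie outside the other node's (lo, hi), newest first."""
    lo, hi = other_range
    return sorted((n for n in local_items if n > hi or n < lo), reverse=True)


theirs = {1, 2, 7, 8, 9}
ours = {5, 6, 7, 8, 9, 10}

their_range = contiguous_top_range(theirs)      # (7, 9): item 6 is missing
to_send = items_extending(ours, their_range)    # [10, 6, 5]
theirs |= set(to_send)
print(their_range, to_send, contiguous_top_range(theirs))
```

After the exchange the other node's range grows to (5, 10); items 3 and 4 are still missing, so the older items 1 and 2 remain outside the range.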

ichorid commented 3 years ago

So, Trolley will basically serve as the GCD for Channels, Bami, messaging, forums, trust, etc. All these systems have one thing in common: they need to remotely reconcile sets of objects published by a third party. The way to find a partner for reconciliation is different for each application. E.g. in Channels 3.0, partners are semi-persistent and served based on the channel index, while for a messaging service that should be a direct connection, etc. However, Trolley must provide rich options for reconciliation, like Bami's Backbone does.

So, the most important method provided by Trolley should be something like: reconcile(peer, objects_list).
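
As a rough illustration of what such a reconciliation step might compute, the sketch below compares two nodes' views of a set of objects, keyed by object id with the version timestamp as value. The function name `plan_reconciliation` and this summary format are assumptions; this is not Bami's or Trolley's actual interface, and timestamps are assumed to be non-negative.

```python
from typing import Dict, List, Tuple


def plan_reconciliation(
    local: Dict[str, int], remote: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """Compare {object_id: timestamp} summaries of two nodes.

    Returns (to_send, to_request): ids where the local version is newer or
    unknown to the peer, and ids where the peer's version is newer or
    unknown locally.
    """
    to_send = sorted(oid for oid, ts in local.items() if remote.get(oid, -1) < ts)
    to_request = sorted(oid for oid, ts in remote.items() if local.get(oid, -1) < ts)
    return to_send, to_request


local_view = {"a": 3, "b": 1, "c": 5}
remote_view = {"a": 3, "b": 2, "d": 1}
print(plan_reconciliation(local_view, remote_view))
```

Exchanging full summaries like this only scales to small sets; a real design would likely need a more compact summary of what each side holds.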

synctext commented 3 years ago

Great brainstorm for future work! This is really helpful to get our thoughts, ideas, priorities and roadmap figured out. A bit of work is left to do before this (fixing the popularity community, keyword search with relevance ranking, and the missing spam protection and reputation function).

By default, Trolley propagates new data objects to every connected node within the same community. A plugin can specify different kinds of propagation strategies for different types of objects.

Lots of overlap with this unresolved issue, #3690 "trustworthy gossip". @kozlovsky, co-assigning you to that open issue. Please read this work from 2013: https://d2k0ddhflgrk1i.cloudfront.net/EWI/Over%20de%20faculteit/Afdelingen/Software%20Technology/Distributed%20Systems/Technical%20Reports/2013/PDS-2013-002.pdf Dispersy implemented this idea, but we discovered that you first need reputations to do this. For my own learning, I went diving into 10-year-old code, from 18 July 2011 to be exact: https://github.com/Tribler/tribler/blob/f42b6ea9277f37505e66442121490cedbbb8acb1/Tribler/Core/dispersy/distribution.py It's hard to get this right; nobody has figured this out yet. Great scientific future goal. Note that we might never find a generic method, just custom code for each IPv8 community and data object.

synctext commented 2 months ago

Closing this issue; it aims to be a generic middleware function.

This repeats historic mistakes such as CORBA middleware. The Tribler Object Layer does not fix an open issue. We first need spam resilience and fake-identity resilience in Tribler. Plus stable code. Only then can we revisit these ideas.