Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

Channels 3.0 architecture and roadmap: HTML edition #5914

Closed ichorid closed 2 years ago

ichorid commented 3 years ago

Successful experiments with TCP-over-IPv8, QWebView and TiddlyWiki synchronization allow us to build a prototype of the Channels 3.0 system.

The Goal

By adopting this architecture, we will bring all the wealth of Web technologies into Tribler. It will be like jumping directly from times of Gopher, FTP and FidoNet straight into Web 2.0 territory. :goat: :arrow_right: :rocket:

[Image: arc1]

Architecture components

TiddlyWiki

TiddlyWiki is a one-of-a-kind wiki/application engine that runs entirely in the browser. Most importantly for us, TiddlyWiki is a microcontent system based on elements called "tiddlers". In TiddlyWiki everything is a tiddler: content entries, pictures, plugins and even the code itself. Every system tiddler can be "shadowed" (overridden) by a user-provided tiddler, much like a later CSS rule overrides an earlier one. This makes TiddlyWiki almost infinitely malleable, enabling a vast library of plugins, CSS styles, etc. (people even create webshops with it). TiddlyWiki also has a clear system for backend synchronization.

These two architectural features (backend-free and microcontent) make TiddlyWiki a perfect choice for the Channels 3.0 frontend.
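The shadowing rule described above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not TiddlyWiki's actual implementation; the tiddler titles and bodies are hypothetical placeholders.

```python
from collections import ChainMap

# Hypothetical tiddlers, for illustration only.
shadow_tiddlers = {
    "$:/example/PageTemplate": "<default page layout>",
    "$:/example/stylesheet": "body { color: black; }",
}
user_tiddlers = {
    "$:/example/stylesheet": "body { color: navy; }",  # shadows the default
    "Welcome": "Hello from my channel!",
}

# ChainMap searches its mappings left to right, so a user tiddler
# wins over a shadow tiddler that has the same title.
wiki = ChainMap(user_tiddlers, shadow_tiddlers)


def resolve(title):
    """Return the effective body of a tiddler."""
    return wiki[title]


def restore_default(title):
    """Delete the user override so the shadow tiddler shows through again."""
    user_tiddlers.pop(title, None)
```

Deleting the user copy never destroys the default, which is what makes this kind of customization safe to experiment with.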

TCP-over-IPv8

To enable sending tiddlers of arbitrary size, we have to employ some transport protocol over IPv8. To keep things as simple as possible, I copied the code of a pedagogical implementation of TCP in pure Python. Initially, I hoped to use the aioquic package, but this proved to be overkill. I preferred the TCP thing over @qstokkink's uTP implementation, because:

The code is untested (except by manual test runs). I would like to push it to the main IPv8 repository to enable native, transparent sending of messages of arbitrary length through IPv8. I would be glad if @qstokkink helped me with this :wink: UPDATE: done with the EVA protocol.
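To illustrate the underlying problem (a payload larger than one packet), here is a minimal fragmentation and reassembly sketch. It is neither the TCP code mentioned above nor the EVA protocol: there is no retransmission, acknowledgement or congestion control, which is exactly what a real transport adds. The header layout and the `MAX_PAYLOAD` value are assumptions made for this example, not IPv8 constants.

```python
import struct

# Assumed header: transfer id (uint32), fragment index and count (uint16 each).
HEADER = struct.Struct("!IHH")
MAX_PAYLOAD = 1000  # assumed per-packet payload budget in bytes


def fragment(transfer_id, data, max_payload=MAX_PAYLOAD):
    """Split data into packets that each carry a small header."""
    chunks = [data[i:i + max_payload]
              for i in range(0, len(data), max_payload)] or [b""]
    if len(chunks) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit fragment count")
    return [HEADER.pack(transfer_id, index, len(chunks)) + chunk
            for index, chunk in enumerate(chunks)]


class Reassembler:
    """Collects fragments, in any order, until a transfer is complete."""

    def __init__(self):
        self._pending = {}  # transfer id -> {fragment index: bytes}

    def feed(self, packet):
        """Return the full payload once all fragments arrived, else None."""
        transfer_id, index, count = HEADER.unpack_from(packet)
        if index >= count:
            return None  # malformed fragment, ignore it
        parts = self._pending.setdefault(transfer_id, {})
        parts[index] = packet[HEADER.size:]
        if len(parts) < count:
            return None
        del self._pending[transfer_id]
        return b"".join(parts[i] for i in range(count))
```

A sender would call `fragment()` and hand each packet to the datagram layer; the receiver feeds every incoming packet to a `Reassembler` and acts only when it returns a payload.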

QWebView

PyQt allows us to use the QWebView widget to, basically, create a platform-independent, Chrome-based browser window inside our GUI. The networking stuff is handled by Qt. With regard to XSS attacks, this will be perfectly safe for the user: it does not use the user's native system browser, hence perfect isolation. Each channel will be served by a separate instance of QWebView.

Libtorrent CDN

Common immutable data, such as the basic TiddlyWiki HTML page, common picture collections, etc., will be transported via torrents, the way we do in Channels 2.0. The BitTorrent 2.0 torrent format assigns individual hashes to individual files. This feature allows us to use BitTorrent 2.0 as a CDN to search for and host immutable common data. Mutable data will be replicated and gossiped directly with Channels 3.0.
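The per-file hashing idea can be sketched as follows. This is a simplified illustration in the spirit of the BitTorrent v2 Merkle tree (SHA-256 over 16 KiB blocks, zero-padded leaves); it has not been checked against real torrents and should not be treated as a conforming implementation. The `shared_files` helper is a hypothetical name introduced for this example.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # leaf block size used in this sketch
ZERO_HASH = bytes(32)   # padding leaf for an incomplete tree


def file_root_hash(data):
    """Return a Merkle root over the SHA-256 hashes of the file's blocks."""
    leaves = [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
              for i in range(0, len(data), BLOCK_SIZE)]
    if not leaves:
        raise ValueError("an empty file has no root hash in this sketch")
    # Pad the leaf layer to a power of two so that the tree is balanced.
    size = 1
    while size < len(leaves):
        size *= 2
    layer = leaves + [ZERO_HASH] * (size - len(leaves))
    # Hash pairs of nodes until a single root remains.
    while len(layer) > 1:
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]


def shared_files(torrent_a, torrent_b):
    """Return root hashes of files present in both {name: bytes} mappings."""
    roots_a = {file_root_hash(content) for content in torrent_a.values()}
    roots_b = {file_root_hash(content) for content in torrent_b.values()}
    return roots_a & roots_b
```

Because the hash depends only on the file's content, the same TiddlyWiki page bundled into two different torrents gets the same identifier, which is what makes the CDN-style lookup possible.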

Channels 3.0 community

The purpose of the C3 community is to host and serve mutable data (e.g. forums, wiki pages), as indicated in https://github.com/Tribler/tribler/discussions/5721 . Initially, it will not feature any collective editing. In fact, it will be served as a special sub-type of a Channels 2.0 channel, using the same top-level navigation elements in the GUI.

Persistence DB

C3 will use PonyORM as the ORM of choice, though with different semantics: the disk-based DB will only serve as a persistent cache. The idea of C3 is to serve everything from memory, but cache everything and dump the cache to disk periodically. This way, we completely avoid our DB-locking problem.
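The "serve from memory, dump to disk periodically" idea can be sketched with the standard library alone. This is a stand-in, not the planned PonyORM-based design: a plain dict plays the role of the in-memory store and a JSON file plays the role of the disk cache. The class name `MemoryFirstStore` is hypothetical.

```python
import json
import os
import tempfile


class MemoryFirstStore:
    """Serves reads and writes from memory; the file is only a cache."""

    def __init__(self, path):
        self._path = path
        self._data = {}
        self._dirty = False
        if os.path.exists(path):
            # Warm the in-memory state from the last dump.
            with open(path, "r", encoding="utf-8") as handle:
                self._data = json.load(handle)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        # No disk access here, so writers never wait on a database lock.
        self._data[key] = value
        self._dirty = True

    def flush(self):
        """Dump to disk if anything changed; returns True if it wrote."""
        if not self._dirty:
            return False
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(self._data, handle)
        # Atomic rename: a crash leaves either the old or the new dump.
        os.replace(tmp_path, self._path)
        self._dirty = False
        return True
```

A periodic task (for example, a timer firing every few minutes) would call `flush()`. The trade-off is that changes made since the last dump are lost on a crash, which is acceptable only if the data can be fetched again from peers.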

The Roadmap

Guys! This stuff will require months of work if I do it alone. Fortunately, the work can be split up very nicely into smaller architectural parts. So, if anyone volunteers to help me with this - I can always find a part that will be fun enough for you to make :wink:

synctext commented 3 years ago

As a general comment, we need to talk together more in the Dev Meetings about what it means to focus on 1 million users. Are we missing features, stability tooling, polish, marketing, or user interface, given that we still lack users after 15 years and 8 months of trying?

TCP-over-IPv8 I would like to push it to the main IPv8 repository to enable native transparent sending of messages of arbitrary length through IPv8.

This is a fascinating must-have feature in the long term. Introducing and debugging several new components is hard. Let's focus first on arbitrary length transfers and thumbnails in Channels 2.0. Please first finish exploring markdown/hypertext prototyping, then start releasing the first must-have component: arbitrary length transfers + thumbnails. Then focus on crowdsourcing and deploy a pull request mechanism in Tribler for thumbnails. Then subtitles.

Channels 3.0 community

Please keep this pull request and its changes as small as possible. Exploring a possible markdown/hypertext feature is an isolated project. Please keep this separate from improving the existing Channels 2.0 in terms of scalability, speed and general usability.

By adopting this architecture, we will bring all the wealth of Web technologies into Tribler. It will be like jumping directly from times of Gopher, FTP and FidoNet straight into Web 2.0 territory.

Sorry, we are not a web project; apologies that this somehow got miscommunicated. The TiddlyWiki stuff is impressive, especially the dedicated community that makes plugins. However, that is not what Tribler is about. As a university we can sell the DAO stuff we have operational; the time of web stuff has passed. Our engineering is about boring July 2001 stuff: finding and sharing swarms. Plus a ledger that scales. Please keep markdown in scope, as something more lightweight. Just a gentle reminder that a majority of Tribler developers in the 2020 discussion voted for markdown, instead of favouring an embedded web browser.

Let's pick Friday 19 February as the focus point for a decision with the Tribler team. Try to have a Mac/Deb/Win installer by then on the Tribler discussion forum. Everybody can then test before our 15:00 meeting and give their opinion on next steps.

The process that has been followed is exploratory prototyping and zooming in on TiddlyWiki early on. Nothing wrong with that. Before we deploy anything in Tribler I think it would be wise to also spend an equal amount of time on an alternative markdown-based approach. Measure their CPU and memory usage, check code maturity, and quantify their invasiveness. Then we discuss the way forward.

Latest development: Solidity instead of Javascript? Solidity now works on top of IPv8 in an experimental setting, using https://github.com/eth-brownie/brownie :medal_sports: Note: FBase Python-based plugins with source code inspection and security rating are still in scope for 2021+ https://repository.tudelft.nl/islandora/object/uuid%3Ad68197ec-50b9-4452-b9bf-e34a743f165f

Scalability: if hundreds of people view a markdown/hypertext page, it should still work. What is their seeding incentive? This is always a primary design concern, but it is not yet mentioned in the initial design sketch above. This prototype work made me realise the implicit requirement that we have in Tribler about scalability. The latest buzzword for what we termed "unbounded scalability" is hyperscale architecture. One use-case which is already implemented for our Superapp is around music and cover art. A lot of legal Creative Commons content exists, like a million songs. A key use-case for markdown/hypertext is having cover art for 200 albums in a single page, along with their magnet links for direct playback. Running code from our Superapp on Android Play Store:

synctext commented 3 years ago

Browser technology is parasitic.

Browser technology is complex, lacks security awareness, and ignores privacy. Browsers only seem technology-neutral; this ecosystem is ruled by Big Tech, with violations of privacy as the cornerstone of their business. Browser technology is optimised for predatory advertising and surveillance. It is incompatible with the core value of Tribler.

11 years ago, the last industry leader who was pro-privacy described it like this: https://www.youtube.com/watch?v=39iKLwlUqBo

After our recent discussion, I'm now formulating it more clearly (and bluntly). It extends our "preservation of centrality" rule. Any decentralised technology will degrade after a few years or decades to centralisation, usually at a higher level. Power and capital have a systemic tendency to become more concentrated. Decentralised technology MUST be specifically designed with a defensive moat against re-centralisation. It happened to the distributed email protocol (GMail rule), Bitcoin (mining pool monopolies) and smart contracts (oracle nodes). It must be impossible-by-design for developers to use a central server and take destructive shortcuts. Javascript-based technology, Chrome rendering, and hypertext in general are judged too toxic to use. It's too trivial to insert a single central gatekeeper server.

As a university we are one of the last places which can be obsessive, principled, steadfast, compulsive, and idealistic. In the essential case of decentralisation we need to take the most academically pure position possible. Without 1 million users and without external developers this is mostly an internal cultural matter. Once the Tribler movement reaches escape velocity this becomes a cultural matter for our 100+ external developers. Resistance to centralisation is important for long-term enduring critical societal infrastructure.

The "no-browser policy" is the core of Tribler.

ichorid commented 3 years ago

Javascript-based technology, Chrome rendering, and hypertext in general are judged too toxic to use.

Judged by whom?

It seems I'm lacking the imagination to see how using JS and HTML rendering of static HTML pages in a sandbox that is completely isolated from the Web (i.e. only allowed to query a limited subset of Tribler REST endpoints) can lead to centralization. Would you kindly provide a detailed scenario?
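The sandbox restriction described above could be enforced by a filter of roughly this shape, applied to every request the embedded view tries to make. This is a sketch under assumptions: the endpoint prefixes are hypothetical placeholders, not actual Tribler REST paths, and the hook that would call the filter is left out.

```python
from urllib.parse import unquote, urlsplit

# Only the local Tribler core may be contacted.
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}
# Hypothetical endpoint prefixes, for illustration only.
ALLOWED_PATH_PREFIXES = ("/channels3/tiddlers", "/channels3/assets")


def is_request_allowed(url):
    """Return True only for local requests to allowlisted endpoints."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return False
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    path = unquote(parts.path)
    # Reject path traversal instead of trying to normalise it.
    if ".." in path.split("/"):
        return False
    return any(path == prefix or path.startswith(prefix + "/")
               for prefix in ALLOWED_PATH_PREFIXES)
```

Everything not explicitly allowed is dropped, so a page cannot reach any remote server, only the local endpoints on the list.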

devos50 commented 3 years ago

It is incompatible with the core value of Tribler.

Depends on how you view it. As you also argued, browser technology, e.g., Chrome, is from a business perspective not compatible with an application such as Tribler that pushes for decentralization. This argument extends to packaging software such as Electron, that is using subsets of Google APIs. At the same time, I don't see many re-centralisation threats by using web protocols such as HTML/CSS.

Decentralised technology MUST be specifically designed with a defensive moat against re-centralisation.

I'm not sure how we can add technological safeguards to our software to defend against re-centralisation. Getting rid of the bootstrap servers is the first thing that comes to mind. The argument to use centralized components is usually convenience (e.g., as of yesterday, a part of my business infrastructure is using the free tier of Cloudflare to reduce the load on my servers and to speed up page requests). How do we ensure convenience while still preventing re-centralisation?

qstokkink commented 3 years ago

I don't see many re-centralisation threats by using web protocols such as HTML/CSS.

Agreed and to add to this: this even exists already. ZeroNet is doing this. In fact, relating this to our plugins suggestion of #6019, serving a core API of the P2P internals to Javascript ("plugin") developers also already exists in ZeroNet.

Whether or not this is a core value of Tribler and whether we should even want to compete with ZeroNet is another matter.

synctext commented 3 years ago

Thinking Decentralised

ZeroNet is a fascinating example. Their site mentions "TLS encrypted connections" and a ".bit domain". When you do not alter the ecosystem, you may start re-using old mistakes like DNS and TLS. These are fundamentally central authority technologies. This is their way.

Web people don't think decentralised. What I was trying to convey is something very "soft social science"-like. The people from web technologies may have good intentions and mature tools, but they miss something. This MIT professor writes it down very clearly: https://web.media.mit.edu/~mres/papers/decentralized-modeling.pdf Please browse this document for a bit; web people are never going to get good at thinking decentralised. These 24 pages by MIT explain great stuff like: "randomness plays an important role in creating order in many self-organizing systems". The whole web stack has centralised thinking and privacy leakage at the core. They are right, things are easier with central authorities, master servers, and full control. But users will never have power. Using web technology for decentralisation is an anti-pattern :fearful: To force web people to think decentralised you need to take away everything they know. Give them IPv8 and they understand "this is different".

using the free tier of Cloudflare to reduce the load on my servers and to speed up page requests). How do we ensure convenience while still preventing re-centralisation?

Great point! We can't, I think. By deliberately making our tooling incompatible with central tools, there is no convenient way to introduce them into our decentral utopia. You are always stuck doing it the right way, the hard way :boom: YAIC :boom: (yet another IPv8 community).

ichorid commented 3 years ago

These 24 pages by MIT explain great stuff like

I've read all the 24 pages (except for the last 2 ofc), thoroughly. The author tells us about his experience of teaching children emergent behaviour in decentralized systems. It explicitly tells the story of how these children would always try to come up with centralized solutions, even in a programming language that was specifically designed for decentralization. And how, every time, he had to hint to the pupil at the possibilities of decentralized systems...

It is entirely possible to use IPv8 in a centralized way (e.g. by creating one giant channel :wink: everyone is subscribed to). Conversely, it is equally possible to program decentralized stuff with the modern Web stack, see WebTorrent for instance.

Indeed, some tools are better suited for some jobs. An IPv8 community is a very nice reification of the concept of a uniform swarm of peers. Likewise, HTML+JS is a very nice reification of the concept of an interactive, easy-to-modify GUI (which Qt is not). And that is it! No one is going to put the "client-server" concept from the Web stack into Tribler. In other words, we're only going to use "local-only" parts of the Web stack, completely forbidding any form of remote connectivity or CDN (except for BitTorrent :wink:)

Think about it! Everything is centralized in this world! If we're going to do some decentralized stuff, we're going to do it from centralized stuff, because that is the only thing there is!

“You have to make the good out of the bad because that is all you have got to make it out of.” ― Robert Penn Warren, All the King's Men

synctext commented 3 years ago

We seem to reach different conclusions. This is very helpful to align and sharpen my thinking. Let me make it more colourful. Bittorrent is the only island of hope that humanity has for Internet freedom. We're growing this island and will never connect it to the systemically corrupt mainland.

Think about it! Everything is centralized in this world! If we're going to do some decentralized stuff, we're going to do it from centralized stuff, because that is the only thing there is!

Exactly, everything is centralised in this world (Bitcoin, Bittorrent and Tribler are really the only long-surviving exceptions). If we're going to do some decentralized stuff, we're going to have to take the harder path, because we are the only thing that is there.

No one is going to put the "client-server" concept from the Web stack into Tribler.

Trust me on this one, it will be polluted. Master-slave thinking is deeply embedded into all Big Tech and web technology. That crowd will never take the harder path and will always opt for convenience. Even most of our master students do it, even in the lab that was specifically designed for decentralization.

ichorid commented 3 years ago

I understand your point now. It is just one step short of Stallman's email-web daemon. Our different backgrounds, personalities and socioeconomic positions push us towards preferring opposite strategies (though our goal remains the same). This leaves us with a single option of reaching consensus at the tactical level every time, which is elaborate, but workable.

Nonetheless, I have one last question for you: how are you planning to implement the GUI side of the plugins system in a safe way? (PyQt is completely unfit for that purpose: Qt is a proprietary technology, there is no sandboxing in Python, etc.)

synctext commented 3 years ago

I firmly refuse to install non-free software or tolerate its installed presence on my computer or on computers set up for me.

From that page... Yeah, it's also about living pure!

implement the GUI side of the plugins system in a safe way?

No idea. It's hard.

synctext commented 3 years ago

The JCenter Bintray disaster: death due to success

Another example of decentralised culture versus productivity. For the Superapp we relied on a central packaging website solution. A master student trained relentlessly in the art of decentralised systems used this free service to complete his thesis mission. Now it's terminated. Documenting it here for the future.

Bintray has provided the open source community a free, universal cloud platform for publishing and distributing binaries. We will have some short service brown-outs to remind users about the services that are going away on May 1st.

https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/

This has an impact on our science. We are expected to present a working Self-Sovereign Identity solution for all EU member states. Bintray has broken our Superapp. Impact on the SSI demo at our workshop: https://www.enisa.europa.eu/events/workshop-on-blockchain-based-digital-identity-solutions

ichorid commented 3 years ago

The JCenter Bintray disaster: death due to success

Indeed, this is a fascinating example of how relying even on a single centralised service could break a decentralised system!

However, I still don't understand what this has to do with the Channels 3.0 architecture and/or using an open-source implementation of an HTML renderer to show local replicas of specifically-built HTML pages :shrug: