Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

lessons learned over past 15 years #143

Open synctext opened 11 years ago

synctext commented 11 years ago

Goal: publish about issue tracking, unit tests, software repository, continuous integration and test frameworks

Publish at this industry track conference where they accept 'experience reports describing problems (and their solutions) encountered in real applications'.

Notes ToDo process

Possible storyline: evolving slowly from experimental prototyping (science-first phase) to production-level code (users-first phase). Team altered over the years from scientists towards scientific software developers.

Berkeley DB towards SQLite (one year struggle). Support of swarmplayer, plugins and social networking (2008).

156 tickets from P2P-Next project, we tried but did not work yet.

Cleanup of the core May 2013:

4th generation file sharing: the 10 years journey towards 1 million users

synctext commented 5 years ago

Document the Bartercast pivot.

Dispersy history and evolution.

ABN/TKI/ShadowI/Tranwall

Struggle of proxies from 2006 onwards.

Finding external talent, just teach/make them ourselves

P2P seminar, hacking lab, blockchain.

Leaving Android, MythTV and Kodi

All code replaced!

synctext commented 5 years ago

We have made continuous improvement our cardinal organisational capability.

Relentless incremental improvement is difficult to sustain within a university environment. It is defined by strong pressure to publish elegant ideas. Publications that are reproducible and based on real-world complex problems are resource intensive. Conducting reproducible science or actually solving problems can easily take 10x or 100x more time when compared to publishing ideas. Trying to solve a problem is a risky career path. To fund our scientific work we successfully tapped into a reliable revenue stream. Roughly 35% of our time is spent helping our government solve some of their most challenging ICT problems. Those problems have been selected to be closely aligned with our research interests. For instance, passport-grade online authentication.

In 2005 we published the following conclusion, after measuring Bittorrent for a few years:

One of our main conclusions is that within P2P systems a tension exists between
availability, which is improved when there are no global components, and data integrity,
which benefits from centralization.

The final sentences of our 2005 article mention our work on distributed accounting systems:

Another future design challenge for P2P file sharing is creating incentives to seed.
For example, peers that seed files should be given preference to barter for other files.

14 years have passed since we started working on distributed accounting systems (now called blockchain or utility tokens). After relentless incremental improvements our online token economy within Tribler is nearly complete. Tribler still evolves. We recently replaced 50,000 lines of code with 5000 lines.

Tribler has grown or even evolved into a complex system. It is not designed top-down; that is becoming impossible. After nearly 20 years of building systems with special properties, only successful design patterns survive. Inspired by Darwin himself, human-made systems often follow the same design principles governing natural systems. The key design principle is evolution by natural selection.

When do we call it https://en.wikipedia.org/wiki/Continual_improvement_process and when is it natural selection? We speculatively deploy semi-random features and see if they stick. Our decentral social network deployed a decade ago failed. Our wiki-style trust-less editing failed in 1999, but might finally catch on.

Our long-term direction was published in February 2006. We republish it here in full:

our social-based P2P network, TRIBLER, addresses
all five grand challenges.
The most difficult research challenge is the decentralization
of the functionality of a P2P system across the various peers.
Full decentralization eliminates the need for central elements
in the system, which must be set up and maintained by some
party and which may form serious bottlenecks, point of failures, 
or security threats. In particular, connecting to the
network and validating accounts are difficult to implement
without any central element. To date, no P2P file-sharing
system exists which fully decentralizes all functionality efficiently
and without loss of integrity. Social groups form a
natural method to efficiently decentralize P2P systems, due
to the fact that communication is mostly localized amongst
group members.

The second challenge is to guarantee the availability of
a P2P system as a whole. The operation of such a system
should not depend on the availability of any particular participating
peer, or of any central component, if the latter exists.
Given the short availability of peers (in [14] we found less
than 4% of the peers to have an uptime of over 10 hours),
the availability problem is critical. Proven social incentives
such as awards and social recognition could stimulate users
to leave their P2P software running for longer periods, thus
improving the overall availability of the network.

The third challenge is to maintain the integrity of the system
and to achieve trust amongst peers. By definition, P2P systems
use donated resources. However, donors cannot always
be trusted, and maintaining system integrity has proven to be
difficult in operational systems [7]. Data can be attacked at
several levels in a P2P system, namely system information
(e.g., pointers to content), meta data, and the actual content
itself. This significant problem, often ignored by P2P system
designers, can be solved with a social-based network,
in which users actively help to clean polluted data and users
can select trustworthy representatives.

The performance of a P2P system highly depends on peers
donating resources. Even though the resource economy is
by definition balanced (e.g., every MByte downloaded corresponds
to a MByte uploaded), autonomous peers are free to
decide whether to donate resources or not. Hence, providing
proper incentives is vital to induce cooperation and to achieve
good performance [3]. Again, social recognition can help to
alleviate this problem.

The fifth challenge in P2P systems is to achieve network
transparency by solving the problems caused by dynamic IP
addresses, NAT boxes, and firewalls. The fundamentals of
the Internet have changed due to the wide-spread use of these
three technologies. Peers no longer have the freedom to send
anything anywhere, without the help of another peer acting
as a mediator between them. Social networks enable communicating
peers to automatically select trusted mediators
from the members of their social proximity, who are still
online; hence, the need for fixed mediators is eliminated.

After several million downloads of Tribler and with a growing user community, we are also becoming more ambitious. With the successful growth of our Youtube-like system, our ambition is to solve the problem of online trust. We believe our trust-inducing mechanism can be applied in most online social platforms and economic settings. We believe social proximity is suitable to become a cornerstone of the online world.

synctext commented 5 years ago

draft storyline

Permissionless innovation

Delft University of Technology has created highly disruptive pioneering innovations within social media, the medical domain, finance, and identity. Our first permissionless innovation is a passport-grade identity and web-of-trust ecosystem. We empower each citizen to own their own digital identity. Self-sovereign identity is a much discussed topic in some circles, but not yet achieved in the real world. Our prototyping and relentless incremental improvement work has matured this emerging field. Within our trial we accomplished the first essential act of true self-sovereignty in the digital domain: a democratic government which recognizes the public key chosen by a citizen as their digital identity and their legally binding electronic signature.

We created a permissionless open banking infrastructure. We did not ask permission of the largest banks of Europe to create this open standard and operational system. Our transparent infrastructure offers real-time payments across Europe and competitive currency conversion. Our truly open banking infrastructure enables further optimisation and automation of business logic in many parts of our economy, without the cost and delay induced by countless lawyers.

In April 2005 we created the first video streaming platform designed to spread videos with privacy and security. Social capital and integrated reputation mechanisms are key to spreading videos of demonstrations and protest. Protesters are not required to ask governments for permission to spread videos of protests, prevent fake news, and promote local news footage.

The medical domain is a part of our society which is highly regulated. Internet marketplaces and direct-to-patient shipping of medication may bypass some of these checks and balances. With the arrival of cheap whole-genome sequencing you can now download your own DNA profile. With the right inputs software is now capable of detecting disease risks, without any oversight or permission from medical professionals. In close collaboration with medical ethicists we are creating machine learning algorithms which guarantee privacy, are GDPR-compliant, and can determine medical risks. We are actively working on achieving the first software-only direct-to-patient medical trial. Scientists make software updates available for continuously running and incrementally improving detectors of disease risks. We believe patient-initiated medical trials represent a new phenomenon. By putting the patient fully in charge we are creating a permissionless ecosystem for medical research.

We believe our four permissionless innovation ecosystems are key drivers for superior innovation spaces. We are now evolving our complex systems constructed over the past 15 years into a single ecosystem with permissionless flow of information and value between leaderless organisations of unbounded size.

synctext commented 4 years ago

Scientific principle behind our work on identity, trust, cooperation, and trade:

Our approach to science is:

synctext commented 4 years ago

For 7 years we have been developing credit mining. The goal is to automate donating resources by identifying under-seeded swarms (those that need more replicas) and joining them. This is a key step towards an autonomous Youtube-like system without servers. The first research result was Investment Strategies for Credit-Based P2P Communities in 2013, the first paper to describe the Bandwidth Investment Problem.
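To make the idea of "identifying under-seeded swarms and joining them" concrete, here is a minimal sketch. It is not the credit mining code that was deployed in Tribler; the `Swarm` class, the seeders-per-leecher heuristic, the `max_ratio` threshold and the `budget` parameter are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Swarm:
    """Hypothetical snapshot of a swarm; field names are illustrative only."""
    infohash: str
    seeders: int
    leechers: int


def seed_ratio(swarm: Swarm) -> float:
    """Seeders per leecher; a low value suggests the swarm needs more replicas."""
    return swarm.seeders / max(swarm.leechers, 1)


def pick_swarms_to_join(swarms, max_ratio=0.5, budget=2):
    """Return up to `budget` swarms that have demand, most under-seeded first.

    `max_ratio` and `budget` are assumed tuning knobs, not values from Tribler.
    """
    candidates = [
        s for s in swarms
        if s.leechers > 0 and seed_ratio(s) <= max_ratio
    ]
    return sorted(candidates, key=seed_ratio)[:budget]


swarms = [
    Swarm("aaa", seeders=50, leechers=10),  # well seeded, skipped
    Swarm("bbb", seeders=1, leechers=20),   # heavily under-seeded
    Swarm("ccc", seeders=2, leechers=8),    # under-seeded
    Swarm("ddd", seeders=0, leechers=0),    # no demand, skipped
]
chosen = [s.infohash for s in pick_swarms_to_join(swarms)]
print(chosen)
```

A real investment strategy would also have to weigh expected upload return against the download cost of joining, which is the Bandwidth Investment Problem mentioned above.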

Credit mining is an example of our pathfinding methodology. During our 20 years of deploying self-organising systems we always deploy a conservative and partial system to learn and build experience. Deployment and feedback from the real world has proven to be the most effective methodology, versus trying to get it right the first time. Tribler is complex, involving interaction among and feedback between many parts. It consists of planning for the unexpected, avoiding partial failure and steady step-wise improvement. It evolved, it is not designed. We first measured NAT hardware, for instance, before devising UDP puncture techniques. The richest man on earth is also a fan, #GradatimFerociter.

After a master thesis on credit mining this feature still did not meet expectations. It was hard to use within the GUI and ineffective. After 7 years of effort the code was deleted. More research is needed. The critical missing element is a fully functional popularity community. That is known to be a hard problem; it's part of building a Google-like search engine.

We don't do much formal scheduled meeting and work breakdown structure. Formal meetings are not appreciated by lab members and minimised. Alignment is done through coffee and slack. ToDo: explain the 1 person, 1 project, and 1 Github issue method (1=1=1 method).

synctext commented 4 years ago

"Epic Sprint", after 15 years and 5 months of Tribler development we are trying a new process.

"Epic Sprint" is a new agile work methodology to cope with our growing development team and expanding code base. Having weekly meetings about all the various development tasks is boring for most people; better to have smaller team meetings. How to divide our overall work? With three developers @xoriole @kozlovsky @drew2a we are turning our Distributed Google dream into real running code, hopefully within 6 months. This means getting big things done in a small unit, or Bezos' rule of sharing two pizzas. (Given @drew2a's experience, let's make him responsible for the agile process.) After these 6 months we can do another epic sprint, turning an open scientific problem into a solution which is verified by observation and experience. Working in a small team increases speed of development and peer-review of protocol designs, when compared to our prior lone-developer approach. In January 2021 we will evaluate and improve.

xoriole commented 4 years ago

Tribler Startup Model - Sandip's thoughts

Tribler should evolve from being an academic project to bringing the software to millions of users. For that, we, the Tribler team, should consider it a startup with scarce resources and relentlessly focus on user growth. Month-over-month growth rate is a good metric to understand how we’re doing over time. A million users is always going to be a distant dream if there is no obsession for growth.
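As a small illustration of the month-over-month growth metric, the sketch below computes the relative change between consecutive monthly user counts. The function name and the sample numbers are made up for the example; they are not Tribler measurements.

```python
def month_over_month_growth(monthly_users):
    """Return the relative growth between each pair of consecutive months.

    A value of 0.10 means the user count grew by 10% compared to the
    previous month.
    """
    rates = []
    for previous, current in zip(monthly_users, monthly_users[1:]):
        if previous <= 0:
            raise ValueError("previous month must have a positive user count")
        rates.append((current - previous) / previous)
    return rates


# Illustrative numbers only.
users = [1000, 1100, 1210]
rates = month_over_month_growth(users)
print([f"{rate:.0%}" for rate in rates])
```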

Tribler’s unique value proposition is its anonymity, which kind of works, but it still needs to provide its users the same level of content quality, performance and usability as the centralized counterparts. Therefore, the entire focus of the dev team for the coming months (or years) should be to develop features that actually solve the pain points or problems of the users. Only then can we reach a broader mass beyond a niche of privacy-aware users.

To achieve that, a few points that we should always consider in our development sprint:

  1. Acquisition: Is this feature or fix going to attract new users?
  2. Churn: Is this feature or fix going to reduce user churn (leaving Tribler)?
  3. Retention: Is this feature or fix going to improve user experience and keep providing value to the users? Improve content or performance?

Therefore, for any feature, we should consider the concept of minimum viable product/feature. We test the feature first; then, if it is contributing to any of the three growth measures above, we should double down and put more resources into developing it further. Otherwise, discontinue swiftly and move to the next feature cycle. Measurement of the appropriate growth or usability metrics becomes key here. It is also the responsibility of the feature developer to develop the mechanism to measure the metrics that indicate success.

Next, having a fixed release cycle is important. Great teams deliver on the promise they make to their users. The release could be monthly, quarterly or half yearly or annually, but defining an interval and following faithfully builds trust of the users on the dev team. Besides, predictability helps users on deciding when to upgrade and to what version.

...

synctext commented 4 years ago

The research of Paul Romer, Nobel Prize winner in economics, links permissionless innovation with Big Tech monopolies and the health of The Commons. First is the classical 1989 model for innovation and knowledge as a nonrival good. Outcome of this simple model: the equilibrium is one with monopolistic competition.

[image: the model's equation, not preserved in this text]

This equation captures two substantive assumptions and two functional form
assumptions. The first substantive assumption is that devoting more human capital to
research leads to a higher rate of production of new designs. The second is that the higher
is the total stock of designs and knowledge, the higher is the productivity of an engineer
working in the research sector
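The equation itself did not survive in this thread. Assuming the quoted passage refers to the knowledge-accumulation equation of Romer's endogenous growth model, it is commonly written as follows. The symbols are taken from the usual presentation of that model, not from this thread: $A$ is the total stock of designs and knowledge, $H_A$ is the human capital devoted to research, and $\delta$ is a productivity parameter.

```latex
\dot{A} = \delta \, H_A \, A
```

Read against the quote: the rate of new designs $\dot{A}$ rises with $H_A$ (first substantive assumption) and with the existing stock $A$ (second substantive assumption); linearity in each of the two is the pair of functional form assumptions.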

Recently he promoted the idea of Big Tech and the Commons in The New York Times:

It is the job of government to prevent a tragedy of the commons. That includes the commons of shared
values and norms on which democracy depends. The dominant digital platform companies, including
Facebook and Google, make their profits using business models that erode this commons. They have
created a haven for dangerous misinformation and hate speech that has undermined trust in democratic
institutions.

Conclusion: in 2005 we published the need to reward seeders for their efforts and started incrementally improving our deployed mechanism. We did not understand how fundamental and difficult our task was. We were quite naive. As of 2020, people see that due to the pandemic we need a strong government, rich Commons and additional digital regulation.

A mechanism to address the Tragedy of the Commons such as indirect reciprocity or network reciprocity would also enable democratic institutions on a global scale. Once we solve the problem of strong digital identities and secure voting it is possible to create decision making processes to democratically control the flow of any amount of money by a community of unbounded size. The founding of the "Global Democratic Commons" might actually be possible in coming decades.

synctext commented 3 years ago

Ideas from the past and old insights have been poorly documented. The lab often re-discovers them without knowing how much "those ancients" already knew. Tribler is getting old. Really old. The idea of Dispersy, IPv8, Allchannels and today channels 2.0 is: 6,487 days old; 17 years, 9 months, 3 days ago.

The project was called THE GOD FILE. It was a strange idea to package .torrent files inside a torrent. It would scale to millions of users. How could you make changes? Well yes, difficult. That feature is now done after 17 years. Crowdsourcing is probably taking 18 years since this first operational prototype:

Understanding incentives and freeriding. Kazaa measurements from 1 - 3rd of April 2003. Download the original measurement capture from 2003 measurement_556_downloads_overview_5downloaders

Our first user retention measurement in 2003, seeding duration in Bittorrent. Original captured data sample from December 2003 and beyond:

847192  24.114.x.x  6881    2003_12_23__19_20   2004_01_02__14_40
810297  24.43.x.x   6882    2003_12_27__07_57   2004_01_05__17_02
773323  82.66.x.x   6882    2004_01_03__07_52   2004_01_12__06_41

Specific displayed capture shows 9364 users. Each of these users is ranked by their continued usage of Bittorrent in seeding mode. The most loyal user is displayed on the left, using Bittorrent for a few weeks continuously.
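The ranking described above can be reproduced from the three sample lines with a short script. This is a sketch under assumptions: the column meanings (recorded duration in seconds, anonymised IP, port, first seen, last seen) are inferred from the sample, where the first column is within a few seconds of the gap between the two timestamps. The names `SAMPLE`, `parse_line` and `rank_by_seeding_duration` are made up for the example.

```python
from datetime import datetime

# The three sample lines from the 2003 capture, copied from above.
SAMPLE = """\
847192  24.114.x.x  6881    2003_12_23__19_20   2004_01_02__14_40
810297  24.43.x.x   6882    2003_12_27__07_57   2004_01_05__17_02
773323  82.66.x.x   6882    2004_01_03__07_52   2004_01_12__06_41
"""

TIMESTAMP_FORMAT = "%Y_%m_%d__%H_%M"


def parse_line(line):
    """Split one capture line into (ip, recorded_seconds, computed_seconds)."""
    recorded, ip, _port, first_seen, last_seen = line.split()
    started = datetime.strptime(first_seen, TIMESTAMP_FORMAT)
    ended = datetime.strptime(last_seen, TIMESTAMP_FORMAT)
    return ip, int(recorded), int((ended - started).total_seconds())


def rank_by_seeding_duration(text):
    """Return parsed rows sorted from longest to shortest seeding period."""
    rows = [parse_line(line) for line in text.splitlines() if line.strip()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranking = rank_by_seeding_duration(SAMPLE)
for ip, recorded, computed in ranking:
    print(f"{ip:12s} recorded={recorded}s computed={computed}s "
          f"({computed / 86400:.1f} days)")
```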

Operational Merkle hashes inside Bittorrent 29 May 2006. After this work by The Ancients it was idle for 10 years.

Lesson: start documenting these high-level lessons. Either here or in the docs. Single comment in one of 6000 Github issues and a graph in one of our 100+ scientific publications might be forgotten if it can't be found with keyword search. Somebody might want to re-produce your work 17 years, 9 months, 3 days later. Avoid nostalgia, nobody cares.

synctext commented 3 years ago

Gossip protocols are special. Evolution, optimisations, generational improvements and tuning are required; however, simplicity must be maintained. Experiments have shown that normal humans have a systematic bias to add complexity: https://www.nature.com/articles/s41586-021-03380-y Subtraction needs to be trained. Simplicity needs to be supported/pushed by management.

qstokkink commented 3 years ago

ModerationCast (2008): 10.1.1.485.3316.pdf

{edit: 2007 work: https://www.semanticscholar.org/paper/TriblerShare-%3A-A-Scalable-P-2-P-Based-Web-2-.-0-Werf/3a592b0795e0899a0fc88e11b883f048c49548bb with Youtube,Liveleak and Flickr browser}

synctext commented 2 years ago

Creating simple systems is surprisingly difficult.

More on the culture of engineering versus self-assembly. IPv8 gossip-based communities are not based on typical engineering methodology: piece-by-piece design; instead, they are built using evolution and emergence. We have primitive code since: 8 July 2003 (see above "The GOD File".) Concepts of the Tribler Lab have evolved for over 18 years.

We need to make new engineers in the team more aware of this: there is no clearly defined blueprint that shows the final structure of Tribler. We(/me) have failed to document all evolutionary steps and lessons of the past. We need to collectively learn, but we don't have any formally defined support process for this collective intelligence. Therefore our key knowledge exchange happens at the :coffee: making :robot:; next coffee-machine meetup a volunteer will be appointed to make meetup-minutes/s. :writing_hand:

synctext commented 2 years ago

DAO engineering with "one-size-fits-all" model is wrong.

Policies or approaches written in immutable code and not tailored to individual needs are probably wrong. Instead of a "one DAO to rule the investment world" approach we need a collection of narrow-purpose DAOs in a composable architecture. Each DAO is a fully autonomous system with a stable API, a dedicated purpose, care with breaking changes, and a conservative governance model. Governance problems are greatly reduced when the purpose of a DAO is stable, the interface is stable and only maintenance-mode decisions are required (still risks of repeating the "Bitcoin civil war"). {Credits: brainstormed "swarms-of-DAOs" on 14 of March 2022 in Amsterdam.}

Permissionless innovation within a zero-trust DAO stack therefore follows the UNIX philosophy (like above cloud). It states that everybody should get along with others. Be an efficient specialist, not ineffective at everything. Functional decomposition of a composable DAO architecture yields: identity, trust, data, money, markets, and AI. We aim to build a DAO for all of them. Over-engineering warning :stop_sign:. Let's first make a single DAO work, deeply integrated with a single application: Tribler. When that is successful we can continue our engineering dreams of tech utopia. We don't see much freeriding, sybil attacks or pollution. Let's volunteer somebody to build the first circular Bitcoin economy (inspired by our Robotic Music Industry). Earn Bitcoin by offering data storage, earn Bitcoin by offering encrypted proxy services, automatically invest in a VPS and get priority downloads by spending Bitcoin. Next audacious step: Bitcoin 2. The Bitcoin 2 DAO boosts transaction rate to 1 Mtps by adding dynamic deterministic periodic settlement on Bitcoin 1.

devos50 commented 1 year ago

Now that I left the Tribler lab, I will below list some of my insights, suggestions and take-away messages I obtained from the time I worked in the lab. Note that I left quite a few research ideas in our private GitHub repository to assign to further BSc/MSc or PhD students. Therefore, the points below are a bit more high-level.

Tribler

Tribler Dev Process

TrustChain and the Token Economy

Content Organization in Tribler (Tags/Knowledge Graphs)

The Bumpy Road to ML Deployment in Tribler

Advice for MSc/PhD Students

synctext commented 1 year ago

Lesson: Focus on your one true core? After 18 years and 1 month of Tribler we are still making the search&download core production-ready, stable, efficient, and fast. IPFS attempted from 2019 onwards to make 2 clients: The reference implementations of the IPFS Protocol (Go & JS) become Production Ready. Within the 2023 IPFS ecosystem there are 17 implementations, various libraries, and multiple networks; they felt the need to define more clearly what IPFS is. For Tribler, the one true reference core implementation is the specification. Different choices.

synctext commented 1 year ago

Lesson: stability matters. After 18 years and 5 months of Tribler development, we still don't have our core stable. With our recent 7.13 release we focused on stability and the core features of search & download. We now have 58 bugs reported by our volunteers. These 58 unique bugs are registered inside Sentry through our automated bug reporting pipeline with detailed debugging info, duplicate bundling, and automatic private-info stripping.
The "connecting to core" issue took from April to August 2023 to understand and hopefully fix. Now we still have issues with the GUI-Core connection. Maybe the root cause of failure is a process blocking the main thread for a few seconds; unknown. Even though this is a lot of bugs, and especially nasty bugs, it is better than before. A lot of them seem to be the easy class of bugs; one developer can fix 5 of those per day. Two years ago we were in much worse shape. A lot of technical debt. Our Sentry setting around 2021 was to hide and ignore all bugs which were reported by fewer than 10 volunteers.

Complexity is our enemy. Stability, overengineering and complexity are our problems.

COPIED from blog: We, engineers, naturally react to hype. We get obsessed with the idea of learning something new and building complex, all-powerful solutions. No surprise, AutoGPT included vector db at the very beginning. BarterCast, Libswift, Dispersy, LevelDB, etc. But as the time goes by, good engineers focus on what’s really important. Hype is over, now that some value needs to be delivered to the actual users, complexity becomes our enemy.

Tribler is known on The Internet :-) We are a bad example. It is a warning to add trust and to kill the lightness of old tit-for-tat.

COPIED from YC forum: I’d be wary of creating something that looks a bit like Tribler, which while an interesting project seems to have demonstrated that implementing trust, reputation and privacy at the protocol level carries too much overhead to be a compelling alternative to plain old BitTorrent, for all its imperfections

synctext commented 1 year ago

Lesson: emergence requires decentralisation and crowdsourcing requires micro-contributions. The social process of crowdsourcing failed (e.g. Gigachannels). The ambitious roadmap was perfect metadata and enrichment for a Big Tech alternative, with markdown support and a merger of Wikipedia, Scholar.google, and Youtube/Tiktok. The unit of contribution was too big. This required 20 people to create 80% of the content: too centralised. Not permissionless. True decentralisation and emergence require micro-contributions. Background reading: performance of channel volunteers is superlinear, leading to superlinear returns, winner-takes-most, and centralisation

synctext commented 1 year ago

Bug hunting speed

Lesson: keep track of bug inventory. Paused improving tags, switched to bug hunting mode for the remainder of 2023. Speed of closing bugs last week: 3 bugs in 1 full-time week for 1 developer. How much time are we spending on getting experimental features stable versus fixing bugs in Tribler in general? Stop working on features and first do boring or painful fixing chores? Software contains many defects. This is especially the case with fresh experimental university software: Tribler. Our world-first decentralised trust, decentralised AI, self-organising overlay, etc. come with numerous bugs. Currently 199 unresolved bugs in Sentry, 103 marked as unresolved in the current 7.13 release. One particularly nasty bug has not been taken seriously for a few years: huge torrents don't work in Tribler. Speculation is that we might have had this bug for 15 years, 11 months and 14 days (the dark old days of 11 threads, would require extensive useless checking). Bugs can be nasty to fix. A pull request tries to fix 1 of 2 underlying causes we believe might trigger the race condition around CoreConnectionError: "The connection to the Tribler Core was lost". This possible fix is only 253 extra lines of code. {Too} complex fix. It still uses the database to store which Tribler is primary, to avoid starting Tribler twice. Fixed multiple bugs which crashed the core. Crashing cores have taken us 1+ year to fix.

3 bugs fixed in 1 full-time week for 1 developer
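To put the inventory numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes the closing rate stays at the 3 bugs per developer-week measured above and that no new bugs are reported in the meantime; both are simplifications, and `weeks_to_clear` is a name made up for this example.

```python
import math


def weeks_to_clear(open_bugs, bugs_per_dev_week, developers=1):
    """Full weeks needed to close `open_bugs` at a constant closing rate.

    Assumes no new bugs arrive, which is optimistic for any real inventory.
    """
    if bugs_per_dev_week <= 0 or developers <= 0:
        raise ValueError("rate and team size must be positive")
    return math.ceil(open_bugs / (bugs_per_dev_week * developers))


weeks_all = weeks_to_clear(199, 3)      # all unresolved bugs in Sentry
weeks_release = weeks_to_clear(103, 3)  # unresolved bugs in the 7.13 release
print(weeks_all, weeks_release)
```

At that pace a single developer would need well over a year for the full inventory, which is one argument for tracking the closing rate alongside the bug count.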

synctext commented 10 months ago

We're stuck, see thread!

synctext commented 4 months ago

anti-competitive collectives for post-capitalism {brainstorm}

Key lessons from collectives are that they need a clear vision. This enables gathering of social capital and establishment of trust. Thus the above initial sketch of reasoning from "first principles". Next step is a roadmap for relentless monotonic growth. Cardinal milestone is defeating an entrenched business model. Show the world that cooperation can beat monopoly capitalism. Establish a de-facto non-profit monopoly based on openness, sharing, and kindness. Kinda challenging in today's toxic Internet :skull:.

drew2a commented 3 months ago

I wrote and rewrote this post many, many times. First, I made a long list of things that could be improved, then I spent a lot of time choosing the right phrases, and then I reread the current issue and removed the points that had already been mentioned in one way or another.

In the end, I decided to keep it brief, because all my points could be addressed with the same piece of advice: if we want to transform Tribler from a scientific prototype into a product, we need to hire a manager. This is the only and main advice I will leave. Self-organization without a manager or without a clear common goal leads to what in Russian fables is described as "the swan, the pike, and the crawfish" https://allpoetry.com/Swan,-Pike-And-Crawfish.

I will also leave the thought that sole ownership of a repository, feature, or project is a form of centralization.

synctext commented 2 months ago

Keep it simple in space: https://www.reddit.com/r/ASTSpaceMobile/comments/p0m1yo/the_popup_array_unfolded_analyzing_an_ast_space/ Huge unfolding satellite with complexity on earth phase.