Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

phd placeholder: Accountability Protocol for Future Decentralized Applications #4719

Open grimadas opened 5 years ago

grimadas commented 5 years ago

Working title: Project Noodles

A draft of the layers of the accountability tools for next-generation applications. This is a semi-layered architecture draft for better understanding of the concept:

[diagram: NoodleProtocol]

grimadas commented 5 years ago

You can also see a double-spend experiment with network knowledge for Trustchain and Tribler in issue #4634. It works really well in practice. Now we need to generalize it and prove that it works for other networks.

grimadas commented 5 years ago

Draft of the architectural diagram. Description: TBA

[diagram: NoodlesComponents]

devos50 commented 5 years ago

I get the overall idea (which is nice!). I have some comments to improve the presentation and clarity of your diagram:

qstokkink commented 5 years ago

First out of 20 blocks done:

network_plot.pdf

distribution.pdf

cuileri commented 5 years ago

@qstokkink's work is basically the "Peer Discovery" module in @grimadas's architectural diagram above. With this module, we want to be sure that peers get to know as many other peers as possible. We still need to run tests with larger graphs.

The next sprint will be on the next component of the network layer, which we call the "fraud-resilient overlay." Basically, this component will enable nodes to build their 1-hop neighborhoods in the overlay network. The idea is to choose neighbors that are as diverse as possible with respect to the latency measurements, with the aim of decreasing the probability of a Sybil appearing in one's neighborhood (see details here).

The graph below is a simple representation of how much we would decrease this probability. I used this latency dataset with 490 nodes and assumed that one of the peers creates 50 Sybils around itself. On the x-axis are the nodes. On the y-axis is the probability that the respective node has a Sybil as its first neighbor. The blue line indicates random selection of nodes; the orange line shows our strategy.


fig.pdf
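For illustration, a minimal sketch of latency-diverse neighbor selection (function and variable names are hypothetical, not the deployed code); the assumption is that Sybils behind one host show near-identical RTTs to us, so spreading our choices over the latency range makes them less likely to be picked:

```python
# Illustrative sketch, not the deployed IPv8 code: greedily pick neighbours
# whose latencies are spread out, assuming Sybils behind one host all show
# near-identical RTTs to us.
def pick_diverse_neighbors(latencies, k):
    """latencies: dict peer_id -> measured RTT (seconds); returns up to k peer ids."""
    chosen = []
    candidates = dict(latencies)
    while candidates and len(chosen) < k:
        if not chosen:
            best = min(candidates, key=candidates.get)   # start with the lowest RTT
        else:
            # Farthest-point heuristic: maximise distance to already chosen RTTs.
            best = max(
                candidates,
                key=lambda p: min(abs(candidates[p] - latencies[c]) for c in chosen),
            )
        chosen.append(best)
        del candidates[best]
    return chosen

print(pick_diverse_neighbors({"a": 0.020, "b": 0.021, "c": 0.180, "d": 0.090}, k=3))
# picks "a", "c", "d" and skips "b", whose RTT is suspiciously close to "a"
```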

Next sprint: Building overlay network

qstokkink commented 5 years ago

First results of the overlay network, without the diversity function F, for 64 nodes. The ping time is artificially delayed by the absolute distance in node IDs. After ±70 seconds you know most of the network. After ±150 seconds all 64 peers have at least 14 diverse peers and have matched with at least 5 peers.

Initial findings:

The following graphs show the peers (number of open connections), ancestors (number of peers included in diverse introduction chains) and partners (number of matching peers using the greedy b-matching).

qstokkink commented 5 years ago

After scaling up to 300 nodes, we get more partners using the b-matching protocol. Further findings:


qstokkink commented 5 years ago

This is with the same settings as the last post, now with the F function implemented (we use kernel density estimation to approximate a given distribution of ping delays, 1/x in this example).
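A rough sketch of what such an F function could look like (illustrative only, assuming SciPy's `gaussian_kde`; this is not the actual implementation):

```python
# Rough sketch of one possible F function (not the actual implementation):
# estimate the density of the ping delays we already keep with a Gaussian
# KDE and prefer candidates that land where we are still below a 1/x target.
import numpy as np
from scipy.stats import gaussian_kde

def diversity_score(known_delays, candidate_delay):
    """Higher score means the candidate delay fills an under-represented region."""
    kde = gaussian_kde(known_delays)                    # current delay distribution
    current_density = kde(candidate_delay)[0]
    target_density = 1.0 / max(candidate_delay, 1e-3)   # unnormalised 1/x target
    return target_density - current_density

delays = np.random.exponential(scale=0.1, size=50)      # example measured RTTs (s)
print(diversity_score(delays, candidate_delay=0.25))
```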


qstokkink commented 5 years ago

By periodically removing peers that have not been included in our latency edges, we get a more stable set of peers to select for matching (and a higher number for that matter).

Without cleanup:

With cleanup:

As our matching algorithm is stable, peers do not get dropped when a more favorable peer becomes available. Especially in smaller networks, this leads to some peers having very few partners:

qstokkink commented 5 years ago

By properly pruning the matching partners we get better results from the stable b-matching algorithm. Most of the nasty side effects (in terms of messaging) of periodic cleanup seem to be patched up by having a larger number of maximum matches.

Up to 30 matching partners:

Up to 60 matching partners:

These results are for a network of 300 peers, where every peer has 180 open connections (of which roughly 120 are considered to have a unique latency).
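For illustration, a minimal greedy b-matching sketch (not the deployed protocol; the edge scores would come from the latency-diversity function):

```python
# Illustrative greedy b-matching (not the deployed protocol): walk the
# candidate edges from best to worst score and accept an edge only while
# both endpoints still have capacity, i.e. fewer than b partners.
def greedy_b_matching(edges, b):
    """edges: iterable of (score, peer_a, peer_b); returns the accepted pairs."""
    degree = {}
    matched = set()
    for score, a, c in sorted(edges, reverse=True):
        if degree.get(a, 0) < b and degree.get(c, 0) < b:
            matched.add((a, c))
            degree[a] = degree.get(a, 0) + 1
            degree[c] = degree.get(c, 0) + 1
    return matched

edges = [(0.9, "p1", "p2"), (0.7, "p2", "p3"), (0.4, "p1", "p3")]
print(greedy_b_matching(edges, b=1))  # only ("p1", "p2") fits with b=1
```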

grimadas commented 5 years ago

Noodle Protocol and Related Work:

We are building an optimistic P2P system for enabling decentralised applications, one that values scalability and decentralisation the most. Being optimistic, we need to handle occasional fraud in the system. To handle fraud we need a reliable, or at least predictable, overlay. If a malicious node cannot hide certain transactions, we can detect fraud.
We want to build an overlay that works in a highly dynamic environment; for that we need the following main modules:

Nice insights from the databases field:

Membership protocol

Many systems assume full membership and/or full network knowledge. This is not realistic in a dynamic network. We opt for a partial view of the network. Such a partial view must not be biased and must work under churn.

One of the most interesting approaches is based on gossip:

Of course, these protocols will not work if you have unlimited Sybils that can poison your local view and isolate you from the network. A combination of a Sybil mitigation strategy and Brahms is definitely a good idea.
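For illustration only, a toy view-update step inspired by Brahms's push/pull/history mix (a simplification under assumed parameters, not the actual Brahms pseudocode):

```python
# Toy sketch of a Brahms-style view update (a simplification, not the actual
# Brahms pseudocode): mix pushed ids, pulled ids, and long-term random
# samples, so no single source (e.g. a Sybil push flood) can own the view.
import random

def update_view(view_size, pushed, pulled, sampled):
    third = view_size // 3
    new_view = (
        random.sample(pushed, min(third, len(pushed)))
        + random.sample(pulled, min(third, len(pulled)))
        + random.sample(sampled, min(view_size - 2 * third, len(sampled)))
    )
    random.shuffle(new_view)
    return new_view

print(update_view(9, pushed=list("abcdef"), pulled=list("ghij"), sampled=list("klmn")))
```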

Partnership selection

A membership protocol is typically used to get connected and to make it hard to isolate a node or partition the network. But these protocols never address how to select partners for long-term communication that are optimal for you.

We can also select peers based on capacity (declared bandwidth, or declared collateral balance for being a money router). We might then optimise the delivery of messages to a specific peer and build the system around that peer. Let's look at the concept of being interested in a peer as a topic in a PubSub system. Periodically, a peer declares an interest in peers as a vector of topics (or a vector of trust). The overlay network organises itself into a PubSub system that gives guarantees on convergence and risk. For example, if you see that a peer has too many subscribers with low capacity, you might prefer some other peer.

For inspiration, you might look into:

Some of the gossip protocols require a verifiable neighbourhood, which is either built based on hashes or is always limited. One of the main downsides of gossip is freeriding, which in our case is Byzantine behaviour (a peer not forwarding information in order to hide certain transactions). More on that in the accountability section.

Accountability of partners

If information forwarding is crucial, we want to track whether everybody follows the protocol and incentivise good behaviour.

One of the earliest practical proposals is

Fraud resolution

- Making P2P Accountable without Losing Privacy

We say that the resolution policy is application specific and we aim for deterministic resolution in P2P. There are three ways to resolve disputes/frauds. For simplicity, we look at resolving a conflict between two transactions (a sketch follows the list below):

  1. Include all. A peer made two conflicting transactions, like spending the same tokens and trying to hide it, but some applications allow going into overdraft (credit systems, Tribler), so the trivial solution of including all blocks when calculating the final balance works fine.
  2. Both are invalid. You committed fraud; everybody will eventually cancel both transactions. "Eventually" is guaranteed through gossip and robust membership. But some peers might repeat the fraud, so some punishment must be considered too.
  3. Choose based on weight. This is the hardest one. Each transaction is assigned a weight, be it work based, stake based or any other rule. We say that each peer will converge to the same block eventually (related: gossip-based agreement).
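A hedged sketch of these three options as code (policy names and the `weight()` callback are illustrative; the real policy is application specific):

```python
# Hedged sketch of the three resolution options for two conflicting blocks;
# the policy names and the weight() callback are illustrative, since the
# real policy is application specific.
def resolve_conflict(block_a, block_b, policy, weight=None):
    if policy == "include_all":        # e.g. credit systems that allow overdraft
        return [block_a, block_b]
    if policy == "invalidate_both":    # cancel both, punish the forker separately
        return []
    if policy == "choose_by_weight":   # deterministic pick, identical on every peer
        return [max(block_a, block_b, key=weight)]
    raise ValueError(f"unknown policy: {policy}")

print(resolve_conflict({"amount": 5}, {"amount": 9}, "choose_by_weight",
                       weight=lambda b: b["amount"]))
```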

Sybil attacks and latency

We don't need to solve distinct identities and "one person, one identity"; group distinctness is enough:

The problem with these approaches is the beacons, which are a central point of the system. You can make them fault tolerant, but how to choose them remains an open question.


Another line of work that can help detect Sybils is based on virtual coordinates: you try to map Internet delays into a Euclidean space, with some error.

The most famous one is:

Another interesting paper is based on rings, which are similar to our buckets: Meridian: a lightweight network location service without virtual coordinates, 2005.

In short, this is an area with established related work; we should be careful not to miss it.


qstokkink commented 5 years ago

We can also select peers based on capacity (declared bandwidth, or declared collateral balance for being a money router).

There should be no need to depend on declared bandwidth, as we partner based on latency. The less bandwidth you have available, the larger the RTT will be as you transact more in the system. This has already been widely researched in the congestion control sphere. We can look into something like BBR: Congestion-Based Congestion Control.

grimadas commented 5 years ago

We can also select peers based on capacity (declared bandwidth, or declared collateral balance for being a money router).

There should be no need to depend on declared bandwidth, as we partner based on latency. The less bandwidth you have available, the larger the RTT will be as you transact more in the system. This has already been widely researched in the congestion control sphere. We can look into something like BBR: Congestion-Based Congestion Control.

I think we should distinguish at least three types of peers:

  1. The peers we choose as communication mediators. You will use them mainly for forwarding information. It is important that they are well connected (protection from partitioning, see the membership protocol) and we want to verify whether they are Sybils or not.
  2. Peers whose balance we want to verify, because we want to receive a payment from them in exchange for a service from us.
  3. Peers that provide a certain service to us and whom we will pay. They will want to verify us.

Even in the case of Tribler the peers are not homogeneous; there are different roles (exit node, relay node, etc.). I think that fundamentally changes how you build the overlay.

grimadas commented 5 years ago

There is a bunch of related work on accountability and on reliable and scalable gossip/multicast. One of the key components is the Verifiable Neighborhood, which assigns random members for interaction and verification. For example, in PeerReview witnesses and auditors are chosen based on consistent hashing. In a DHT your neighbors are chosen by the proximity of their hash.
Such systems can tolerate churn and malicious users, but there is always an upper bound to it. This bound can be trivially broken with a Sybil attack, and your local view will then be poisoned, which breaks all security/risk assumptions.

We are now working on a Sybil Resilient Verifiable Neighborhood.

Everybody follows the diversity protocol, which together with random sampling (like Brahms) gives some protection from Eclipse attacks and network partitioning in the presence of Sybils in a static environment. This is enough to use gossip-based information dissemination and to discover peers we want to interact with, but it is not enough to give a risk estimation.

To calculate the risk of getting an invalid token (double-spent), the risk of not getting paid (outdated balance estimation), and finally the time to detection (t-visibility), we need to know the current and last interaction partners. An interaction partner is a peer that you want to hold accountable: you want to validate their balance, calculate a function based on the peer's history, etc.

In other words, we want to answer three questions:

  1. Do all partners know each other? (Agreement among partners)
  2. Do I know all your current partners? (Staleness)
  3. Are your partners good enough? (Sybilness)

Answering all three questions in a practical system is a hard task, if it is possible at all. For us it is good enough to answer all of them probabilistically; e.g. instead of hard staleness we can go with eventually complete knowledge (k-staleness), but in that case it is very important to bound the dynamics of the partner set, i.e. the number of peers you can add/remove at a certain time.

Suppose we have a function Distinct({P_1, ..., P_n}, k) --> {0, 1}. It checks whether the set of peers {P_1, ..., P_n} contains at least k distinct identities, where k <= n, and outputs 0 if there are fewer than k and 1 if there are at least k. This function can be implemented using several approaches:
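One possible, purely illustrative instantiation of this interface (the latency-gap test and threshold below are assumptions, just one of the possible approaches):

```python
# Interface sketch only, matching the Distinct({P_1..P_n}, k) signature above.
# The distinctness test used here (pairwise latency gap above a threshold) is
# just one possible instantiation, not a fixed design choice.
def distinct(peer_latencies, k, min_gap=0.005):
    """Return 1 if at least k peers look pairwise distinct, else 0."""
    accepted = []
    for delay in sorted(peer_latencies.values()):
        if all(abs(delay - other) >= min_gap for other in accepted):
            accepted.append(delay)
    return 1 if len(accepted) >= k else 0

print(distinct({"p1": 0.020, "p2": 0.021, "p3": 0.120}, k=2))  # -> 1
```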

To answer these questions we have two directions:

Direct agreement among partners

It gives clear bounds and determinism on which we can calculate further transaction risks. It is only used when there is an update to the partner set; for a stable partnership it will be used only once. This approach has established research behind it and would be easy to argue about. It is not clear how it would work in a dynamic setting; more testing and discussion is needed.

Virtual agreement among partners

This is a more elegant approach that reuses Trustchain ideas, but it is not clear how quickly it will converge to a stable set and how to handle forks. Further, it requires synchronization of the chain and redundant interactions between peers.

qstokkink commented 5 years ago

I think we should distinguish at least three types of peers:

1. The peers we choose as communication mediators. You will use them mainly for forwarding information. It is important that they are well connected (protection from partitioning, see the membership protocol) and we want to verify whether they are Sybils or not.

2. Peers whose balance we want to verify, because we want to receive a payment from them in exchange for a service from us.

3. Peers that provide a certain service to us and whom we will pay. They will want to verify us.

Even in the case of Tribler the peers are not homogeneous; there are different roles (exit node, relay node, etc.). I think that fundamentally changes how you build the overlay.

I agree. What I am arguing is that there is no need to advertise bandwidth specifically. By partnering based on latency, we already get the emergent effect of partnering with peers that have sufficient bandwidth.

Regarding the partnering agreements:

This set and signatures from the peers is enough to prove to anyone that you have a certain peer set.

I would reword this, to say that you can prove to anyone that you have at least a certain peer set.

Furthermore, it seems like the resource class of the Distinct function is very expensive, and measuring most of the mentioned solutions would also interfere with the Distinct functions of others.

Please also elaborate on how having distinct identities provides sybil-proofness.

grimadas commented 5 years ago

This set and signatures from the peers is enough to prove to anyone that you have a certain peer set.

I would reword this, to say that you can prove to anyone that you have at least a certain peer set.

We need a lock on the interaction partners, a commitment that a peer will interact only with them for a certain time/number of rounds. That will be used to calculate risks; no interaction is possible before providing this commitment. Every peer should eventually know about your commitment, so we can use gossip for that. This will also give guarantees on detection if done properly.
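A hypothetical shape for such a commitment (field names and the `sign()` callback are illustrative, not the actual wire format):

```python
# Hypothetical shape of a partner-set commitment (names are illustrative):
# the peer signs the partner set it restricts itself to for a number of
# rounds, so anyone receiving it via gossip can check interactions against it.
import hashlib
import json

def make_commitment(peer_id, partner_ids, valid_rounds, sign):
    payload = {
        "peer": peer_id,
        "partners": sorted(partner_ids),
        "valid_rounds": valid_rounds,
    }
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return {"payload": payload, "digest": digest, "signature": sign(digest)}

commitment = make_commitment("peer_a", ["peer_b", "peer_c"], valid_rounds=10,
                             sign=lambda d: "sig(" + d[:8] + ")")  # stub signer
print(commitment["digest"][:16])
```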

Furthermore, it seems like the resource class of the Distinct function is very expensive, and measuring most of the mentioned solutions would also interfere with the Distinct functions of others.

Please also elaborate on how having distinct identities provides sybil-proofness.

It is an attempt to generalize the Sybil attack in our context. In the case of consensus there is a direct way to calculate the impact: if you can break 51% with Sybils, you are the king.

In our case one of the threats is Eclipse attacks by Sybils. Suppose that after sampling you get a set of peers and now you want to test whether this set is unique enough. You require an estimate that at least k of n are unique. For example, 2 of n is enough not to be Eclipse attacked, but probably not enough to build a healthy overlay? For that you can test latency and diversity, or require resource testing (computational challenges, storage challenges, bandwidth exchange). We should argue why we choose latency and diversity instead of resource testing. Both approaches are only locally verifiable, by the way; making them globally verifiable is not trivial.

qstokkink commented 5 years ago

We need a lock on the interaction partners, a commitment that a peer will interact only with them for a certain time/number of rounds.

This is a crucial bit of information that was lacking in the original explanation. I can see that working, agreed.

test if this set is unique enough

I think we need a stricter definition of what a Sybil identity is, then. It is pointless to argue about which measure we use for Sybil identities if we do not have this defined from the start. In the case of Bitcoin we can argue that its proof-of-work is very bad at avoiding Sybils, as the rational actors form mining pools and have to be kept artificially under the magical 51% mark. The only thing protecting against such an attack in Bitcoin is the fact that a 51% attack would also devalue the coin itself. Back to the definition part: where do we draw the Sybil line in Bitcoin? Is a Bitcoin Sybil a single machine with a lot of CPU and multiple addresses? Is a Bitcoin Sybil a factory hall filled with ASICs? Is a Bitcoin Sybil the entire mining pool?

Both approaches are only locally verifiable btw, making it globally verifiable is not trivial.

I challenge whether it is even possible to do global verification of Sybils without a single trusted party in control of all identities.

grimadas commented 5 years ago

I think we need a stricter definition of what a Sybil identity is, then. It is pointless to argue about which measure we use for Sybil identities if we do not have this defined from the start. In the case of Bitcoin we can argue that its proof-of-work is very bad at avoiding Sybils, as the rational actors form mining pools and have to be kept artificially under the magical 51% mark. The only thing protecting against such an attack in Bitcoin is the fact that a 51% attack would also devalue the coin itself. Back to the definition part: where do we draw the Sybil line in Bitcoin? Is a Bitcoin Sybil a single machine with a lot of CPU and multiple addresses? Is a Bitcoin Sybil a factory hall filled with ASICs? Is a Bitcoin Sybil the entire mining pool?

Totally agree that it is tricky with mining pools. What you essentially have is multiple users colluding to contribute to one address; should they be considered Sybils or just colluders?

But let's get back to our case. Our goal is to pick overlay neighbours that are good enough, and then pick interaction partners that are good enough. Goodness of an overlay neighbour is based on forwarding information and not Eclipse attacking you. Goodness of an interaction partner is based on... what? Capacity? Balance? Trust? Reputation? Depending on what we choose, the Sybil attack will change too.

qstokkink commented 5 years ago

Goodness of an interaction partner is based on... what? Capacity? Balance? Trust? Reputation? Depending on what we choose, the Sybil attack will change too.

Perhaps we can generalize this as the chance of your interaction partner finding any particular transaction you were involved in. Balance, trust and reputation are all derived from an (almost) complete set of transactions.

grimadas commented 4 years ago

For the next iteration we will focus on hardening the guarantees, giving quantitative bounds on finality.

Goals

The main goal is to build an accounting system which is simple yet powerful enough to support decentralised applications on top of it, such as financial markets, reputation systems, etc.

Non-functional requirements:

  1. System working for millions of clients.
  2. Has high availability: no single point of failure
  3. Attack-resilient: adaptive to malicious peers.
  4. Provide 'just enough' building blocks for applications, such as information convergence (consensus).
  5. Simplicity

Means:

  1. Bottom-up design: No total order, use local view as much as possible.
  2. Replication with redundancy: there are enough connected active peers.
  3. Limit the attack vector by limiting the world.

Science behind it:

Optimizations: TBA

synctext commented 4 years ago

Noodle == much more than double-spending and witness lists. That was new to me. 2 years of work have been invested in the channels (or main outcome of 2 years of re-factoring). Switching to The Noodle Stack (with 5 layers it's more than a protocol) is quite disruptive and took me by surprise, as you obviously noticed (more discussion on 10 June).

Set Reconciliation is a step towards the complexity of Dispersy with Bloom filters, something we wish to avoid. For a lot of use cases (e.g. channels) you can assume there are sufficient "seeders" with 100% of the information and you can simply collect it. Everybody is required to "seed" their complete Trustchain record history. A simple linear list. Can you support that easily?

Channels only need linear downloads of bulk magnet links plus filename(s). Therefore, for channels a single append-only linear list with logical deletes is sufficient. The current version requires a channel to be completely downloaded before records can be inserted. Partial downloads of subdirectories of interest are not supported. If we enhance the channel format to be downloaded in real time, inserted, and seeded partially, we have solved sufficient problems and can move on. For the roadmap and Kotlin implementation we first need documentation of the current channels format and commit structure.

Questions to discuss over a few coffees: Do you envision The Noodle Stack supporting the bulk exchange of 1+ million magnet links with congestion control? How does this work relate to #3690, trustworthy gossip? How do you solve the problem that most witnesses will be offline 98% of the time (unfiltered)? Witness reliability filtering is probably a requirement, but also an attack vector?

grimadas commented 4 years ago

Set Reconciliation is a step towards the complexity of Dispersy with Bloom filters, something we wish to avoid. For a lot of use cases (e.g. channels) you can assume there are sufficient "seeders" with 100% of the information and you can simply collect it. Everybody is required to "seed" their complete Trustchain record history. A simple linear list. Can you support that easily?

I agree. I kept this part as simple as possible. If you have an order, reconciliation is much easier and Merkle trees and Bloom filters can be avoided. However, a linear order, especially a global one, is not always possible or even desirable; for these cases we use reconciliation on a DAG. With Noodle we enforce linear order on one writer chain (personal chains, classical Trustchain). Community chains (multiple writers, single datapoint) optimistically link to converge towards a linear log, but nothing bad will happen if it stays a DAG.

Questions to discuss with a few coffees; Do you envision The Noodle Stack support the bulk exchange of 1+ million magnet links with congestion control?

That would be definitely interesting to try.

How does this work relate to #3690, trustworthy gossip?

Gossip is the heart and soul of Noodle. We just narrow down the gossip to only those who care (related to PubSub). In the current implementation I use push-pull gossip with reconciliation to guarantee fast convergence. Partly tested with a Wikipedia dataset; it can converge within 15-20 seconds under heavy load for 1000 peers.
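A simplified view of one reconciliation round, just to illustrate the shape of the exchange (this is an assumed structure, not the Noodle wire format):

```python
# Simplified view of one push-pull reconciliation round (an assumed shape,
# not the Noodle wire format): both sides compare the block ids they hold
# and then exchange only the blocks the other side is missing.
def reconcile(local_blocks, remote_blocks):
    """Both arguments: dict block_id -> block. Returns (to_push, to_pull)."""
    local_ids, remote_ids = set(local_blocks), set(remote_blocks)
    to_push = {bid: local_blocks[bid] for bid in local_ids - remote_ids}
    to_pull = {bid: remote_blocks[bid] for bid in remote_ids - local_ids}
    return to_push, to_pull

push, pull = reconcile({"b1": "...", "b2": "..."}, {"b2": "...", "b3": "..."})
print(sorted(push), sorted(pull))  # ['b1'] ['b3']
```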

How do you solve the problem that most witnesses will be offline 98% of the time (unfiltered)? Witness reliability filtering is probably a requirement, but also an attack vector?

Witnesses are not required for the protocol to work (Guaranteed Eventual Detection). They are required to build critical applications where detection (+ grim trigger) might not be enough. Witnesses can be chosen as you like, but the best approach so far is to choose peers in the community and make sure they are diverse enough. A witness audit is enforced by a counterparty and their risk assessment, so in the end they will decide to accept/decline the transaction.

Witness availability is not a big issue, since the witness set can be expanded up to everyone in the community. However, every community needs to make sure that there is at least some number of available peers. But this is a research topic on its own, which I'll address later.

grimadas commented 4 years ago

Iteration progress and plans for the next sprint:

Repository

Noodle is moving to a separate repository for easier management. For now it is under my personal GitHub; I suggest moving it later to the Tribler organisation and including it as a reference in the main Tribler repo. Noodle builds on top of IPv8, with IPv8 as a requirement.

Development plans

Noodle will be continuously tested and improved. The protocol will be documented and published as a PyPI package.

If successful, Noodle will join the pleiad of similar protocols: DAT, IPFS, Scuttlebutt.

Short term experiments and questions (1-2 Months):

Medium-Long term research plans:

  1. Battle-test 3 applications: DAO governance (e.g. collaborative editing, Wikipedia), trustworthy content metadata exchange (for file sharing), ???
  2. BitTorrent is a great tool, but it has narrow applicability, i.e. BitTorrent governance is based on tit-for-tat and is restricted to a very small world: one swarm. As a consequence it is restricted in the number of applications and is driven by local optima. I would like to explore alternative mechanisms of governing common infrastructure. Using decentralised orchestration we can build self-organising, dynamic community infrastructure that is driven first by the community good and by reputation, risk and trust to avoid the tragedy of the commons (for example, loss of a file, or over-abuse of certain peers), rather than by tit-for-tat.
synctext commented 4 years ago

Just for my understanding, I'm translating the above into more abstract terms and MVP requirements for an accounting system which can underpin our live token economy.

BitTorrent is a great tool, but it has narrow applicability, i.e. BitTorrent governance is based on tit-for-tat and is restricted to a very small world: one swarm.

Yes, using BitTorrent for bulk downloading of gossip information seems like a hack. It actually is a hack. :warning: rant warning... with Trustchain we abuse BitTorrent as a multi-sourced FTP service. In the history of our lab (since 1999) many master students, developers, post-docs and PhD students believed that they could do better than BitTorrent or improve it. Everybody failed after trying for up to 4 years. Their designs are more elegant and superior; their implementations never matched. They always added more complexity than a one-person project can handle. Serious people stopped trying to redo the Linux kernel or the leading compiler. The IPFS and DAT people are finding out the hard way that BitTorrent is hard to improve upon. It can even support Tor-like privacy, if you beat it up for 5+ years. Engineers are trained to hate such hacks. Lesson: after our costly https://libswift.org experience, do not try to be superior to Arvid's brilliant code. I tried and it has cost me 650,000 Euro of EU money.

Please read about this gossip work done by our master student who was hired by Mozilla afterwards: "Splash: Data synchronization in unmanaged, untrusted peer-to-peer networks". Do you want to make it as simple as such "sync points"? What do you think of their idea of using random gossip for "live mode" and BitTorrent for "history mode"?

grimadas commented 4 years ago

Well, I agree that it is really hard to match BitTorrent when it is used as a multi-sourced FTP service. But BitTorrent is much more than that, which raises the question of why not use something different.

But as it was designed for bulk downloads, the only way to decide when to use gossip vs. BitTorrent is to do heavy experimenting. My hypothesis is that gossip sync should be sufficient for most cases. As clients generally don't want, or even don't need, to download big bulk data, we might consider something more lightweight. I hope the experiments will reveal that.

synctext commented 4 years ago

Tragedy of the Commons considered solved

On top of the trust layer there is perhaps the most interesting part of the application layer: the allocation of resources, preferential treatment, punishment of freeriders, community exclusion of cheaters. Combined they could be a new layer in your model: management of the commons.

Wikipedia, the Linux kernel, Bitcoin, BitTorrent, and residential WiFi sharing somewhat manage to manage the commons, with limitations or with a remaining central server. Scientific target: relentless automation and the zero-server principle. Organisational principles:

  1. fully autonomous participants
  2. freely shared data
  3. personalised risk calculations
  4. stable collaborators and preferential treatment
  5. Growth of social standing and leader-board rank

Relaying requires incentives, and seeding requires storage and bandwidth (open issue for 7 years to donate 1 TByte to the community, #21). Open issues: lossy accounting, overhead of accounting, risk exposure, double-spending risk, settlement time, economics of anonymity.

Proposed publication plan: always have 1 paper under review. Re-use OSDI/Usenix thinking. Design paper with early results of your accounting framework. Resource contention, 200k new Trustchain records daily, exit node rules, preferential rules, scheduling algorithm. Paper by 21 September? :clock10:

2022 insight: Do not trust strangers, only help them a bit to get going. SybilGuard and all these algorithms fail. Maintain stable collaborators in order to protect against unlimited identity creation. It is a critical failure to assume humans can distinguish fake identities from real humans. See the 300 fake newspapers example.

synctext commented 4 years ago

Privacy with gossip: Spy vs. Spy: Rumor Source Obfuscation. Real attackers do a Sybil attack (but still interesting related work).

synctext commented 4 years ago

@grimadas Novel idea: take Trustchain entanglement to the next level: link to any party and fixate their last block. Aug 1st goal: running code on the DAS. Then solve the Tribler problem of freeriding. You can be in 100 or 1000-ish counterparty groups which synchronise state. Towards the goal of knowing the contribution level to the commons of 1 million peers. The mission is to prevent the worst abuse and fraud (not exact accounting). Please document your design in 1 page by 14 July. Plus: https://dicg2020.github.io/

synctext commented 4 years ago

Reinventing The Internet for the Common Good

Within this article we present a scientific roadmap for addressing monopoly formation on the Internet and our economy at large by creating an infrastructure for the common good. We present results from the ecosystem we built over 15 years, inhabited by 1.7 million volunteers. Our work encompasses four isolated scientific fields. With our infrastructure-for-the-common-good lens we unify the fields of applied social science, distributed algorithmic mechanism design, adversarial information retrieval, and traditional distributed systems. Our architecture contains five elements:

  1. Our fundamental building block is awareness of outcomes when two economic actors meet. By spreading awareness of defection or pro-social, cooperative behaviour we build upon the solid foundations of game theory and prevent the price of anarchy. This is our scalable, distributed, tamper-proof accounting layer. Dissemination is conducted using gossip.
  2. Fraud filtering is the next element within our architecture. Based on historical encounters and transactions we must filter out the lies, falsehoods, and fabricated accusations. This bears many elements of adversarial information retrieval. See the work by TU Delft faculty: TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings.
  3. Resource allocation is another key element, based on fraud-filtered historical behaviour. Only simple rules are used, based on locally available information, which results in emergent increased social welfare. Complex systems are impossible to schedule centrally. Successful systems evolve further, and complexity grows with enhanced sophistication and increasing popularity.
  4. Governance is the critical long-term element. Who is in charge of the ecosystem and organises change? We focus on the special class of self-governance, where both nobody and everybody is in control. Collective decision making is based on democratic principles.
  5. Application-specific logic is embodied in this final element. Bookkeeping, fraud prevention, and resource allocation are re-usable across multiple domains.

We claim our generic architecture is usable across business domains and a diverse set of applications. We focus the remainder of this work on the core ledger part. Our novel addition to the existing body of work within ledger science is based on the strength of weak ties. We are the first to successfully apply the classical work by Granovetter from 1973 to dramatically improve the effectiveness of ledgers. We only add a single rule to a DAG-based ledger: each economic actor randomly fixates the last observed state of another economic actor on its own ledger. The state of others is compressed into a single hash, resulting in a mere 32 bytes of storage cost. A special property emerges from adding random pointers in a graph with unknown properties: entanglement. Economic actors collectively ensnare each other.

We provide proof for the unreasonable effectiveness of weak ties for ledger science.
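As a purely illustrative sketch of this entanglement rule (names and data layout are assumptions, not the actual ledger format):

```python
# Minimal sketch of the entanglement rule described above (names are
# illustrative): a peer picks a random other actor and fixates the 32-byte
# hash of that actor's last observed block in its own next block.
import hashlib
import random

def entangle(own_chain, observed_chains):
    """observed_chains: dict actor_id -> list of serialized blocks (bytes)."""
    other = random.choice(list(observed_chains))
    fixated = hashlib.sha256(observed_chains[other][-1]).digest()  # 32 bytes
    own_chain.append({"link_to": other, "fixated_state": fixated})
    return own_chain

chain = entangle([], {"actor_b": [b"block1", b"block2"]})
print(chain[0]["link_to"], len(chain[0]["fixated_state"]))  # actor_b 32
```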

llegard commented 4 years ago

Would it be a stretch too far to talk about flexible networks without rigid boundaries, instead of ‘weak ties’? Could be a unifying argument about the power of permissionless systems.

grimadas commented 4 years ago

A paper under work now, working title: "Universal Accounting of Goodness".

Why accounting? Insights from accounting theory

The main reason for accounting and unified rules is to address uncertainty and information asymmetry. Information reliability and relevance are important for business decision making.

We translate these properties to distributed systems, namely consistency and validity of information. We recognise the fundamental difference between these two and address the most important adversities: information hiding and information tampering.

But we also recognize that achieving 100% reliability is not always possible or desirable, as information retrieval and storage come at a resource cost. We model an accounting platform that is flexible and universal enough to support accounting for multiple domains. It allows users to choose different policies based on their risk model and adjust their behaviour accordingly.

synctext commented 4 years ago

@grimadas Can you take 15 minutes today and post a progress update of the past summer months + all code?

A paper under work now, working title: "Universal Accounting of Goodness". Why accounting? Insights from accounting theory The main reason for accounting and unified rules is to address uncertainty and information asymmetry.

The leading organisational principle of developed nations is the market mechanism with a certain level of government regulation. Open access (commons) is an alternative to private control; however, it is academically underdeveloped in terms of theoretical support. The tragedy of the commons is the key obstacle to sustainable commons. We claim that private control by Big Tech companies is difficult to avoid without a solution to the digital commons problem.

We introduce the first viable solution to the digital commons problem, based on accounting of social capital. To govern the commons collectively, we use accounting and unified rules to address uncertainty and information asymmetry. We provide empirical data on our global database of social capital.

:drum: Intro Bami! (please push the tentative code)

grimadas commented 4 years ago

Requirements for sustainable cooperation

1. Maintenance of cooperation:

grimadas commented 4 years ago

The main application focus of BAMI is contribution accounting to sustain cooperation. The requirements were given above, with the main one being the identification of bad peers (Byzantine cheaters and freeloaders).

I see several different ways to approach this application. Here are my thoughts:

What is contribution?

A contribution is a pairwise action from one peer to another that is recorded as a transaction with two actions: the contribution T1: peer A -> peer B, and the confirmation (or explicit rejection) T2: peer B confirms (or rejects). Every contribution requires two signatures (pairwise accounting). This approach addresses misreporting, i.e. peer slander: peer A cannot falsely claim that it contributed to peer B, since B's consent (signature) is required.

Contribution event - double-signed entry.
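A purely illustrative layout of such a double-signed entry (field names and the sign callbacks are placeholders, not BAMI's actual API):

```python
# Illustrative layout of a double-signed contribution event; the field names
# and the sign callbacks are placeholders, not BAMI's actual API.
def contribution_event(contributor, recipient, amount_mb, sign_contributor, sign_recipient):
    body = f"{contributor}->{recipient}:{amount_mb}MB"
    return {
        "body": body,
        "sig_contributor": sign_contributor(body),  # T1: peer A claims the contribution
        "sig_recipient": sign_recipient(body),      # T2: peer B confirms (or withholds)
    }

event = contribution_event("peer_a", "peer_b", 100,
                           sign_contributor=lambda m: "sigA(" + m + ")",
                           sign_recipient=lambda m: "sigB(" + m + ")")
print(event["body"])
```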

Can we do it differently?

We don't verify the event truthfulness.

Can we always identify freeloaders?

As this is connected with verification and with having global knowledge: more no than yes. Only in specific applications under certain assumptions (honest, altruistic majority, global consensus). The best we can do to stay scalable with millions of peers is to optimistically detect/estimate the probability of freeloading.

We don't prevent freeloaders with 100% accuracy.

How can trust algorithms help?

They can help in two ways: ranking or filtering. Depending on the application requirements, one might be preferred over the other.

This allows us to achieve magical properties, but only if we view it from the perspective of an honest node:

The best candidates for these algorithms for now are variations of personalised hitting time.

We use trust algorithms to limit the effects of attacks
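As a rough illustration of the personalised-hitting-time family mentioned above (a Monte Carlo sketch with a toy work graph and parameters, not the actual algorithm used):

```python
# Monte Carlo sketch of a personalised hitting-time style score (one of the
# candidate trust functions mentioned above); the graph and parameters are
# toy values. It estimates how often short random walks from `seed` hit `target`.
import random

def personalised_hit_score(graph, seed, target, walks=1000, max_steps=50):
    """graph: dict node -> list of neighbours; returns the fraction of walks
    that reach `target` within max_steps (higher = more trusted from seed)."""
    hits = 0
    for _ in range(walks):
        node = seed
        for _ in range(max_steps):
            neighbours = graph.get(node)
            if not neighbours:
                break
            node = random.choice(neighbours)
            if node == target:
                hits += 1
                break
    return hits / walks

work_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
print(personalised_hit_score(work_graph, seed="a", target="c"))
```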

Is that global trust algorithm?

No. It is highly personalised, as the only trust anchor a peer has is its 1-hop interactions. A peer can either verify all 1-hop peers directly, or it trusts them (this is our assumption!).

This leads to an interesting result: some peers might consider an honest peer a freeloader, and vice versa. However, we show that with enough interactions the work graph and trust algorithms lead to an evolutionarily stable state: freeloaders starve.

The trust function is personalised. Global ranking or filtering emerges with time.

Where trust algorithms will not help?

Peers can still hide information and cheat by presenting only partial information to bias the trust algorithms. This attack is related to forking and network partitioning.

To protect from such an attack, peers need to exchange data on the target peer. This can be achieved via gossip, periodic checkpointing and witnesses.

To simplify this and decrease the time to detection, we connect peers by topic (interest in the target peer). This turns blind gossip into targeted gossip (similar to pub-sub).

We use targeted gossip to minimise time to detection

What do we do with freeloaders?

As trust is personal, honest peers might temporarily see each other as suspicious. However, I believe that we should not block them, but instead give limited service depending on the severity: for example, lower priority or decreased speed. Thus, any freeloader would eventually starve.

Punishment for freeloaders - service quality degradation.

What do we do with cheaters?

Cheating is more severe, as it is a deviation from the protocol with an objective proof of cheating (justified). Either further degradation of service or a temporary ban might be considered.

Cheaters get more severe punishment but only after justification.

Is there a way back?

There should always be a way back that is preferable to whitewashing. For example, cheaters might fix all their forks and continue. Freeloaders will get the same quality of service once their ranking increases.

Punishments are incentives for change.

Unaddressed issues and research questions.

  1. Two counterparties might change their minds later. For example, imagine the total bandwidth contribution at time t1: A -> B: 100 MB, B -> A: 0 MB. Later the counterparties decide to change their contribution relationship (total contributed) at t2: A -> B: 50 MB, B -> A: 0 MB. The total went down; do we consider this valid behaviour or not? Should it be punished?
  2. All trust algorithms I have encountered so far assume global knowledge of the work graph. This is not realistic; how does partial knowledge affect the error in the trust algorithm? We need experiments to show that.
  3. As we have entered the realm of imperfect accounting, a natural question arises: what is the minimal information I need to keep for the system to work?
  4. How do we deal with multiple services provided by the same peer? For example, one peer might run bandwidth-as-a-service, CPU-as-a-service or storage-as-a-service.
    • Should the chain and gossip be agent-centric (we gossip about the peer chain of peer A for every service/app it is in), app-centric (we gossip about an app chain: a shared chain of the peers that run a specific service/app), or hybrid (a chain of the peer in a specific app)?
Comments and more questions are more than welcome!

synctext commented 4 years ago

We don't verify the event truthfulness.

Excellent thoughts! Please expand this into 6 pages IEEE format before 21 September.

Storyline: "Facilitating the emergence of truthful accounting". Our remarkably simple accounting mechanism does not record any details of interactions within a community, yet effectively suppresses freeriding, cheating and fraud. Our only mechanism is to irrefutably record the existence of an interaction between two community members. Paradoxically, the mere recording of this interaction, together with higher-level mechanisms, leads to the emergence of global cooperation and the starvation of freeriders. We successfully devised rules with the complexity of tit-for-tat with...

With one addition we can make our mechanism resilient against fake identities within a complete collaborative framework. New identities start with zero community standing and only receive service if they have made a net positive community contribution. Further details are outside the scope of this work:-)

synctext commented 4 years ago

The Truth Convergence Paradox. Mechanisms to establish "single-event truthfulness" do not converge to fraud-free communities as well as mechanisms which ignore truthfulness at this level and operate only using partial information, random gossip, and optimistic probabilistic detection.

synctext commented 4 years ago

Related work: trying to enforce single-event truthfulness at the cost of efficiency (it destroys the 300 Mbps+ download capability of libtorrent): "Treat-before-trick: Free-riding prevention for bittorrent-like peer-to-peer networks"

synctext commented 3 years ago

Bami Blocks have been born! Nice work :1st_place_medal:

Management concern: when the PhD work is finished, the outcome is either 1) a self-maintaining GitHub community or 2) dead code in Tribler. Please only make a modest change to Trustchain; keep improving and expanding it as much as you like. But keep the first version of Bami integrated in Tribler clean of any non-Tribler features. Proposal: by 15 November there is a stable Tribler build on our forum with Bami 1.0 ready. Time-boxed approach. List of potential issues to address (brainstorm):

Candidates for later improvement cycles (brainstorm), after the Bami 1.0 release:

grimadas commented 3 years ago

After hard thinking and planning, I decided to follow the most straightforward yet powerful approach there is:

Simplest accounting mechanism for Tribler:

In total a peer will store only K blocks, where K is the number of counterparty peers. Even if you crawl everybody, you only store |E| blocks, where |E| is the number of interaction pairs between users (the number of edges in an undirected graph).
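A sketch of this "one block per counterparty" storage rule (class and method names are illustrative, not the actual implementation):

```python
# Sketch of the "one block per counterparty" storage rule described above
# (class and method names are illustrative): only the latest pairwise total
# is kept, so local storage stays at K entries.
class PairwiseLedger:
    def __init__(self):
        self.latest = {}  # counterparty_id -> latest signed pairwise block

    def record(self, counterparty_id, block):
        # Overwrite the previous block for this counterparty; K == len(latest).
        self.latest[counterparty_id] = block

    def storage_size(self):
        return len(self.latest)

ledger = PairwiseLedger()
ledger.record("peer_b", {"up_mb": 100, "down_mb": 0})
ledger.record("peer_b", {"up_mb": 150, "down_mb": 20})
print(ledger.storage_size())  # still 1 block for peer_b
```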

Reputation mechanism

The latest experiments and research showed a surprising result: a reputation mechanism, if built properly, provides a substantial level of robustness, even in the presence of Sybils and fake data. Hence we defer all security and robustness to reputation, leaving the accounting very, very simple.

This reflects the vision and is sufficient to build a sharing economy with minimal complexity.

synctext commented 3 years ago

wow! Thank you for your Tribler-centric/lab-centric thinking. So with barebones accounting, latency diversity, peer ranking, competitive upload spots we get spam flooding protection and Sybil-resilience? We need more?

grimadas commented 3 years ago

wow! Thank you for your Tribler-centric/lab-centric thinking. So with barebones accounting, latency diversity, peer ranking, competitive upload spots we get spam flooding protection and Sybil-resilience? We need more?

Yes, according to the latest research insights, it provides sufficiently solid security and allows us to build the first version of a micro-economy based on sharing 💪.

qstokkink commented 3 years ago

latency diversity

@grimadas Do you need this now? It's currently not part of IPv8's wild codebase (as the related research is not accepted anywhere yet). I can hook it in though.

devos50 commented 3 years ago

wow! Thank you for your Tribler-centric/lab-centric thinking. So with barebones accounting, latency diversity, peer ranking, competitive upload spots we get spam flooding protection and Sybil-resilience? We need more?

I don't think that we need the latency diversity for the first (barebones) version of the bandwidth ecosystem. I advise considering this as a task for the next iterations.

I will also update our ongoing write-up with these key insights.

qstokkink commented 3 years ago

Alright, I'll leave that out then. It would probably also only serve to hinder you anyway: the DAS-5 running 1000 instances per node would be detected as a Sybil attack 😄

synctext commented 3 years ago

The barebones accounting is now deployed successfully, dubbed LedgerZero. Please remain focused on an upload incentive. First make it work. Try to suppress the reflex of generalisation into a generic incentive mechanism. Nobody has ever created an indirect reciprocity mechanism that actually works and scales. The tragedy of the commons for even one resource is unsolved. Leave moderation processes and metadata for the next generation of PhD students in the lab. It is guaranteed to delay PhD graduation if you try to improve upon this 175-page work. Actually, it's an even broader problem, I believe. Moderation is hard. We currently do not have the resources to facilitate spam-free and harassment-free communication amongst our users. Free text flow (many-to-many, two-way) is too difficult to "Internet police"; our scientific focus is first self-governance of 'filename moderation' (suppression of spam, fakes, worms, broken files, poor-quality files, duplicates, misspellings, and bundles that are not bundled).

synctext commented 3 years ago

As discussed. Was the focus on freeriding for 16 years smart? In a quality community, do we then induce unconditional giving? Permaseeding is the term I would like to introduce for the desired behaviour. Permaseeding enhances social welfare, as opposed to hit-and-run.

Platforms engagement and sharing: https://www.emerald.com/insight/content/doi/10.1108/JSTP-04-2016-0071/full/pdf?title=engagement-platforms-in-the-sharing-economy-conceptual-foundations-and-research-directions

synctext commented 2 years ago

Freeriding is very much unsolved, and Web3 makes it very relevant. The accounting protocol is making progress. Perhaps we can get 2 chapters out of it. One: algorithm only; the N-person repeated prisoner's dilemma, solved with gossip. The other: deploy and measure with LedgerZero.

synctext commented 1 year ago

Sybil attacks are now investigated by six isolated fields. They are not aware of each other. They lack the understanding of how fundamental their problem is. They are not fully aware of all prior and related work.

synctext commented 1 year ago

Unified model of trust and intelligence (dream outcome)

Two isolated fields are probably much more closely aligned than anybody realises. Unbelievable (really) similarities exist between theories of trust and theories of mind. With decades of scientific effort we might unify evolutionary connectionism architectures with our continued research into incentives and credit allocation (e.g. MeritRank) within collaborating humans (our ongoing research for the past 23 years :astonished:). See Evolutionary Connectionism: Algorithmic Principles Underlying the Evolution of Biological Organisation in Evo-Devo, Evo-Eco and Evolutionary Transitions from 2016.

Prof. Levin's great writing from evolutionary biology: The collective intelligence of evolution and development, published in the open-access Collective Intelligence journal he co-founded. Great quote: evolution is more intelligent than we realised; the Tribler Lab experimentally determined that "randomness and evolution are more intelligent than we realised". It is counter-intuitive for scientific engineers that you cannot outsmart massively scaled randomness. The relevant section, "Credit assignment in individuals and collectives", contains this quote: "How does scaling of reward dynamics bind subunits into intelligent collectives that better navigate novel problem spaces? Lessons from machine learning. It is no accident that the issue of credit assignment, and the application of credit to parts or wholes, is a central one in evolutionary selection, developmental and organismic biology and cognitive science. It is a feature of many difficult learning tasks that they require sequences of actions that are many steps away from ultimate goals – making it intrinsically difficult to incentivise the component parts involved."

Reward functions appear in biological systems, in our collaborative systems based on MeritRank which address freeriding, and also in a state-of-the-art robot racing pilot. See the Nature journal quote with the reward terms for reinforcement learning with collision avoidance:

_Update_: another [fascinating Nature paper studies credits](https://doi.org/10.1038/s41593-023-01514-1). These are micro-level credits, used to update weights (see the [overview paper](https://arxiv.org/pdf/2403.18929)). The [GitHub code](https://github.com/YuhangSong/Prospective-Configuration) for the backprop alternative:

```
For both humans and machines, the essence of learning is to pinpoint which components in its information processing pipeline are responsible for an error in its output, a challenge that is known as 'credit assignment'. It has long been assumed that credit assignment is best solved by backpropagation, which is also the foundation of modern machine learning.
```

The next step is possible if we ever manage to formulate the above dream theory. Collectively we could develop superhuman capabilities. We "only" need to overcome fraud, greed, and tyranny. **Make our species more intelligent, productive, and resilient by advancing the theory and practice of trust, collaboration, and democracy.**

grimadas commented 10 months ago

Scientific Foundation for Offline Token: