Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

trustworthy gossip: integrating gossip mechanisms with trust calculations #3690

Open synctext opened 6 years ago

synctext commented 6 years ago

This thesis project explores making the connection between gossip and trust.

This work directly ties in with the mission of the lab Distributed Trust Design, see #3571. We are currently implementing incremental personalised pagerank #2805 and IPv8 distributed apps #2943.

This work shall demonstrate in 3 use-cases the linkage of IPv8 community gossip mechanism with trust calculations:

Sprint milestones:

Our #10 operational prototype from 2009 and 2013. Uses random graph walk: web-of-trust-in-Tribler-GUI

synctext commented 6 years ago

IPv8/TCP (Trust Control Protocol). Game Theory direction title would be: using natural selection for the evolution of trust. Even more over-the-top: game theory of trust.

Illustration done by Prof. Nowak's team at Harvard; their Nature paper. This model is designed for population growth and is wrong for trust as-is, but perhaps it can be modified to model the growth of trust. image

qstokkink commented 5 years ago

A first version of the suppress, delete, store, collect, spread protocol has been captured in https://github.com/Tribler/py-ipv8/pull/186. We have found that this roughly relates to the emotion of a user for a given incoming message. image

synctext commented 5 years ago

(posting here, as it seems the best fit for now)

Any good, this related work? @qstokkink "Practical Homomorphic Encryption Over the Integers":

> We present novel homomorphic encryption schemes for integer arithmetic, intended for use in secure single-party computation in the cloud. These schemes are capable of securely computing only low degree polynomials homomorphically, but this appears sufficient for most practical applications. In this setting, our schemes lead to practical key and ciphertext sizes. We present a sequence of generalisations of our basic schemes, with increasing levels of security, but decreasing practicality. We have evaluated the first four of these algorithms by computing a low-degree inner product. The timings of these computations are extremely favourable. Finally, we use our ideas to derive a fully homomorphic system, which appears impractical, but can homomorphically evaluate arbitrary Boolean circuits.

qstokkink commented 5 years ago

@synctext not particularly.

DanGraur commented 5 years ago

Gossip Overlay Security Concerns

Task Description

Find vulnerabilities in the Gossip overlay. Try to do the following:

Types of threats

Attacks which should be prevented:

Attacks which should be protected against (cannot be prevented):


Miscellaneous observations

synctext commented 5 years ago

Wireguard: related work on security, session cookies, DDoS protections, and 4000 lines of beauty.

Getting our layering model clear: IPv8 delivers UDP messages, plus a discovery service and a bit of DHT lookup. Should it stay away from sessions and shared state?

Brainstorming... Trustworthy_Gossip() implements the "Gossip Control Protocol" (GCP, on top of IPv8). This adds sessions, statefulness, and long-lived shared secrets? Plus libtorrent-based congestion control and bulk data exchange. Trustchain also has/needs a bulk data transfer protocol?

qstokkink commented 5 years ago

Trustchain also has/needs a bulk data transfer protocol?

If libtorrent channels work out, this could also be a good candidate to sync with libtorrent.

synctext commented 5 years ago

Bootstrap idea:

qstokkink commented 5 years ago

Very relevant: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4220960 http://www.ece.neu.edu/fac-ece/ioannidis/static/pdf/2011/IICM.recsys11.pdf https://hal-lirmm.ccsd.cnrs.fr/lirmm-00640735/file/F2Frec-10october.pdf https://www-sciencedirect-com.tudelft.idm.oclc.org/science/article/pii/S0306457316306471

qstokkink commented 5 years ago

Best idea so far:

Use a Polytomous Rasch model (Wikipedia) to reach a decision (SUPPRESS, DELETE, STORE, COLLECT, SPREAD) based on the decisions (SUPPRESS, DELETE, STORE, COLLECT, SPREAD) of others for a specific message.
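
A minimal sketch of the decision step, using the median of the peers' ordinal decisions as a toy stand-in for a fitted polytomous Rasch model (a real Rasch fit would also estimate per-category thresholds and per-peer traits; everything below is illustrative):

```python
# Ordinal decision scale from the protocol above, least to most permissive.
DECISIONS = ["SUPPRESS", "DELETE", "STORE", "COLLECT", "SPREAD"]

def aggregate(peer_decisions):
    # Toy stand-in: take the median of the peers' ordinal decisions.
    # A fitted polytomous Rasch model would additionally estimate a
    # threshold parameter per category and a latent trait per peer.
    ranks = sorted(DECISIONS.index(d) for d in peer_decisions)
    return DECISIONS[ranks[len(ranks) // 2]]
```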

synctext commented 5 years ago

Title brainstorm: 'middleware for provably trustworthy information flow'.

grimadas commented 5 years ago

Some of the relevant work:

qstokkink commented 5 years ago

Brain dump 06 Mar 2019

Assumptions:

- Ledger with consensus
- Trust function

Payoff: The more trustworthy you are, the less disk I/O you need to do.

Trade-off: The more trusting you are, the less trustworthy you are (and vice-versa). However, once you have established yourself as trustworthy in the network, you can become more trusting. This means that new nodes will avoid you more as you become more popular, thus load balancing over the network and forming new trust clusters.

Idea The higher your trust score, based on evidence in the ledger, the less evidence you need to provide. Everyone can calculate your trust score, given the complete ledger. To maintain replication guarantees, you cannot decide what information to destroy/stop delivering. One mechanism to ensure this replication, would be the following:


- Let h be a hashing function
- Let H be the domain of h, in bit length
- Let CSPRNG(iv) be a cryptographically secure pseudo-random number generation function with initialization vector iv and domain H
- Let d be the Hamming distance function
- Let D be the maximum Hamming distance allowed to maintain replication guarantees for the given network context (D < H)
- Let t be the trust function defined on [0.0, 1.0] for a given public key

```
algorithm can_drop is
    input:  public key Pk
    input:  the information Ii to decide for, where i is the sequence number in the ledger
    output: whether or not to drop Ii

    if H - d(CSPRNG(h(Pk) || i), h(Ii)) > D + t(Pk) * (H - D) then
        return T
    else
        return F
    end if
```


Note that everyone can also determine whether or not you were allowed to drop information, based on evidence from the ledger and the same dropping logic. Failure to follow the protocol will lead to self-induced ostracisation.

Further note: this mechanism does not have suppression. But it may also not need the suppression primitive.
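
A minimal Python sketch of this dropping rule, assuming SHA-256 (so H = 256), an illustrative D = 64, i encoded as 8 big-endian bytes, and a seeded PRNG as a stand-in for the CSPRNG (a real deployment needs a cryptographically secure construction):

```python
import hashlib
import random

H = 256  # bit length of the hash domain (SHA-256)
D = 64   # assumed maximum Hamming distance for replication guarantees (D < H)

def h(data: bytes) -> int:
    # The hashing function h, here SHA-256 interpreted as an integer.
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def csprng(iv: int) -> int:
    # Stand-in for CSPRNG(iv): a deterministic, seeded PRNG over domain H.
    return random.Random(iv).getrandbits(H)

def hamming(a: int, b: int) -> int:
    # The Hamming distance d between two H-bit values.
    return bin(a ^ b).count("1")

def can_drop(pk: bytes, item: bytes, i: int, trust: float) -> bool:
    # iv = h(Pk) || i, with i encoded as 8 big-endian bytes (an assumption).
    iv = hashlib.sha256(pk).digest() + i.to_bytes(8, "big")
    target = csprng(int.from_bytes(iv, "big"))
    # Drop iff H - d(CSPRNG(h(Pk) || i), h(Ii)) > D + t(Pk) * (H - D).
    return H - hamming(target, h(item)) > D + trust * (H - D)
```

Because the rule is deterministic given the ledger, any peer can re-run it to audit whether a drop was permitted.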

qstokkink commented 5 years ago

Here it is in function form for easier reading:

image

synctext commented 5 years ago

Related work called "safe gossip" by the Maidsafe team (they did a badly executed, early, few-million ICO). Push-pull gossip protocol. They use Scott Shenker's approach from 2000. image

> We investigate whether such a large communication overhead is inherent to epidemic algorithms. On the positive side, we show that the communication overhead can be reduced significantly. We give an algorithm using only O(n ln ln n) transmissions and O(ln n) rounds. In addition, we prove the robustness of this algorithm, e.g., against adversarial failures.
```rust
struct Network {
    pool: CpuPool,
    // An mpsc channel sender for each node for giving new client messages to that node.
    message_senders: Vec<mpsc::UnboundedSender<String>>,
    // An mpsc channel receiver for getting the client messages and stats from the nodes.
    stats_receiver: mpsc::UnboundedReceiver<(Id, Vec<String>, Statistics)>,
    // The last set of client messages received via `stats_receiver` for each node.
    received_messages: HashMap<Id, Vec<String>>,
    // The last set of stats received via `stats_receiver` for each node.
    stats: HashMap<Id, Statistics>,
    // The futures for all nodes.  When these return ready, that node has finished running.
    node_futures: Vec<CpuFuture<(), Error>>,
    // All messages sent in the order they were passed in.  Tuple contains the message and the index
    // of the node used to send.
    client_messages: Vec<(String, usize)>,
    // Message which when sent to a node via its `message_sender` indicates to the node that it
    // should terminate.
    termination_message: String,
}
```
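
The push-pull dynamic from the quoted result can be sketched as a toy simulation (structure and parameters are illustrative, not Maidsafe's implementation):

```python
import random

def push_pull_rounds(n: int, seed_node: int = 0) -> int:
    # Each round, every node contacts one uniformly random peer:
    # informed nodes push the rumor, uninformed nodes pull it.
    # Karp et al. show O(ln n) rounds suffice with high probability.
    informed = {seed_node}
    rounds = 0
    while len(informed) < n:
        newly = set()
        for u in range(n):
            v = random.randrange(n)
            if u in informed:   # push
                newly.add(v)
            if v in informed:   # pull
                newly.add(u)
        informed |= newly
        rounds += 1
    return rounds
```
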
qstokkink commented 5 years ago

The first experiment will be centered around decentralizing Twitter. It is based on a public Twitter data set of a flash crowd, over a one week timespan and for roughly 550k interactions. Here are the first visualizations of that data set and an elementary reciprocity function applied to it.

preview_transactions

preview_reciprocity

Stay tuned for the graphs on replication factor and cache misses over time.

The second experiment will probably use TrustChain.
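
The exact "elementary reciprocity function" used above is not specified here; one plausible minimal form (purely an assumption) is the min/max ratio of directed interaction counts per user pair:

```python
from collections import Counter

def reciprocity(interactions, a, b):
    # interactions: iterable of (source, destination) pairs, e.g. Twitter
    # mentions. This exact min/max form is an assumption, not the
    # function used in the experiment above.
    counts = Counter(interactions)
    ab, ba = counts[(a, b)], counts[(b, a)]
    # 1.0 means perfectly reciprocal; 0.0 means one-sided or absent.
    return min(ab, ba) / max(ab, ba) if max(ab, ba) else 0.0
```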

qstokkink commented 5 years ago

Here is the replication graph after the first 100 hours of data processing (the first ±90 hours on 8 cores @ 3.2 GHz, the last ±20 hours assisted by another 12 cores @ 3.4 GHz). This contains the first ±180,000,000 data points (304k users).

image

As I don't feel like waiting another 400 hours I think I'll just make the graph datapoints a bit coarser.

Note that the graph does get more noisy as the relative amount of trusted peers increases (the [0, 200] interval and the [900, 1000] interval are captured at the same granularity). This is by design, as the probabilistic element of the replication function is starting to become significant to ensure replication.

Here are the raw datapoints: image

qstokkink commented 5 years ago

First results for deduplication in a completely decentralized setting (CHECO + TrustChain in this case), using PageRank for reputation. First, the number of items that would be stored using CHECO normally:

items

Then with DELUT applied:

replicas

qstokkink commented 5 years ago

image

These experiments have a fixed 4 tps rate per peer.

Ranking of means (p-value < 10^-6) smallest amount of duplication to largest:

  1. 2 facilitators, 20 peer neighborhood, CHECO + DELUT
  2. 1 facilitator, 20 peer neighborhood, CHECO + DELUT
  3. 2 facilitators, 10 peer neighborhood, CHECO + DELUT
  4. 1 facilitator, 10 peer neighborhood, CHECO + DELUT
  5. 1 facilitator, 10 peer neighborhood, CHECO
  6. 1 facilitator, 20 peer neighborhood, CHECO
  7. 2 facilitators, 20 peer neighborhood, CHECO
  8. 2 facilitators, 10 peer neighborhood, CHECO
synctext commented 4 years ago

Related work: the I2P people also built a gossip overlay (they have 32k nodes). Their Network Database (netDb) offers:

> I2P's netDb is a specialized distributed database, containing just two types of data - router contact information (RouterInfos) and destination contact information (LeaseSets). Each piece of data is signed by the appropriate party and verified by anyone who uses or stores it. In addition, the data has liveliness information within it, allowing irrelevant entries to be dropped, newer entries to replace older ones, and protection against certain classes of attack.
>
> The netDb is distributed with a simple technique called "floodfill", where a subset of all routers, called "floodfill routers", maintains the distributed database.

Floodfill is their networkDatabase update mechanism and seems vulnerable and under-documented.

> The floodfill netDb is a simple distributed storage mechanism. The storage algorithm is simple: send the data to the closest peer that has advertised itself as a floodfill router. When the peer in the floodfill netDb receives a netDb store from a peer not in the floodfill netDb, they send it to a subset of the floodfill netDb-peers. The peers selected are the ones closest (according to the XOR-metric) to a specific key.

Not unlike our community numbering they also have identifiers: image
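
The quoted floodfill storage rule (send to the closest advertised floodfill router by XOR metric) can be sketched as (identifiers illustrative):

```python
def xor_distance(key: bytes, router_id: bytes) -> int:
    # I2P's floodfill rule measures closeness with the XOR metric.
    return int.from_bytes(key, "big") ^ int.from_bytes(router_id, "big")

def closest_floodfill(key: bytes, floodfill_routers: list) -> bytes:
    # Store target: the advertised floodfill router closest to the key.
    return min(floodfill_routers, key=lambda r: xor_distance(key, r))
```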

synctext commented 3 years ago

Related work, also for the @alexander-stannat work: "RepuCoin: Your Reputation is Your Power". Shows how to "grow" a reputation in a mid-life and mature system. Quote:

image

synctext commented 2 years ago

Related work:

synctext commented 1 year ago

Related work, based on a June 2002 algorithm: a gossip library of Cornell design. "Smudge is a minimalist Go implementation of the SWIM (Scalable Weakly-consistent Infection-style process group Membership) protocol for cluster node membership, status dissemination, and failure detection, developed at Cornell University by Motivala et al. It isn't a distributed data store in its own right, but rather a framework intended to facilitate the construction of such systems." GitHub: https://github.com/clockworksoul/smudge

Based on the historical 2001 work "On scalable and efficient distributed failure detectors". Super naive all-to-all. Oops, no security, reputation, spam measures, or trust! (Because this was, like, 21 years ago.)
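
One SWIM probe round (direct ping, then indirect ping-reqs), the failure-detection mechanism Smudge implements, can be sketched as follows; all names and parameters here are illustrative, not Smudge's API:

```python
import random

def swim_round(members, self_id, ping, k=3):
    # One probe round: ping a random member directly; on failure,
    # ask up to k other members to ping it indirectly (ping-req).
    # SWIM marks the target "suspect" first rather than "failed".
    others = [m for m in members if m != self_id]
    target = random.choice(others)
    if ping(self_id, target):
        return target, "alive"
    relays = random.sample([m for m in others if m != target],
                           min(k, len(others) - 1))
    if any(ping(r, target) for r in relays):
        return target, "alive"
    return target, "suspect"
```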

synctext commented 7 months ago

2021 related work: "BASALT: A Rock-Solid Foundation for Epidemic Consensus Algorithms in Very Large, Very Open Networks". Also uses the idea of diversity to fight Sybils: "we will consider two paradigmatic scenarios where an attacker attempts a Sybil attack using many IP addresses". Complex solution: image. Quite a clever bit of dictated randomness using a pseudo-random function. But this still introduces bias and is not strictly unbiased randomness. Difficult to predict whether this design will actually work on the real, NAT-based Internet.


> ...definition by repeatedly exchanging neighbor lists with other peers, discovering at each step peers that better match its ranking functions. In the following, we first detail the use of ranking functions to identify target neighbors. Then we discuss how nodes update these ranking functions to make the random graph dynamic.
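
The ranking-function idea in the quote can be sketched with a hash-based rank per view slot; this illustrates the principle (a peer cannot choose its own rank), not BASALT's exact algorithm, and all identifiers are assumptions:

```python
import hashlib

def rank(seed: bytes, peer_addr: str) -> int:
    # Pseudo-random ranking: a peer cannot influence its own rank,
    # which is what limits Sybil influence on any given view slot.
    return int.from_bytes(
        hashlib.sha256(seed + peer_addr.encode()).digest(), "big")

def update_view_slot(seed: bytes, current, candidates):
    # Keep whichever known peer minimizes the slot's ranking function;
    # repeated neighbor-list exchanges converge each slot toward the
    # network-wide minimum for that seed.
    best = current
    for p in candidates:
        if best is None or rank(seed, p) < rank(seed, best):
            best = p
    return best
```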