content popularity community: performance evaluation

synctext commented 6 years ago

For context, the long-term megalomaniac objectives (update Sep 2022):	Layer	Description
User experience	perfect search in 500 ms and asynchronously updated :heavy_check_mark:
Relevance ranking	balance keyword matching and swarm health
Remote search	trustworthy peer which has the swarm info by random probability
Popularity community	distribute the swarm sizes
Torrent checking

After completing the above, next item: Add tagging and update relevance ranking. Towards perfect metadata.
De-duplication of search results.
Also find non-matching info. Search for Linux, find items tagged Linux, Biggest Ubuntu swarm is shown first.
Added to that is adversarial information retrieval for our Web3 search science. After above is deployed and tagging is added. Cryptographic protection of above info. Signed data needs to have overlap with your web-of-trust, unsolved hard problem.
personalised search
3+ years ahead: row bundling

@arvidn indicated: tracking popularity is known to be a hard problem.

I spent some time on this (or a similar) problem at BitTorrent many years ago. We eventually gave
up once we realized how hard the problem was. (specifically, we tried to pass around, via gossip,
which swarms are the most popular. Since the full set of torrents is too large to pass around,
we ended up with feedback loops because the ones that were considered popular early on got
disproportional reach).

Anyway, one interesting aspect that we were aiming for was to create a "weighted" popularity,
based on what your peers in the swarms you participated in thought was popular. in a sense,
"what is popular in your cohort".

We deployed the first version into Tribler #3649 , after prior Master thesis research #2783. However, we lack documentation or specification of the deployed protocol.

Key research questions:

What is the real deployed system behavior?
What is the resource consumption?
What is the accuracy and quality in general of the information?
How can we attack or defend this IPv8 community?

Concrete graphs from a single crawl:

messages and bandwidth in time
hashes discovery and duplicates
distribution of discovered popularity of swarms
conduct swarm popularity check and compare results in real-time
behavior of pub/sub mechanism for popularity feed
dynamics of trust-based pub/sub auto-subscribe

Implementation of on_torrent_health_response(self, source_address, data) ToDo @xoriole : document deployed algorithm in 20+ lines (swarm check algorithm, pub/sub, hash selection algorithm,handshakes, search integration, etc.).

xoriole commented 6 years ago

Popularity Community Introduction Popularity community is a dedicated community to disseminate popular/live contents across the network. The content could be anything for eg. health of a torrent, a list of popular torrents or even search results. The way of dissemination of the content follows the publish-subscribe model. Each peer in the community is both a publisher and a subscriber. A peer subscribes to a set of neighboring peers to receive their content updates while it publishes its content updates to the peers subscribing it. pub-sub

Every peer maintains a list of subscribing and publishing peers with whom it exchanges content. All contents from non-subscribed publishers are basically refused. Selection of peers to subscribe or to publish greatly influences the dissemination of the content both genuine and spam. Therefore, we try to select based on a simple trust score. Trust score indicates the number of times we have interacted with the node as indicated by the number of mutual Trustchain blocks. Higher the trust score better the chance of being selected (as publisher or subscriber).

Research questions ...

synctext commented 5 years ago

ToDo: describe the simplified top-N algorithm that is more light-weight (no pub/sub). As-simple-as-possible gossip. Measure and plot 4 graphs listed above

messages and bandwidth in time
hashes discovery and duplicates
distribution of discovered popularity of swarms
conduct swarm popularity check and compare results in real-time

synctext commented 4 years ago

Bumping this issue. The key selling point of Tribler 7.6 is popularity community maturing (good enough for coming 2 years) and superior keyword search using relevance ranking. goal: 100k swarm tracking.

This has priority on channel improvements. Our process is to bump each critical features to a superior design and move to the next. Key lesson within distributed systems is: you can't get it perfect the first time (unless you have 20 years of failure experience). iteration and relentless improving deployed code is key.

After we this close performance evaluation issue we can build upon it. We need to know how well it performs and tweak it for 100k swarm tracking. We can do 1st version of real-time relevance ranking. Read our 2010 work for background: Improving P2P keyword search by combining .torrent metadata and user preference in a semantic overlay

Repeating key research questions from above (@ichorid):

What is the real deployed system behavior?
What is the resource consumption?
What is the accuracy and quality in general of the information?
How can we attack or defend this IPv8 community?

Concrete graphs from a single crawl:

messages and bandwidth in time
hashes discovery and duplicates
distribution of discovered popularity of swarms
conduct swarm popularity check and compare results in real-time
behavior of pub/sub mechanism for popularity feed
dynamics of trust-based pub/sub auto-subscribe

synctext commented 4 years ago

See also #4256 for BEP33 measurements&discussion

synctext commented 4 years ago

Please check out @grimadas tool for crawling+analysing Trustchain and enhance this for the popularity community: https://github.com/Tribler/trustchain_etl

synctext commented 4 years ago

Hopefully we can soon add the health of the ContentPopularity Community to our overall dashboard.

xoriole commented 4 years ago

Screenshot from 2020-09-13 19-11-03

PEERS_CONNECTED : Number of currently connected peers
PEERS_UNIQUE: Number of unique peers encountered during the measurement period (10 mins)
TORRENTS: Number of torrents received
SEEDERS_MAX: Seeder count of the most popular torrent received
SEEDERS_AVG: Avg seeder count of the received torrents. (Higher the better; how to increase this?)
SEEDERS_ZERO: Number of torrents received with zero seeders

Currently, a peer shares its most popular 5 and random 5 torrents checked by the peer to its connected neighbors. Since, a peer starts sharing them from the beginning, its not always the case the popular torrents are shared. This results in sharing torrents that doesn't have enough seeders (see SEEDERS_ZERO count), and this does not contribute much in sharing of popular torrents. So, two things that could improve sharing popular torrents seems like:

not sharing zero seeder torrents
increasing the initial buffer time before sharing is started

https://jenkins-ci.tribler.org/job/Test_tribler_popularity/plot/

devos50 commented 4 years ago

Nice work! I assume that this experiment is using the live overlay?

As a piece of advice, I would first try to keep the mechanism simple for now, while analyzing the data from the raw network (as you did right now). Extending the mechanism with (arbitrary) rules might lead to bias results, which I learned the hard way when designing the matchmaking mechanism in our decentralized market. Sharing of the 5 popular and 5 random torrents might look like a naive sharing policy, but it might be a solid starting point to get at least a basic popularity gossip system up and running.

Also, we have a DAS5 experiment where popularity scores are gossiped around (which might actually be broken after some channel changes). This might be helpful to test specific changes to the algorithm before deploying them 👍 .

xoriole commented 4 years ago

@devos50 Yes, it is using live overlay.

Also, we have a DAS5 experiment where popularity scores are gossiped around (which might actually be broken after some channel changes). This might be helpful to test specific changes to the algorithm before deploying them.

Yes, good point. I'll create experiments to test the specific changes.

synctext commented 4 years ago

Thnx @xoriole! We now have our first deployment measurement infrastructure, impressive.

[X] What is the real deployed system behaviour?

Can we (@kozlovsky @drew2a @xoriole) come up with a dashboard graph to quantify how far we are to our Key Performance Indicator: the goal of tracking 100k swarms? To kickstart the brainstorm:

[x] Unique hash discovery after joining community for 1 hour (plus duplicates)?
[x] messages and bandwidth in time
- For 1 hour measure the amount of messages and type
- Indicate their contents (identify ZERO_SEEDERS)
- plot bandwidth usage
[x] What is the accuracy and quality in general of the information?
- validate the popularity check results
- sample slowly the discovered swarms (DHT hammering will get server blocked)
- compare client and our Jenkins re-check result
- plot the delta for each swarm and sort by largest difference (Y-Axis delta of popularity, X-Axis: swarms sorted by delta)
[ ] distribution of discovered popularity of swarms (dead swarms, suspicious large swarms)

increasing the initial buffer time before sharing is started

As @devos50 indicated, this sort of tuning is best preserved for last. You want to have an unbiased view of your raw data for as long as possible. Viewing raw data improves accurate understanding. {Very unscientific: we design this gossip stuff with intuition. If we have 100+ million users people would be interested in our design principle.}

Repeating long-term key research questions from above (@ichorid):

What is the resource consumption?
How can we attack or defend this IPv8 community?
3065 Fix for DHT spam using additional deployed service infrastructure

ichorid commented 4 years ago

not sharing zero seeder torrents

For every popular torrent, there are a thousand of dead ones. Therefore, information about what is alive is much more precious and scarce then about what is dead. It will be much more efficient to only share torrents that are well seeded.

Though, the biggest questions are:

should we use the information received from other peers when sending our own gossip packets? (probably not, or to some limited extend)
should we recheck the torrent health eventually? (probably not, as popular torrents tend to rise quickly and fall slowly)

ichorid commented 4 years ago

What is the resource consumption?

3065 Fix for DHT spam using additional deployed service infrastructure

It would be very nice if we find (or develop) some Python-based Mainline DHT implementation, to precisely control the DHT packets parameters.

How can we attack or defend this IPv8 community?

:crossed_swords: attack	:shield: defence
spam stuff around	pull-based gossip
fake data	cross-check data with others
biased torrent selection	pseudo-random infohash selection (e.g. only send infohashes sharing some number of last bytes)

drew2a commented 4 years ago

We still have no measures for "popularity community".

I will describe a few experiments below. Maybe they will be helpful for a developing more scientific approach.

Metric 1: how fast a new user can get the list of popular torrents

Experiment 1.1

Given the network of 100 nodes. Each node has a list of 100K popular torrents. We add a new node to the network.

Question: how long will it take to deliver a 100K list to the new node?

Experiment 1.2:

The same as 1.1, but 1K nodes.

Metric 2: how fast an empty network will collect 100K popular torrents

Experiment 2.1

Given the network of 100 nodes. Each node has an empty list of popular torrents.

Question: how long will it take to get a 100K list on each node?

Additional data: bandwidth consuming over time. Additional data: lists filling over time.

Experiment 2.2:

The same as 2.1, but 1K nodes.

Metric 3. how heterogeneous lists are

Experiment 3

Given: the network, after 2.1 experiment. We calculate the common part of all popular lists (the common part of "100k torrent list" on each node) [hereinafter CommonList]

Question: what size of CommonList (in percent)?

Metric 4: quality of a popular list

Experiment 4

Given: the network after experiment 2.1. We calculate CommonList. We somehow receive a reference list (may be it is a static "human-made" list) [hereinafter ReferenceList] We compare CommonList to the ReferenceList.

Question: Which percentage of the CommonList and ReferenceList is the same?

devos50 commented 4 years ago

@drew2a good suggestions, thanks!

Note that many of these experiments can be performed in isolation on our (nation-wide) compute cluster, the DAS5. We have the necessary tools in Gumby to easily create new overlays on this cluster and to connect peers with each other. Gumby also allows plotting of resource consumption. Hopefully, we will soon have a few additional servers operational to conduct experiments.

We already have a basic DAS5 experiment that starts a few Tribler instances and shares popularity vectors. This would be a good starting point for anyone that quickly wants to evaluate the effectiveness of popularity gossip strategies. However, I believe that this issue concerns a performance evaluation of our live network.

ichorid commented 4 years ago

"Experiment 1.1" is basically the experiment @devos50 made for the first implementation of GigaChannels two year ago. The answer is: "very fast, but gets increasingly slower as the list grows". Also, I doubt there are over 10k popular torrents at any moment in the whole BitTorrent network. Some (dated) insights into the BitTorrent network can be found in @synctext 's seminal works.

There are about 20 popular trackers, each one serving about 0.5-2M torrents. Less than 1% of torrents in any category are "alive". See torrents.csv project. Overall, torrent popularity is pretty transient.

ichorid commented 4 years ago

"Experiment 2" depends on two factors:

the availability of infohashes
the maximum allowed rate of DHT checks per ip/port/client id

AFAIK, one can't simply go to a BitTorrent node and ask it for its list of seeded infohashes ( :eye: :ok_hand:). Instead, one must know the infohash and ask the client about it. Maybe there is some BEP-extension that implements querying clients for lists of infohashes, but that would be a great privacy hole (and thus highly unlikely to be accepted by BitTorrent community).

Also, DHT has some flood-protection that already took a toll on our developers.

synctext commented 4 years ago

Great fan of these concrete experiments to collect hard data. We need performance data from emulation. Hence the dashboard idea.

First priority coming sprint weeks: build backwards-compatible PopularityCommunity and all known fix bugs in there, and try to boost performance. Preferably that performance is improved, even with the existing deployed community.

We thus do integration testing, compatibility testing, regression testing, and performance analysis into the Multi-Aspect Sprint Cycles :-) Big step forward: https://github.com/xoriole/tribler/blob/popularity-helper/src/tribler-core/run_popularity_helper.py

ichorid commented 4 years ago

From this paper

We trace the popularity of those objects by counting the number of requests they receive per week for the entire eight months of our measurement study. Fig. 4 shows that popular objects gain popularity in a relatively short timescale reaching their peak in about 5–10 weeks. The popularity of those objects drops dramatically after that. As the figures show, we observe as much as a sixfold decrease in popularity in a matter of 5–10 weeks.

According to that paper, the content popularity follows Mandelbrot-Zipf distribution.

Unfortunately, I (almost) completely forgot my calculus course, so I can't integrate anymore (except for the simplest stuff). Now, if we would have a mathematician who can integrate the Mandelbrot-Zipf distribution and fit the total number of entries to the already known BitTorrent network stats (see my post above about 40M torrents)... Then we could predict the peak swarm size and tune our experiments accordingly...

@alexander-stannat ?

xoriole commented 4 years ago

Results of popularity community from continuous 1 hour execution. Plots available in Jenkins

ichorid commented 4 years ago

we ended up with feedback loops because the ones that were considered popular early on got disproportional reach

I've come up with a simple algorithm on how to solve the feedback problem. The basic idea is to emulate how news spread trough human society:

share more important thoughts more often.
over the time the urge to share a thought diminishes ("become bored of the idea").
if you hear other people repeating your idea, reduce the urge to share it (the "old news" effect).
if someone's claim contrasts your thoughts to the point it will change your behaviour, check the fact yourself. If the check fails, reject the claim and notify the claimer.

This list of rules guarantees that information about popular torrents will propagate quickly (1), but will not dominate the gossip (3) and die out naturally (2). Also, it will prevent the spreading of "fake news" (4).

EDIT: basically, already invented in "Top-k Item Identification on Dynamic and Distributed Datasets"

synctext commented 3 years ago

Just bumping this issue in importance. We need to fix this community. "Donate my VPN bandwidth to Tribler", that would solve matters. With many IPv4 addresses we can crawl torrents and even join them to check the ground truth. Crawl the contents of channels your joined and gossip.

synctext commented 3 years ago

We started working on this on Feb 6, 2017, see issue #2783. That is over 4 years and 2 months! Ambition level is now reduced. This works only if:

within channels
channel with 10 or maximum 10k torrents
sufficient number of people subscribed to this channel (5 or 50 people for big channels)
These 50 people must test the popularity of 10k swarm on weekly basis.
Other users doing keyword search will find popularity statistics
Next step. Search results of users look good: popular swarm get highly ranked (relevance ranking)
Next Year. Within 2022 we can hopefully work on "adverserial search", something we worked on for years (years before the 2016 issue creation). See my ToDo remarks on https://github.com/Tribler/tribler/issues/2547#issuecomment-661006728

Something is starting to work within channels :heavy_check_mark:

xoriole commented 3 years ago

The graph below shows number of times different popular torrents received by a single Tribler peer in the period of 21 hours via Popularity community. There are over 23k torrents but the graph shows only top 1000. It is a long tail.

Shared torrents distribution - message (in 21 hours)

Top 10 shared torrents: top10-torrents

The graph below shows the number of times different peers shared popular torrents to the observer Tribler peer in the same time duration. Shared torrents distribution - node (in 21hours)

The graph below shows the torrent distribution and difference in the seeder count across all messages. Shared torrents distribution w_ seeder diff%

Out of 23k torrents, majority of torrents shared have zero (or low) seeder count. (Note that these are not dead torrents). There are hundreds of torrents shared thousands of times with the same health (seeder) count by several peers. This information can likely be used to determine the trust of the received torrent health information and/or the sender peer.

Points of discussion:

If several peers share same health information over time, can this health information be trusted? If yes, what could be acceptable criteria
The most popular torrent health information was received over 5k times by 160 peers in 21 hours (almost every 15 seconds). This sharing can be made less aggressive and more inclusive so more torrents are included instead of repeating the same torrents.

synctext commented 3 years ago

Great work! As discussed yesterday: please do not make any radical changes and no Bloom filters. There is a systematic bias to repeating the same torrent. That leads to duplicate information. Do not use any new ideas please. Just remove the bias for big swarms, deploy, and measure resulting improvement. This field is ancient (but all this prior work ignored security, therefore of limited usage. With Trustchain we moved beyond this). Things like "peer sampling" without a web of trust leave your system defenceless against spam or Sybil attack. Feel free to take the time to understand much of the prior work discussed in these 205 slides. Note especially the naive security assumption on slide 100, they test with 2% attackers in the overlay for a "secure peer sampling" paper http://sbrc2010.inf.ufrgs.br/resources/presentations/tutorial/tutorial-montresor.pdf

<rant>The key idea is to keep things as simple as possible. Don’t needlessly complicate things. This is very weird, but optimisation usually lead to complexity. For gossip protocols, when you add complexity you're doing it wrong. You need to think differently, randomness and some repeating create robustness, resilience, and strength. Obviously, identical messages repeating 1000s of times are wrong. Its quite complicated to create simple systems. That is why science failed so far to make re-usable gossip tooling. Tribler is designed to pioneer such simple proven building blocks. Not by starting out with a generic tool, but first make something that works for a million people and evolve in years to come.

xoriole commented 3 years ago

The graphs below show the observation of Popularity Community by a single Tribler peer in the period of ~21 hours. Experiment 1: Current behavior of the community (V7.9.0-RC1) Experiment 2: Updated to use random_walk and remove_peers strategy so that the observer peer can find more peers in the network.

Experiment 1 graphs are the same as the previous comment in this issue

Experiment 1 1 Shared torrents distribution - message (in 21 hours)

Experiment 2 2 Shared torrents distribution - message (in 21 hours)

Experiment 1 1 Shared torrents distribution - node (in 21 hours)

Experiment 2 2 Shared torrents distribution - node (in 21 hours)

Experiment 1 1 Shared torrents distribution w_ seeder diff%

Experiment 2 2 Shared torrents distribution w_ seeder diff%

Observations:

In the second experiment, the observer peer found more peers than experiment 1. This was expected.
In the second experiment, the observer peer found more torrents (39k) compared to experiment 1 (23k). It was expected that more torrents will be discovered but the difference is significant.
In the second experiment, we see that the popular torrents are shared more number of times than in experiment 1. This is inline with finding more peers and receiving the same popular torrents from them.
The effect of change in experiment 2 is limited to the observer node only, it would be interesting to see the network behavior when more nodes participate with the changes included. This will require a separate network experiment.

ichorid commented 3 years ago

The effect of change in experiment 2 is limited to the observer node only, it would be interesting to see the network behavior when more nodes participate with the changes included. This will require a separate network experiment.

I suggest converting Popularity Community to pull-based gossip, RQC-style. That will allow for much easier experimentation and spam-resistance.

Alternatively, stop sending popular torrents altogether, and instead just send random torrents instead. That should flatten the curve.

synctext commented 3 years ago

stop sending popular torrents altogether, and instead just send random torrents instead.

Great idea. Please try to get this deployed for the next release. No changes to any other parts of this community. Lets see how that compares.

xoriole commented 3 years ago

Observation from experiment 3: Current behavior of v7.10-exp1 Important changes:

Updated to use random_walk and remove_peers strategy so that the observer peer can find more peers in the network.
Popular and random torrents are separately shared
Popular torrents are shared every 2 mins
Random torrents are shared every 5 seconds

Graph: x-axis represents the unique torrents received by the observer node

3 Shared torrents distribution - message (in 21 hours)

A few popular torrents were shared a large number of times. Instead of flattening the curve, we obtained a sharper peak at 15k (compared to 5k, 6k in earlier experiments)

3 Shared torrents distribution - node (in 21 hours) The number of peers discovered is high, that is expected considering the change in strategy.

3 Shared torrents distribution w_ seeder diff%

Torrent distribution considering the difference in seeder values is consistent with the previous experiment.

ichorid commented 3 years ago

A few popular torrents were shared a large number of times. Instead of flattening the curve, we obtained a sharper peak at 15k (compared to 5k, 6k in earlier experiments)

Are you running your experiments on the main Tribler network? If so, this is expected because you use push-based gossip, meaning that the only thing that effectively changed for a single host running 7.10 in a sea of 7.9 is the faster peers discovery. Which, indeed, should sharpen the peak.

xoriole commented 3 years ago

Are you running your experiments on the main Tribler network? If so, this is expected because you use push-based gossip, meaning that the only thing that effectively changed for a single host running 7.10 in a sea of 7.9 is the faster peers discovery. Which, indeed, should sharpen the peak.

Yes, it is on the main Tribler network. Having almost 3 times the earlier peak was not something I expected. I think you're right, faster discovery and abundance of v7.9 peers which send combined popular and random torrents message is responsible for the spike. It should decline once there are more v7.10 nodes since the frequency of share for popular torrents is reduced in v7.10.

As an aside, I propose to add a message type to request/respond client version in RemoteQueryCommunity. It'll be useful in the experiments to confirm the distribution.

synctext commented 3 years ago

Popularity community is starting to work nicely. Tribler 7.10 keyword search finds good swarm. :grinning: :partying_face: :grinning:

Yet another scientific challenge for 2022 is reducing the altruistic peer discovery time. Or fork the Bittorrent protocol and only create swarms with proper altruism (seed for 2 years). We see how few peers in swarms actively upload. For instance, this newly swarm has reportedly 375 seeders. It takes typically 60-300 seconds before you find the altruistic seeders: Tribler7 10__375seeders_misterious_shrinking_swarms

synctext commented 2 years ago

In a few days the new 7.12 release will be out! :clap: Please post the latest performance analysis of your algorithms here @xoriole.

synctext commented 2 years ago

New measurement results are hopefully posted here by @xoriole for 6 Sep 2022 Dev meeting :crossed_fingers:

xoriole commented 2 years ago

The graph below shows the received number of torrents (unique & total), total messages and peers discovered per day by the crawler running Popularity Community in observer mode for 95 days. The crawler is running with an extended discovery booster which leads to discovering more torrents.

Popularity Community (95 days) - updated

synctext commented 2 years ago

Comments on this measurement:

Number of peers is roughly 1000 unique peers.
- On 4th september another site saw 2774 peers, why the difference? Release.tribler.org details:
- anonymous version checks
- minimal mechanism for releasing critical bug fix versions
- deliberate, we really do not want to know any public keys at this server
- fallback mechanism is using Github API to check for new versions, but then Microsoft collects logs
- Comparing active IP addresses versus active peers in popularity community
Single Crawler at 1 server using Extended discovery booster
- using default crawler parameters
- boosting time within IPv8 does not end, single community join
- Find peers aggressively in that single community
- Not using the "discovery community"
- Detailed docs: https://github.com/Tribler/tribler/issues/5828
- IPv8 neighborhood_size == 50 or 25 ??
- IPv8 edge_length == 25 ??
Message Torrents (unique) / day
- Parsing the torrent health message, includes sometimes 10 torrents
- Unique info_hash identifies the exact torrent
- Receiving numerous health messages for the same info_hash in the same day
Note that the booster is not designed as a crawler
- maximize peer count at the end of a 30 seconds period, https://github.com/Tribler/tribler/blob/912c6f0ab95f30be550067f9778db1df1df18ac9/src/tribler/core/components/ipv8/discovery_booster.py#L53-L56
- Re-using the fast discovery mechanism to discover channels faster
Next week, ground truth in Frozen experiment

drew2a commented 2 years ago

@xoriole please, correct be if I'm wrong. In the case the crawler use the default DicoveryBooster, neighborhood_size should be equal to 25 and edge_length should be equal to 25.

https://github.com/Tribler/tribler/blob/912c6f0ab95f30be550067f9778db1df1df18ac9/src/tribler/core/components/ipv8/discovery_booster.py#L53-L56

xoriole commented 2 years ago

Frozen experiment

Torrents (infohash, reported seeders, reported leechers) are extracted for the search results
A libtorrent session is created and each torrent is added to the session one by one to join the swarm for a minute. Then the number of seeders and leechers is extracted from the swarm. (every second the seeders are recorded from Libtorrent)
Dramatic inaccurate results: 2 seeds in Libtorrent versus 83 according to the remote search
The source of the remote query information is the torrent checker with: 1) BEP33, 2) joining swarm, or 3) central tracker

Seeders (reported and checked)

Same measurement, but now reporting leechers instead of seeders. Absolute number of leechers, using a standard linear scale:

Leechers (reported and checked)

Now we represent the same number in terms of percentage in an attempt to normalize the values.

Seeders % = ( reported seeders / checked seeders ) x 100 %
Leechers % = ( reported leechers / checked leechers ) x 100 %
Peers % = ( (reported seeders + reported leechers)  / (checked seeders + checked leechers) ) x 100 %

Peers and seeders (2) Peers and leechers (2)

Observations

Checked seeder values are much lower than the reported seeder values with an average value of 9.42%.
Checked leecher values are much higher than the reported values with an average value of 225.17%.
If seeder and leechers are summed up to get total peers, the checked peer values are still lower than the reported peer values. An average checked peer value is 35.26% of the reported peer number.

synctext commented 2 years ago

Note that remote results and popularity community differ in algorithm. BEP33 and central swarms are simply not sufficiently reliable for a robust, attack-resilient, and quality product that we as scientists strive for. The problem we are trying to solve is not accurate statistics, but just "bad swarm", versus "good swarm". We need more experiments around Libtorrent join stats.

:skull: 0 seeders, dead swarm
:zombie: 1 or 2 seeders, nearly dead zombie swarm
:stopwatch: 3-7 seeders, slooooow swarm
:racehorse: 8-20, decent!
:airplane: 20+ solid performance

Next sprint: understand popularity community ground truth?

xoriole commented 2 years ago

Popularity community experiment The purpose of the experiment is to see how the torrent health information received via the popularity community differs when checked locally by joining the swarm.

From the popularity community, we constantly receive a set of tuples (infohash, seeders, leechers, last_checked) representing the popular torrent with their health (seeders, leechers) information. This health information is supposed to be obtained by the sender by checking the torrent themselves so the expectation is that the information is relatively accurate and fresh.

In the graph below, we show how the reported (or received) health info and checked health info differ for the 24 popular torrents received via the community.

First considering the seeders. Since the variation in the number of seeders for different torrents is high, a logarithmic scale is used. Sept - Seeders (reported and checked)

Similarly for the leechers, again logarithmic scale is used. Sept - Leechers (reported and checked)

Here each individual torrent is unrelated to each other and could be more or less popular depending on what content they represent so seeders, leechers, and peers (= seeders + leechers) are represented in the percentage of their reported value in an attempt to normalize them.

Seeders % = ( checked seeders / reported seeders ) x 100 %
Leechers % = ( checked leechers / reported leechers ) x 100 %
Peers % = ( ( checked seeders + checked leechers) / ( reported seeders + reported leechers ) ) x 100 %

Peers and seeders

Peers and leechers

Observerations

Similar to frozen experiment, the checked seeders values are much lower than the reported seeders values. However, the average seeders % is 13.60% which is a bit higher than the frozen experiment average seeder value (9.42%). This makes sense because since these are popular torrents the seeders are expected to be higher than normal search experiments like frozen.
Checked leechers values are normally higher than the reported leecher values. This also resembles with frozen experiment with similar average values 296.44% (this experiment) and 225.17% (frozen experiment).
Comparing the peer values, the average peers % is 28.27% compared to the frozen experiment 35.26%. This is likely because the total peers reported for popular torrents is higher than for the torrents returned from the search results.
In overall, the percentages do not differ by too much in both experiments.

synctext commented 2 years ago

Writing down our objectives here:	Layer	Description
Relevance ranking	It is show to the user within 500 ms and asynchronously updated
Remote search	trustworthy peer which has the swarm info by random probability
Popularity community	distribute the swarm sizes
Torrent checking

Add tagging and update relevance ranking. Towards perfect metadata.
Added to that is adversarial information retrieval for our Web3 search science. After above is deployed and tagging is added. Cryptographic protection of above info. Signed data needs to have overlap with your web-of-trust, unsolved hard problem.

background Getting this all to work is similar to making a distributed Google. Everything needs to work and needs to work together. Already in 2017 we tried to find the ground-truth on the perfect matching swarm for a query. We have a minimal swarm crawler (2017). "Roughly 15-KByte-ish of cost for sampling a swarm (also receive bytes?). Uses magnet links only. 160 Ubuntu swarms crawled": Documented torrent checking algorithm? Documented popularity community torrent selection and UDP/IPv8 packet format? Readthedocs Example "latest/search_architecture.html"

synctext commented 1 year ago

Initial documentation of deployed Tribler 7.12 algorithms

Random Torrentchecking. Every 2 minutes check popularity of a random swarm. Critical decisions: which swarm to check (e.g. random). No bias for dead swarm or fresh swarms in any way.
Popular Torrentchecking https://github.com/Tribler/tribler/blob/7301695c0f99c68b0e7c84180f2856822bba917f/src/tribler/core/components/torrent_checker/torrent_checker/torrent_checker.py#L198
Unknown quality of dead swarms :skull:. The cause could be channels, dead swarms inside subscribed channels. Concept of pre-view torrent might be the cause. Remote search results are also checked.
No algorithms to purge (remove) dead swarms in Tribler.
{repeating} Redo experiments with newer Tribler code: "The purpose of the experiment is to see how the torrent health information received via the popularity community differs when checked locally by joining the swarm."
- "Naked Libtorrent" is operational as a minimal codebase to join swarms through exit node, measure connected peers, and estimate total swarm size. Strictly limited to 60 seconds per swarm.
- Bep-33 swarm count (not used)
- Full DHT lookup peer identities (not used)
- Tracker peer identities (not used, infohash only)
- PEX-gossip peer identities
  - failed to connect peers
  - Connected peer identities (e.g. responsive)

xoriole commented 1 year ago

Repeating the Popularity community experiment here.

Similar to the experiment done in September, here we show how the reported (or received) health info and checked health info differ for the 24 popular torrents received via the community.

The numbers represented in the graph are count values and the scale used in the graph is logarithmic for better comparison since the variation in the values is large.

A. Based on count

Dec-Seeders (reported and checked)

Dec-Leechers (reported and checked)

B. Normalized in percentages

Seeders % = ( checked seeders / reported seeders ) x 100 %
Leechers % = ( checked leechers / reported leechers ) x 100 %
Peers % = ( ( checked seeders + checked leechers) / ( reported seeders + reported leechers ) ) x 100 %

Dec - Peers and seeders Dec - Peers and leechers

Observerations

Seeders The average number of checked seeders per torrent is similar for both measurements.

Measurement Avg Seeders count Avg Seeders %

Septemebr 104 13.6

December 108 2.49
Leechers The average number of checked leechers is lower than found in September. Measurement Avg Leechers count Avg Leechers %

Septemebr 143.58 296.44

December 105.91 40.02

However, this is less significant since we're more interested.Lessons learned: Even though the torrent is reported to be alive and popular, it could still be dead as we found out by checking. This gap between reported and checked requires fixing the checking mechanism within Tribler.Lessons learned: Even though the torrent is reported to be alive and popular, it could still be dead as we found out by checking. This gap between reported and checked requires fixing the checking mechanism within Tribler.
Peers The average number of checked leechers is lower than found in September. Measurement Avg Peers count Avg Peers %

Septemebr 244.04 28.27

December 214.45 4.35
In overall, the seeders, leechers, and peers percentage has decreased significantly compared to the September measurement. One likely explanation for this change is that the popular torrents recorded in this experiment have a lower standard deviation of the reported values of health information (new version of Tribler) compared to that of the measurements taken in September. That is, more diverse popular torrents are being distributed in the new version of Tribler 7.12.1. This is different from the earlier Tribler version where we observed a few popular torrents were distributed multiple times.
Lessons learned: Even though the torrent is reported to be alive and popular, it could still be dead as we found out by checking. This gap between reported and checked requires fixing the checking mechanism within Tribler.
In my opinion, the dissemination of popular torrents via the popularity community is satisfactory looking at the results.

Measurement	Avg Seeders count	Avg Seeders %
Septemebr	104	13.6
December	108	2.49

Leechers The average number of checked leechers is lower than found in September. Measurement	Avg Leechers count	Avg Leechers %
Septemebr	143.58	296.44
December	105.91	40.02

Peers The average number of checked leechers is lower than found in September. Measurement	Avg Peers count	Avg Peers %
Septemebr	244.04	28.27
December	214.45	4.35

absolutep commented 1 year ago

In overall, the seeders, leechers, and peers percentage has decreased significantly compared to the September measurement.

I would point out another reason - lower number of users using Tribler might skew the results or atleast give an erratic response.

I do not know why but it seems that userbase has decreased quite a lot.

For newer torrents, I get downloading/uploading speeds of around 20MBPS in qBitTorrent (without VPN) but on Tribler I hardly cross maximum of 4MBPS (without hops).

Is this because of low number of users or unable to connect to peers or cooperative downloading - that I have no technical knowledge of?

synctext commented 1 year ago

@absolutep Interesting thought, thx! We need to measure that and compensate for that.

@xoriole The final goal of this work is to either write or contribute the technical content to a (technical/scientific) paper, like: https://github.com/Tribler/tribler/files/10186800/LTR_Thesis_v1.1.pdf We're very much not ready for machine learning. But for publication results its strangely easy to mix measurement of a 17 years deployed system with simplistic Python Jupyter notebooks with machine learning. Key performance indicator: zombies in top-N (1000). Agree with key point you raised: stepping out of the engineering mindset. Basically we're spreading data nicely and fast, its only a bit wrong (e.g. 296.44% :joy: ) Lesson learned: started simple, working, and inaccurate. Evolved complexity: we need a filter step and measure again later in time (e.g. re-measure, re-confirm popularity). Reactive, pro-active, or emergent design. Zero trust architecture: trust nobody but yourself. We have no idea actually. ~~So just build, deploy, and watch what happens.~~ Actually we need to know the root cause of failure. Without understanding the reason for wrong statistics, we're getting nowhere. Can we reproduce the BEP33 error, for instance? Therefore, analysis of 1 month system dynamics and faults. Scientific related work (small sample from this blog on Google Youtube): Scientific problem is item ranking. What would be interesting to know is: how fast does the frontpage of Youtube change with the most-popular videos? Scientific article by Google: Deep Neural Networks for YouTube Recommendations.

synctext commented 1 year ago

Discussed progress, next sprint: how good is are the popularity statistics with latest 12.1 Tribler (filtered results, compared to ground truth)? DHT self-attack issue to investigate next?

xoriole commented 1 year ago

Comparing the results from the naked libtorrent and the Tribler, I found that the results of the torrent check of the popular torrents received via the popularity community when checked locally results in dead torrents which is likely not the case. This is because of the issue in torrent checker (DHT Session checker). After BEP33 is removed, the earlier way of getting the health response mostly returns in zero seeders and zero or some leechers, this in the UI shows as

drew2a commented 1 year ago

Could this bug (https://github.com/Tribler/tribler/issues/6131) relate to the described issues?

xoriole commented 1 year ago

Could this bug (#6131) relate to the described issues?

Yes, it is same bug

drew2a commented 1 year ago

While working on https://github.com/Tribler/tribler/pull/7286 I've found a strange behavior that may shed light on some of the other oddities.

If TorrentChecker performs a check via a tracker, then returned values always look ok-ish (like 'seeders': 10, 'leechers': 77).

If TorrentChecker performs a check via DHT, then returned seeders are always equal to 0 (like 'seeders': 0, 'leechers': 56)

Maybe it is a bug that @xoriole describes above.

UPDATED 03.02.22 after verification from @kozlovsky

I also found that one automatic check in TorrentChecker was broken. ~~I also have found that literally all automatic checks in TorrentChecker were broken.~~

There are three automatic checks: https://github.com/Tribler/tribler/blob/87916f705eb7e52da828a14496b02db8d61ed5e9/src/tribler/core/components/torrent_checker/torrent_checker/torrent_checker.py#L72-L75

The first (check_random_tracker) is broken because it performs the check, but didn't save the results into DB:

https://github.com/Tribler/tribler/blob/87916f705eb7e52da828a14496b02db8d61ed5e9/src/tribler/core/components/torrent_checker/torrent_checker/torrent_checker.py#L159-L163

~~The second (check_local_torrents) is broken because it calls an async function in a sync way (which doesn't leads to the execution of the called function).~~

~~The third (check_torrents_in_user_channel) is also broken because it calls an async function in a sync way (which doesn't leads to the execution of the called function).~~

CC: @kozlovsky

drew2a commented 1 year ago

Also, I'm posting an algorithm example of getting seeders' and leechers' in case there is more than one source of information available.

TorrentChecker checks the seeders' and leechers' for an infohash.
TorrentChecker sends a DHT request and a request to a tracker.
TorrentChecker receives two answers. One from DHT and one from the tracker:
- DHT_response= {"seeders": 10, "leechers"=23}
- tracker_response={"seeders": 4, "leechers"=37})
TorrentChecker picks the answer with the maximum seeders' value. Therefore the result is:
- result={"seeders": 10, "leechers"=23}
TorrentChecker saves this information to the DB (and propagates it through PopularityCommunity later).

Proof: https://github.com/Tribler/tribler/blob/87916f705eb7e52da828a14496b02db8d61ed5e9/src/tribler/core/components/torrent_checker/torrent_checker/torrent_checker.py#L320-L324

Intuitively it is not the correct algorithm. Maybe we should use the mean function instead of the max.

Something like:

from statistics import mean

DHT_response = {'seeders': 10, 'leechers': 23}
tracker_response = {'seeders': 4, 'leechers': 37}

result = {'seeders': None, 'leechers': None}
for key in result.keys():
    result[key] = mean({DHT_response[key], tracker_response[key]})

print(result)  # {'seeders': 7, 'leechers': 30}

Or we might prioritize the sources. Let's say:

Tracker (more important)
DHT (less important)

Tribler / tribler