Open synctext opened 6 years ago
**Popularity Community: Introduction**

The popularity community is a dedicated community to disseminate popular/live content across the network. The content could be anything, e.g. the health of a torrent, a list of popular torrents, or even search results. Dissemination of the content follows the publish-subscribe model: each peer in the community is both a publisher and a subscriber. A peer subscribes to a set of neighboring peers to receive their content updates, while it publishes its own content updates to the peers subscribing to it.
Every peer maintains a list of subscribing and publishing peers with whom it exchanges content. All content from non-subscribed publishers is refused. The selection of which peers to subscribe or publish to greatly influences the dissemination of content, both genuine and spam. Therefore, we select based on a simple trust score. The trust score indicates the number of times we have interacted with the node, as indicated by the number of mutual Trustchain blocks. The higher the trust score, the better the chance of being selected (as publisher or subscriber).
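The selection idea above can be sketched in a few lines. This is only an illustration of "higher trust score, better chance of being selected"; the `count_mutual_blocks` helper is hypothetical and not the real Tribler/Trustchain API:

```python
import random

def trust_score(my_peer_id, peer_id, trustchain_db):
    # Trust score = number of mutual Trustchain blocks with this peer.
    # `count_mutual_blocks` is a hypothetical helper, not the deployed API.
    return trustchain_db.count_mutual_blocks(my_peer_id, peer_id)

def select_peers(candidates, my_peer_id, trustchain_db, k=10):
    """Weighted sample without replacement: peers with more shared history are
    more likely to be chosen as publishers/subscribers, while unknown peers
    still get a small chance (+1) so newcomers are not locked out."""
    pool = list(candidates)
    chosen = []
    while pool and len(chosen) < k:
        weights = [trust_score(my_peer_id, p, trustchain_db) + 1 for p in pool]
        pick = random.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen
```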
Research questions ...
ToDo: describe the simplified top-N algorithm that is more lightweight (no pub/sub). As-simple-as-possible gossip. Measure and plot the 4 graphs listed above.
Bumping this issue. The key selling point of Tribler 7.6 is the popularity community maturing (good enough for the coming 2 years) and superior keyword search using relevance ranking. Goal: 100k swarm tracking.
This has priority over channel improvements. Our process is to bump each critical feature to a superior design and move to the next. A key lesson within distributed systems is: you can't get it perfect the first time (unless you have 20 years of failure experience). Iteration and relentlessly improving deployed code is key.
After we close this performance evaluation issue we can build upon it. We need to know how well it performs and tweak it for 100k swarm tracking. We can then do a 1st version of real-time relevance ranking. Read our 2010 work for background: Improving P2P keyword search by combining .torrent metadata and user preference in a semantic overlay
Repeating key research questions from above (@ichorid):
Concrete graphs from a single crawl:
See also #4256 for BEP33 measurements&discussion
Please check out @grimadas tool for crawling+analysing Trustchain and enhance this for the popularity community: https://github.com/Tribler/trustchain_etl
Hopefully we can soon add the health of the ContentPopularity Community to our overall dashboard.
Currently, a peer shares its 5 most popular and 5 random torrents that it has checked itself with its connected neighbors. Since a peer starts sharing them from the beginning, it is not always the case that popular torrents are shared. This results in sharing torrents that don't have enough seeders (see the SEEDERS_ZERO count), which does not contribute much to the sharing of popular torrents. So, two things that could improve the sharing of popular torrents seem to be:
https://jenkins-ci.tribler.org/job/Test_tribler_popularity/plot/
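To make the current policy concrete, here is a minimal sketch of the "5 most popular plus 5 random checked torrents" selection described above. The data structures and names are assumptions for illustration, not the deployed code:

```python
import random
from dataclasses import dataclass

@dataclass
class TorrentHealth:
    infohash: bytes
    seeders: int
    leechers: int
    last_checked: float

def select_torrents_to_gossip(checked_torrents, num_popular=5, num_random=5):
    """Pick the most-seeded torrents plus a few random ones from everything
    this peer has checked itself."""
    # One of the suggested improvements is to drop zero-seeder entries first:
    # checked_torrents = [t for t in checked_torrents if t.seeders > 0]
    by_seeders = sorted(checked_torrents, key=lambda t: t.seeders, reverse=True)
    popular = by_seeders[:num_popular]
    rest = by_seeders[num_popular:]
    return popular + random.sample(rest, min(num_random, len(rest)))
```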
Nice work! I assume that this experiment is using the live overlay?
As a piece of advice, I would first try to keep the mechanism simple for now, while analyzing the data from the raw network (as you did right now). Extending the mechanism with (arbitrary) rules might lead to biased results, which I learned the hard way when designing the matchmaking mechanism in our decentralized market. Sharing the 5 popular and 5 random torrents might look like a naive sharing policy, but it might be a solid starting point to get at least a basic popularity gossip system up and running.
Also, we have a DAS5 experiment where popularity scores are gossiped around (which might actually be broken after some channel changes). This might be helpful to test specific changes to the algorithm before deploying them 👍 .
@devos50 Yes, it is using live overlay.
Also, we have a DAS5 experiment where popularity scores are gossiped around (which might actually be broken after some channel changes). This might be helpful to test specific changes to the algorithm before deploying them.
Yes, good point. I'll create experiments to test the specific changes.
Thnx @xoriole! We now have our first deployment measurement infrastructure, impressive.
Can we (@kozlovsky @drew2a @xoriole) come up with a dashboard graph to quantify how far we are to our Key Performance Indicator: the goal of tracking 100k swarms? To kickstart the brainstorm:
increasing the initial buffer time before sharing is started
As @devos50 indicated, this sort of tuning is best saved for last. You want to have an unbiased view of your raw data for as long as possible. Viewing raw data improves accurate understanding. {Very unscientific: we design this gossip stuff with intuition. If we had 100+ million users, people would be interested in our design principles.}
Repeating long-term key research questions from above (@ichorid):
- not sharing zero seeder torrents
For every popular torrent, there are a thousand dead ones. Therefore, information about what is alive is much more precious and scarce than information about what is dead. It will be much more efficient to only share torrents that are well seeded.
Though, the biggest questions are:
- What is the resource consumption?
#3065 Fix for DHT spam using additional deployed service infrastructure
It would be very nice if we find (or develop) some Python-based Mainline DHT implementation, to precisely control the DHT packet parameters.
- How can we attack or defend this IPv8 community?
| :crossed_swords: attack | :shield: defence |
|---|---|
| spam stuff around | pull-based gossip |
| fake data | cross-check data with others |
| biased torrent selection | pseudo-random infohash selection (e.g. only send infohashes sharing some number of last bytes) |
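The "pseudo-random infohash selection" defence in the table above could look roughly like the sketch below: both sender and receiver derive the same per-round target suffix, so a spammer cannot freely pick which infohashes to push. The suffix derivation here is only one possible interpretation, not a deployed rule:

```python
import hashlib
import os

def infohash_allowed(infohash: bytes, round_seed: bytes, num_suffix_bytes: int = 1) -> bool:
    """Accept only infohashes whose last bytes match a value derived from a
    shared per-round seed; the eligible subset rotates with the seed."""
    target = hashlib.sha1(round_seed).digest()[-num_suffix_bytes:]
    return infohash[-num_suffix_bytes:] == target

# With a 1-byte suffix roughly 1/256 of all known infohashes qualify per round.
round_seed = os.urandom(8)          # in practice this would be agreed upon or gossiped
example_infohash = os.urandom(20)   # a 20-byte BitTorrent infohash
print(infohash_allowed(example_infohash, round_seed))
```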
We still have no measures for "popularity community".
I will describe a few experiments below. Maybe they will be helpful for developing a more scientific approach.
**Experiment 1.1**
Given a network of 100 nodes. Each node has a list of 100K popular torrents. We add a new node to the network.
Question: how long will it take to deliver a 100K list to the new node?

**Experiment 1.2**
The same as 1.1, but with 1K nodes.

**Experiment 2.1**
Given a network of 100 nodes. Each node has an empty list of popular torrents.
Question: how long will it take to get a 100K list on each node?
Additional data: bandwidth consumption over time. Additional data: list filling over time.

**Experiment 2.2**
The same as 2.1, but with 1K nodes.

**Experiment 3**
Given: the network after the 2.1 experiment. We calculate the common part of all popular lists (the common part of the "100k torrent list" on each node) [hereinafter `CommonList`].
Question: what is the size of `CommonList` (in percent)?

**Experiment 4**
Given: the network after experiment 2.1. We calculate `CommonList`. We somehow receive a reference list (maybe a static "human-made" list) [hereinafter `ReferenceList`]. We compare `CommonList` to the `ReferenceList`.
Question: what percentage of `CommonList` and `ReferenceList` is the same?
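The `CommonList` and `ReferenceList` metrics from these experiments are straightforward set operations; a small sketch with made-up data (names follow the experiment description above):

```python
def common_list(node_lists):
    """CommonList: infohashes present in every node's popular-torrent list."""
    sets = [set(lst) for lst in node_lists]
    return set.intersection(*sets) if sets else set()

def overlap_percent(list_a, list_b):
    """Percentage of list_a that also appears in list_b."""
    set_a, set_b = set(list_a), set(list_b)
    return 100.0 * len(set_a & set_b) / len(set_a) if set_a else 0.0

# Toy example with 3 nodes and a hand-made reference list.
node_lists = [
    ["ih1", "ih2", "ih3", "ih4"],
    ["ih2", "ih3", "ih4", "ih5"],
    ["ih2", "ih3", "ih6"],
]
reference_list = ["ih2", "ih7"]

common = common_list(node_lists)                              # {'ih2', 'ih3'}
avg_len = sum(len(set(lst)) for lst in node_lists) / len(node_lists)
print(100.0 * len(common) / avg_len)                          # size of CommonList relative to the node lists
print(overlap_percent(common, reference_list))                # 50.0: half of CommonList is in ReferenceList
```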
@drew2a good suggestions, thanks!
Note that many of these experiments can be performed in isolation on our (nation-wide) compute cluster, the DAS5. We have the necessary tools in Gumby to easily create new overlays on this cluster and to connect peers with each other. Gumby also allows plotting of resource consumption. Hopefully, we will soon have a few additional servers operational to conduct experiments.
We already have a basic DAS5 experiment that starts a few Tribler instances and shares popularity vectors. This would be a good starting point for anyone that quickly wants to evaluate the effectiveness of popularity gossip strategies. However, I believe that this issue concerns a performance evaluation of our live network.
"Experiment 1.1" is basically the experiment @devos50 made for the first implementation of GigaChannels two year ago. The answer is: "very fast, but gets increasingly slower as the list grows". Also, I doubt there are over 10k popular torrents at any moment in the whole BitTorrent network. Some (dated) insights into the BitTorrent network can be found in @synctext 's seminal works.
There are about 20 popular trackers, each one serving about 0.5-2M torrents. Less than 1% of torrents in any category are "alive". See the torrents.csv project. Overall, torrent popularity is pretty transient.
"Experiment 2" depends on two factors:
AFAIK, one can't simply go to a BitTorrent node and ask it for its list of seeded infohashes ( :eye: :ok_hand:). Instead, one must know the infohash and ask the client about it. Maybe there is some BEP-extension that implements querying clients for lists of infohashes, but that would be a great privacy hole (and thus highly unlikely to be accepted by BitTorrent community).
Also, DHT has some flood-protection that already took a toll on our developers.
Great fan of these concrete experiments to collect hard data. We need performance data from emulation. Hence the dashboard idea.
First priority for the coming sprint weeks: build a backwards-compatible PopularityCommunity, fix all known bugs in there, and try to boost performance. Preferably performance is improved even with the existing deployed community.
We thus combine integration testing, compatibility testing, regression testing, and performance analysis into the Multi-Aspect Sprint Cycles :-) Big step forward: https://github.com/xoriole/tribler/blob/popularity-helper/src/tribler-core/run_popularity_helper.py
From this paper
We trace the popularity of those objects by counting the number of requests they receive per week for the entire eight months of our measurement study. Fig. 4 shows that popular objects gain popularity in a relatively short timescale reaching their peak in about 5–10 weeks. The popularity of those objects drops dramatically after that. As the figures show, we observe as much as a sixfold decrease in popularity in a matter of 5–10 weeks.
According to that paper, the content popularity follows Mandelbrot-Zipf distribution.
Unfortunately, I (almost) completely forgot my calculus course, so I can't integrate anymore (except for the simplest stuff). Now, if we had a mathematician who could integrate the Mandelbrot-Zipf distribution and fit the total number of entries to the already known BitTorrent network stats (see my post above about 40M torrents)... Then we could predict the peak swarm size and tune our experiments accordingly...
@alexander-stannat ?
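For reference, the Mandelbrot-Zipf distribution assigns popularity rank i a probability proportional to 1/(i+q)^alpha, and the integral asked for above has a closed form. That gives a quick way to estimate what share of all requests the top-N torrents would capture. The alpha and q values below are placeholders, not fitted parameters; the 40M total comes from the tracker estimate earlier in this thread:

```python
def mzipf_mass(n: int, q: float, alpha: float) -> float:
    """Approximate the unnormalized mass of ranks 1..n under p(i) ~ 1/(i+q)**alpha
    by integrating (x+q)**(-alpha); valid for alpha != 1."""
    return ((n + q) ** (1 - alpha) - (1 + q) ** (1 - alpha)) / (1 - alpha)

def top_n_share(n: int, total: int, q: float, alpha: float) -> float:
    """Fraction of all requests received by the top-n ranked torrents."""
    return mzipf_mass(n, q, alpha) / mzipf_mass(total, q, alpha)

# Placeholder parameters: alpha and q are assumptions, total ~ 40M torrents.
print(top_n_share(n=100_000, total=40_000_000, q=20.0, alpha=0.8))  # ~0.26
```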
Results of popularity community from continuous 1 hour execution. Plots available in Jenkins
we ended up with feedback loops because the ones that were considered popular early on got disproportionate reach
I've come up with a simple algorithm for how to solve the feedback problem. The basic idea is to emulate how news spreads through human society:
This list of rules guarantees that information about popular torrents will propagate quickly (1), but will not dominate the gossip (3) and die out naturally (2). Also, it will prevent the spreading of "fake news" (4).
EDIT: basically, already invented in "Top-k Item Identification on Dynamic and Distributed Datasets"
Just bumping this issue in importance. We need to fix this community. "Donate my VPN bandwidth to Tribler" would solve matters. With many IPv4 addresses we can crawl torrents and even join them to check the ground truth. Crawl the contents of the channels you joined and gossip.
We started working on this on Feb 6, 2017, see issue #2783. That is over 4 years and 2 months! Ambition level is now reduced. This works only if:
Something is starting to work within channels :heavy_check_mark:
The graph below shows the number of times different popular torrents were received by a single Tribler peer over a period of 21 hours via the Popularity community. There are over 23k torrents, but the graph shows only the top 1000. It is a long tail.
Top 10 shared torrents:
The graph below shows the number of times different peers shared popular torrents with the observer Tribler peer over the same time period.
The graph below shows the torrent distribution and difference in the seeder count across all messages.
Out of 23k torrents, the majority of shared torrents have a zero (or low) seeder count. (Note that these are not dead torrents.) There are hundreds of torrents shared thousands of times with the same health (seeder) count by several peers. This information can likely be used to determine the trust of the received torrent health information and/or the sender peer.
Points of discussion:
Great work! As discussed yesterday: please do not make any radical changes and no Bloom filters. There is a systematic bias towards repeating the same torrent, which leads to duplicate information. Do not use any new ideas please. Just remove the bias for big swarms, deploy, and measure the resulting improvement. This field is ancient (but all this prior work ignored security and is therefore of limited use; with Trustchain we moved beyond this). Things like "peer sampling" without a web of trust leave your system defenceless against spam or a Sybil attack. Feel free to take the time to understand much of the prior work discussed in these 205 slides. Note especially the naive security assumption on slide 100: they test with 2% attackers in the overlay for a "secure peer sampling" paper http://sbrc2010.inf.ufrgs.br/resources/presentations/tutorial/tutorial-montresor.pdf
<rant>
The key idea is to keep things as simple as possible. Don't needlessly complicate things. This is very weird, but optimisation usually leads to complexity. For gossip protocols, when you add complexity you're doing it wrong. You need to think differently: randomness and some repetition create robustness, resilience, and strength. Obviously, identical messages repeating 1000s of times are wrong. It's quite complicated to create simple systems. That is why science has failed so far to make re-usable gossip tooling. Tribler is designed to pioneer such simple proven building blocks. Not by starting out with a generic tool, but by first making something that works for a million people and evolving it in the years to come.
The graphs below show the observation of Popularity Community by a single Tribler peer in the period of ~21 hours.
Experiment 1: Current behavior of the community (V7.9.0-RC1)
Experiment 2: Updated to use the `random_walk` and `remove_peers` strategies so that the observer peer can find more peers in the network.
Experiment 1 graphs are the same as the previous comment in this issue
(Pairs of Experiment 1 / Experiment 2 graphs were attached here for each metric.)
Observations:
The effect of the change in experiment 2 is limited to the observer node only; it would be interesting to see the network behavior when more nodes participate with the changes included. This will require a separate network experiment.
I suggest converting Popularity Community to pull-based gossip, RQC-style. That will allow for much easier experimentation and spam-resistance.
Alternatively, stop sending popular torrents altogether and just send random torrents instead. That should flatten the curve.
stop sending popular torrents altogether and just send random torrents instead.
Great idea. Please try to get this deployed for the next release. No changes to any other parts of this community. Let's see how that compares.
Observation from experiment 3: Current behavior of v7.10-exp1
Important changes:
Graph: x-axis represents the unique torrents received by the observer node
A few popular torrents were shared a large number of times. Instead of flattening the curve, we obtained a sharper peak at 15k (compared to 5k, 6k in earlier experiments)
The number of peers discovered is high, which is expected considering the change in strategy.
Torrent distribution considering the difference in seeder values is consistent with the previous experiment.
A few popular torrents were shared a large number of times. Instead of flattening the curve, we obtained a sharper peak at 15k (compared to 5k, 6k in earlier experiments)
Are you running your experiments on the main Tribler network? If so, this is expected because you use push-based gossip, meaning that the only thing that effectively changed for a single host running 7.10 in a sea of 7.9 is the faster peers discovery. Which, indeed, should sharpen the peak.
Are you running your experiments on the main Tribler network? If so, this is expected because you use push-based gossip, meaning that the only thing that effectively changed for a single host running 7.10 in a sea of 7.9 is the faster peers discovery. Which, indeed, should sharpen the peak.
Yes, it is on the main Tribler network. Having almost 3 times the earlier peak was not something I expected. I think you're right: faster discovery and the abundance of v7.9 peers, which send the combined popular-and-random-torrents message, are responsible for the spike. It should decline once there are more v7.10 nodes, since the frequency of sharing popular torrents is reduced in v7.10.
As an aside, I propose to add a message type to request/respond client version in RemoteQueryCommunity. It'll be useful in the experiments to confirm the distribution.
The popularity community is starting to work nicely. Tribler 7.10 keyword search finds good swarms. :grinning: :partying_face: :grinning:
Yet another scientific challenge for 2022 is reducing the altruistic peer discovery time. Or fork the BitTorrent protocol and only create swarms with proper altruism (seed for 2 years). We see how few peers in swarms actively upload. For instance, this new swarm reportedly has 375 seeders. It typically takes 60-300 seconds before you find the altruistic seeders:
In a few days the new 7.12 release will be out! :clap: Please post the latest performance analysis of your algorithms here @xoriole.
New measurement results are hopefully posted here by @xoriole for 6 Sep 2022 Dev meeting :crossed_fingers:
The graph below shows the number of torrents received (unique & total), total messages, and peers discovered per day by the crawler running the Popularity Community in observer mode for 95 days. The crawler runs with an extended discovery booster, which leads to discovering more torrents.
Comments on this measurement:
- `neighborhood_size` == 50 or 25 ??
- `edge_length` == 25 ??
- Torrents (unique) / day: `info_hash` identifies the exact torrent
- "maximize peer count at the end of a 30 seconds period", https://github.com/Tribler/tribler/blob/912c6f0ab95f30be550067f9778db1df1df18ac9/src/tribler/core/components/ipv8/discovery_booster.py#L53-L56

@xoriole please correct me if I'm wrong.
If the crawler uses the default `DiscoveryBooster`, `neighborhood_size` should be equal to 25 and `edge_length` should be equal to 25.
Frozen experiment
Same measurement, but now reporting leechers instead of seeders. Absolute number of leechers, using a standard linear scale:
Now we represent the same number in terms of percentage in an attempt to normalize the values.
Seeders % = ( reported seeders / checked seeders ) x 100 %
Leechers % = ( reported leechers / checked leechers ) x 100 %
Peers % = ( (reported seeders + reported leechers) / (checked seeders + checked leechers) ) x 100 %
Observations
- The average seeders % is 9.42%.
- The average leechers % is 225.17%.
.Note that remote results and popularity community differ in algorithm. BEP33 and central swarms are simply not sufficiently reliable for a robust, attack-resilient, and quality product that we as scientists strive for. The problem we are trying to solve is not accurate statistics, but just "bad swarm", versus "good swarm". We need more experiments around Libtorrent join stats.
Next sprint: understand popularity community ground truth?
**Popularity community experiment**

The purpose of the experiment is to see how the torrent health information received via the popularity community differs when the torrents are checked locally by joining the swarm.
From the popularity community, we constantly receive a set of tuples `(infohash, seeders, leechers, last_checked)` representing popular torrents with their health (seeders, leechers) information. This health information is supposed to be obtained by the sender by checking the torrent itself, so the expectation is that the information is relatively accurate and fresh.
In the graph below, we show how the reported (or received) health info and checked health info differ for the 24 popular torrents received via the community.
First considering the seeders. Since the variation in the number of seeders for different torrents is high, a logarithmic scale is used.
Similarly for the leechers, again logarithmic scale is used.
Here the individual torrents are unrelated to each other and could be more or less popular depending on what content they represent, so seeders, leechers, and peers (= seeders + leechers) are represented as a percentage of their reported value in an attempt to normalize them.
Seeders % = ( checked seeders / reported seeders ) x 100 %
Leechers % = ( checked leechers / reported leechers ) x 100 %
Peers % = ( ( checked seeders + checked leechers) / ( reported seeders + reported leechers ) ) x 100 %
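The same metrics in code, as a small sketch. The dictionaries mirror the (infohash, seeders, leechers, last_checked) messages described above; the numbers are made up:

```python
def pct(checked_value, reported_value):
    """checked / reported as a percentage; 0 if nothing was reported."""
    return 100.0 * checked_value / reported_value if reported_value else 0.0

def health_percentages(checked, reported):
    """Seeders %, Leechers % and Peers % as defined above."""
    return {
        "seeders_pct": pct(checked["seeders"], reported["seeders"]),
        "leechers_pct": pct(checked["leechers"], reported["leechers"]),
        "peers_pct": pct(checked["seeders"] + checked["leechers"],
                         reported["seeders"] + reported["leechers"]),
    }

# Made-up example: health reported via the popularity community vs. a local check.
reported = {"seeders": 375, "leechers": 40}
checked = {"seeders": 51, "leechers": 118}
print(health_percentages(checked, reported))
# ~ {'seeders_pct': 13.6, 'leechers_pct': 295.0, 'peers_pct': 40.7}
```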
Observations
- The average seeders % is 13.60%, which is a bit higher than the frozen experiment average seeder value (9.42%). This makes sense because these are popular torrents, so the seeders are expected to be higher than in normal search experiments like frozen.
- The average leechers % is 296.44% (this experiment) vs. 225.17% (frozen experiment).
- The average peers % is 28.27%, compared to the frozen experiment's 35.26%. This is likely because the total peers reported for popular torrents is higher than for the torrents returned from the search results.

Writing down our objectives here:

| Layer | Description |
|---|---|
| Relevance ranking | It is shown to the user within 500 ms and asynchronously updated |
| Remote search | trustworthy peer which has the swarm info by random probability |
| Popularity community | distribute the swarm sizes |
| Torrent checking | |

Background: Getting this all to work is similar to making a distributed Google. Everything needs to work and needs to work together. Already in 2017 we tried to find the ground truth on the perfect matching swarm for a query. We have a minimal swarm crawler (2017): "Roughly 15-KByte-ish of cost for sampling a swarm (also receive bytes?). Uses magnet links only. 160 Ubuntu swarms crawled."

- Documented torrent checking algorithm?
- Documented popularity community torrent selection and UDP/IPv8 packet format? Readthedocs example: "latest/search_architecture.html"
Initial documentation of deployed Tribler 7.12 algorithms
Repeating the Popularity community experiment here.
Similar to the experiment done in September, here we show how the reported (or received) health info and checked health info differ for the 24 popular torrents received via the community.
The numbers represented in the graph are count values and the scale used in the graph is logarithmic for better comparison since the variation in the values is large.
Seeders % = ( checked seeders / reported seeders ) x 100 %
Leechers % = ( checked leechers / reported leechers ) x 100 %
Peers % = ( ( checked seeders + checked leechers) / ( reported seeders + reported leechers ) ) x 100 %
Seeders

The average number of checked seeders per torrent is similar for both measurements.

| Measurement | Avg Seeders count | Avg Seeders % |
|---|---|---|
| September | 104 | 13.6 |
| December | 108 | 2.49 |
Leechers

The average number of checked leechers is lower than found in September.

| Measurement | Avg Leechers count | Avg Leechers % |
|---|---|---|
| September | 143.58 | 296.44 |
| December | 105.91 | 40.02 |
However, this is less significant since we're more interested in the seeders.
Peers

The average number of checked peers is lower than found in September.

| Measurement | Avg Peers count | Avg Peers % |
|---|---|---|
| September | 244.04 | 28.27 |
| December | 214.45 | 4.35 |
Overall, the seeders, leechers, and peers percentages have decreased significantly compared to the September measurement. One likely explanation for this change is that the popular torrents recorded in this experiment have a lower standard deviation in the reported health values (new version of Tribler) compared to the measurements taken in September. That is, more diverse popular torrents are being distributed in the new version, Tribler 7.12.1. This is different from the earlier Tribler version, where we observed a few popular torrents being distributed multiple times.
Lessons learned: Even though the torrent is reported to be alive and popular, it could still be dead as we found out by checking. This gap between reported and checked requires fixing the checking mechanism within Tribler.
In my opinion, the dissemination of popular torrents via the popularity community is satisfactory looking at the results.
Overall, the seeders, leechers, and peers percentages have decreased significantly compared to the September measurement.
I would point out another reason: a lower number of users using Tribler might skew the results or at least give an erratic response.
I do not know why, but it seems that the userbase has decreased quite a lot.
For newer torrents, I get downloading/uploading speeds of around 20 MBPS in qBitTorrent (without VPN), but on Tribler I hardly cross a maximum of 4 MBPS (without hops).
Is this because of a low number of users, an inability to connect to peers, or cooperative downloading - something I have no technical knowledge of?
@absolutep Interesting thought, thx! We need to measure that and compensate for that.
@xoriole The final goal of this work is to either write or contribute the technical content to a (technical/scientific) paper, like: https://github.com/Tribler/tribler/files/10186800/LTR_Thesis_v1.1.pdf
We're very much not ready for machine learning. But for publication results it's strangely easy to mix measurements of a 17-years-deployed system with simplistic Python Jupyter notebooks with machine learning. Key performance indicator: zombies in the top-N (1000). I agree with the key point you raised: stepping out of the engineering mindset. Basically we're spreading data nicely and fast, it's only a bit wrong (e.g. 296.44% :joy: )
Lesson learned: we started simple, working, and inaccurate. Evolved complexity: we need a filter step and to measure again later in time (e.g. re-measure, re-confirm popularity). Reactive, pro-active, or emergent design. Zero-trust architecture: trust nobody but yourself. We have no idea actually. So just build, deploy, and watch what happens. Actually we need to know the root cause of failure. Without understanding the reason for the wrong statistics, we're getting nowhere. Can we reproduce the BEP33 error, for instance? Therefore: analysis of 1 month of system dynamics and faults. Scientific related work (small sample from this blog on Google YouTube):
The scientific problem is item ranking. What would be interesting to know is: how fast does the front page of YouTube change with the most popular videos? Scientific article by Google: Deep Neural Networks for YouTube Recommendations.
Discussed progress. Next sprint: how good are the popularity statistics with the latest 7.12.1 Tribler (filtered results, compared to ground truth)? DHT self-attack issue to investigate next?
Comparing the results from naked libtorrent and Tribler, I found that when the popular torrents received via the popularity community are checked locally, the check often results in dead torrents, which is likely not the case. This is because of an issue in the torrent checker (DHT session checker). After BEP33 was removed, the earlier way of getting the health response mostly returns zero seeders and zero or some leechers; in the UI this shows as
Could this bug (https://github.com/Tribler/tribler/issues/6131) relate to the described issues?
Could this bug (#6131) relate to the described issues?
Yes, it is the same bug.
While working on https://github.com/Tribler/tribler/pull/7286 I've found a strange behavior that may shed light on some of the other oddities.
If `TorrentChecker` performs a check via a tracker, then the returned values always look ok-ish (like `'seeders': 10, 'leechers': 77`).
If `TorrentChecker` performs a check via DHT, then the returned seeders are always equal to 0 (like `'seeders': 0, 'leechers': 56`).
Maybe it is a bug that @xoriole describes above.
UPDATED 03.02.22 after verification from @kozlovsky
~~I also found that one automatic check in `TorrentChecker` was broken.~~
I have also found that literally all automatic checks in `TorrentChecker` were broken.
There are three automatic checks: https://github.com/Tribler/tribler/blob/87916f705eb7e52da828a14496b02db8d61ed5e9/src/tribler/core/components/torrent_checker/torrent_checker/torrent_checker.py#L72-L75
- The first (`check_random_tracker`) is broken because it performs the check but doesn't save the results into the DB.
- The second (`check_local_torrents`) is broken because it calls an async function in a sync way (which doesn't lead to the execution of the called function).
- The third (`check_torrents_in_user_channel`) is also broken because it calls an async function in a sync way (which doesn't lead to the execution of the called function).
CC: @kozlovsky
Also, I'm posting an example of the algorithm for getting seeders and leechers in case there is more than one source of information available.

1. `TorrentChecker` checks the seeders and leechers for an infohash.
2. `TorrentChecker` sends a DHT request and a request to a tracker.
3. `TorrentChecker` receives two answers, one from DHT and one from the tracker (e.g. `DHT_response = {"seeders": 10, "leechers": 23}` and `tracker_response = {"seeders": 4, "leechers": 37}`).
4. `TorrentChecker` picks the answer with the maximum seeders value. Therefore the result is `result = {"seeders": 10, "leechers": 23}`.
5. `TorrentChecker` saves this information to the DB (and propagates it through `PopularityCommunity` later).

Intuitively, this is not the correct algorithm. Maybe we should use the `mean` function instead of the `max`.
Something like:
```python
from statistics import mean

DHT_response = {'seeders': 10, 'leechers': 23}
tracker_response = {'seeders': 4, 'leechers': 37}

result = {'seeders': None, 'leechers': None}
for key in result.keys():
    result[key] = mean({DHT_response[key], tracker_response[key]})

print(result)  # {'seeders': 7, 'leechers': 30}
```
Or we might prioritize the sources. Let's say:
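As a purely hypothetical illustration (not a concrete proposal from this thread), prioritizing the sources could mean trusting tracker responses over DHT responses, since the DHT numbers above are known to be unreliable, and falling back to DHT only when no tracker answered:

```python
# Hypothetical priorities; higher wins. These values are an assumption.
SOURCE_PRIORITY = {"tracker": 2, "dht": 1}

def merge_health_responses(responses):
    """responses maps source name -> {'seeders': ..., 'leechers': ...}.
    Returns the answer from the highest-priority source that responded."""
    best_source = max(responses, key=lambda src: SOURCE_PRIORITY.get(src, 0))
    return responses[best_source]

print(merge_health_responses({
    "dht": {"seeders": 0, "leechers": 56},
    "tracker": {"seeders": 10, "leechers": 77},
}))  # -> {'seeders': 10, 'leechers': 77}
```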
@arvidn indicated: tracking popularity is known to be a hard problem.
We deployed the first version into Tribler in #3649, after prior Master thesis research in #2783. However, we lack documentation or a specification of the deployed protocol.
Key research questions:
Concrete graphs from a single crawl:
Implementation of `on_torrent_health_response(self, source_address, data)`
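A schematic sketch of what a handler with this signature might do with an incoming health message. This is not the deployed Tribler code: the payload layout follows the (infohash, seeders, leechers, last_checked) tuples discussed earlier, while the validation rules and the in-memory store are assumptions (the class/`self` context is omitted for brevity):

```python
import time

MAX_AGE_SECONDS = 3600   # assumption: ignore measurements older than an hour
health_db = {}           # stand-in for Tribler's metadata store

def on_torrent_health_response(source_address, data):
    """Validate a gossiped health tuple and remember the freshest report."""
    infohash, seeders, leechers, last_checked = data

    # Basic sanity/anti-spam checks (assumed, not the deployed rules):
    if seeders < 0 or leechers < 0:
        return                                    # malformed
    if seeders == 0:
        return                                    # zero-seeder entries add little value (see discussion above)
    if time.time() - last_checked > MAX_AGE_SECONDS:
        return                                    # too stale to be useful

    known = health_db.get(infohash)
    if known is None or last_checked > known["last_checked"]:
        health_db[infohash] = {"seeders": seeders, "leechers": leechers,
                               "last_checked": last_checked, "from": source_address}
```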
ToDo @xoriole: document the deployed algorithm in 20+ lines (swarm check algorithm, pub/sub, hash selection algorithm, handshakes, search integration, etc.).