Open synctext opened 8 years ago
Related to multichain crawling. We don't want to spy on our users for profit, but to identify faults, failures, and points for improvement. Respect privacy, no exposure of any individual, and only provide insight into the global system behavior. #2532 #1429
http://statistics.tribler.org/ is back with IPv8, showing user communities. We just need longer-term statistics now.
A 2024 update: we now have multiple crawlers but they do not meet the original goal of OP. They are semi-validated, not documented, and not reliable. Frankly, we have too many crawlers: I have a hard time remembering what we even have running.
Goal: a validated, documented and reliable crawler to understand user behavior. This enables the future step of measuring behavioral change.
We have an existing crawler for Dispersy communities and Tribler. The general Tribler crawler stopped being updated in 2013. See: http://Statistics.tribler.org This is annotated with our releases and major news events. However, it is totally unmaintained and difficult to maintain.
This crawler needs to be moved to a Proxmox machine and improved. Improved insight will help us understand the network health and shape the roadmap.
Expected results: real-time daily graphs of Tribler network size:
User upgrade behavior:
Examples taken from: http://crawler.doxu.org/uptimes.html
ToDo: NAT type as reported by Dispersy in our community and evolution in time.
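A minimal sketch of how the daily network-size graph could be derived from raw crawler sightings while keeping individuals out of the output. The input format, the function name `daily_network_size`, and the `salt` parameter are assumptions for illustration, not the format of the existing crawler. Peer identifiers are salted and hashed before counting, and only per-day totals are returned.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timezone


def daily_network_size(sightings, salt):
    """Count unique peers seen per UTC day.

    sightings: iterable of (unix_timestamp, peer_id_bytes) tuples
               (hypothetical crawler record format).
    salt:      secret bytes mixed into the hash so raw peer IDs
               are never stored in the aggregation state.
    Returns a dict mapping 'YYYY-MM-DD' to the number of unique peers.
    """
    peers_per_day = defaultdict(set)
    for timestamp, peer_id in sightings:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
        # Store only a salted digest, never the peer ID itself.
        digest = hashlib.sha256(salt + peer_id).digest()
        peers_per_day[day].add(digest)
    # Only aggregate counts leave this function.
    return {day: len(peers) for day, peers in sorted(peers_per_day.items())}
```

The same grouping could be keyed on (day, reported version) or (day, NAT type) to get the upgrade-behavior and NAT-evolution graphs, again assuming the crawler records those fields.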
According to these GitHub download stats, Tribler has 302000 downloads: http://www.somsubhra.com/github-release-stats/?username=tribler&repository=tribler However, our non-validated, many-years-old crawler only sees a few thousand users.
The thesis of Niels contains an extensive user community evaluation and "data science" portion. http://www.tribler.org/SimilarityFunction/ Thesis.pdf: http://kayapo.tribler.org/trac/raw-attachment/wiki/SimilarityFunction/thesis.pdf
Current setup:
Kayapo web space: /var/www/statistics.tribler.org/htdocs/img/
Soft links to: /home/tribler/generate-periodic-statistics
kayapo:/home/tribler/generate-periodic-statistics# wc -l *.py
193 first_last.py
191 parse.py
169 reduce.py
553 total
Some crawlers died a few years ago: