Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0
4.77k stars 443 forks

Trustchain scalability and stress testing experiment #4140

Closed synctext closed 3 weeks ago

synctext commented 5 years ago

Having a Trustchain database with infinite growth is a problem.

task: build a Trustchain block collector and automatic generator

Trustchain is becoming a key performance bottleneck, as our network is growing to 20k concurrent users. We are experiencing the limits of our performance. Overlap with prior issue: https://github.com/Tribler/tribler/issues/3861

Solution: a dedicated experiment with 1 real, full Tribler instance and an emulated ("mocked") network of thousands or even over a million peers. All communication with the network results in a response from a single generator that produces IPv8-based and Trustchain-compliant return traffic. This generator of an infinite amount of fake Trustchain records is then used for performance analysis. The records have valid binary content, but the data is fake. This lets us test the Tribler Trustchain community and conduct a performance analysis.
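A minimal sketch of what such a generator could look like, using only the Python standard library. The record layout, `fake_chain` and `fake_network` are hypothetical illustrations; this is not the real Trustchain wire format or the IPv8 API, only the idea of producing an endless stream of structurally valid, hash-linked records with meaningless content.

```python
import hashlib
import os
import struct
from itertools import islice

# Hypothetical fixed layout, NOT the real Trustchain wire format:
# 32-byte public key, 4-byte sequence number, 32-byte previous hash,
# 2-byte payload length, followed by the payload itself.
HEADER = struct.Struct(">32sI32sH")
GENESIS_HASH = b"\x00" * 32


def fake_chain(public_key: bytes, payload_size: int = 64):
    """Yield an endless stream of hash-linked fake records for one emulated peer."""
    previous_hash = GENESIS_HASH
    sequence_number = 1
    while True:
        payload = os.urandom(payload_size)  # structurally valid, meaningless data
        record = HEADER.pack(public_key, sequence_number, previous_hash, len(payload)) + payload
        yield record
        previous_hash = hashlib.sha256(record).digest()
        sequence_number += 1


def fake_network(num_peers: int, records_per_peer: int):
    """Yield records for many emulated peers without holding them all in memory."""
    for peer_index in range(num_peers):
        # Stand-in "public key" derived from the peer index (an assumption).
        public_key = hashlib.sha256(b"peer-%d" % peer_index).digest()
        yield from islice(fake_chain(public_key), records_per_peer)


records = list(fake_network(num_peers=3, records_per_peer=4))
```

Because both functions are generators, the same code could in principle feed a million emulated peers to the instance under test while keeping only one record in memory at a time.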

Desired outcome: repeatable performance numbers for a nightly job and performance regression analysis. For instance: after collecting 1 million Trustchain records, a random Trustchain lookup takes 500 ms and writing a newly discovered Trustchain record takes 300 ms.
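One way such numbers could be collected is a small, repeatable micro-benchmark. The sketch below assumes a plain SQLite table; the `blocks` schema, the 300-byte payload and the `benchmark` helper are hypothetical, not Tribler's actual database layer. It measures the average latency of random lookups and of single-record writes after preloading a given number of records.

```python
import os
import random
import sqlite3
import time


def benchmark(num_records: int, samples: int = 100):
    """Return (avg read ms, avg write ms) after preloading `num_records` rows."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema, not the real Trustchain database layout.
    db.execute(
        "CREATE TABLE blocks (public_key BLOB, sequence_number INTEGER, "
        "payload BLOB, PRIMARY KEY (public_key, sequence_number))"
    )
    keys = [os.urandom(32) for _ in range(num_records)]
    db.executemany(
        "INSERT INTO blocks VALUES (?, 1, ?)",
        ((key, os.urandom(300)) for key in keys),  # arbitrary payload size
    )
    db.commit()

    # Random lookups of existing records.
    start = time.perf_counter()
    for key in random.choices(keys, k=samples):
        db.execute(
            "SELECT payload FROM blocks WHERE public_key = ? AND sequence_number = 1",
            (key,),
        ).fetchone()
    read_ms = (time.perf_counter() - start) * 1000 / samples

    # Writes of newly "discovered" records, one transaction each.
    start = time.perf_counter()
    for _ in range(samples):
        db.execute("INSERT INTO blocks VALUES (?, 1, ?)", (os.urandom(32), os.urandom(300)))
        db.commit()
    write_ms = (time.perf_counter() - start) * 1000 / samples

    db.close()
    return read_ms, write_ms


read_ms, write_ms = benchmark(10_000)
print(f"read: {read_ms:.3f} ms, write: {write_ms:.3f} ms")
```

Running the same function nightly with a fixed `num_records` would give a series that can be compared across builds for regression analysis. An on-disk database would be needed for numbers that reflect real I/O.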

Background reading: thesis using the same technique, plus .PDF file and source code

grimadas commented 5 years ago

Please assign me to this task

qstokkink commented 5 years ago

@grimadas you first need to accept the team invitation, before we can assign you to issues.

grimadas commented 5 years ago

Progress so far:

image

Plans for the next week:

synctext commented 5 years ago

Latest numbers: 1 million peer databases, 10 Trustchain records per peer, and 21 KByte per Trustchain record = 210 GByte. Possibly move it to Gumby+Jenkins; see the existing Trustchain crawler?

grimadas commented 5 years ago

Update:

The number I mentioned before is the size of a block as a Python object - 21 KB.
The half-block in serialised format takes only 290 bytes.
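As a quick sanity check on these sizes, the arithmetic can be written out. This is a sketch assuming decimal units (1 KB = 1000 bytes) and assuming each of the 10 records per peer is stored as one serialised half-block; the constant and function names are illustrative only.

```python
PEERS = 1_000_000
RECORDS_PER_PEER = 10
PYTHON_OBJECT_BYTES = 21_000  # 21 KB per block as an in-memory Python object
SERIALISED_BYTES = 290        # one serialised half-block


def total_gigabytes(bytes_per_record: int) -> float:
    """Total size in GB (decimal) for all peers and records."""
    return PEERS * RECORDS_PER_PEER * bytes_per_record / 1e9


in_memory_gb = total_gigabytes(PYTHON_OBJECT_BYTES)  # 210 GB
serialised_gb = total_gigabytes(SERIALISED_BYTES)    # 2.9 GB
```

Under these assumptions the serialised form is roughly 72 times smaller than the in-memory estimate.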

Database testing

I tested the scalability of the database (SQLite). The whole database is stored in two files: the WAL and the db file. All transactions first go through the write-ahead log, which grows linearly by 60 KB per transaction (see figure).
image

After the WAL reaches 9 MB, SQLite checkpoints it and flushes it to the db file. The db file grows linearly, with each block adding 500 bytes.

image

image

image
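The WAL behaviour described above can be reproduced on a small scale with Python's built-in `sqlite3` module. This is a sketch, not the experiment code: the `blocks` table and the 290-byte payload are stand-ins, and automatic checkpoints are disabled so that the flush of the WAL into the db file (a checkpoint, in SQLite terms) can be triggered explicitly.

```python
import os
import shutil
import sqlite3
import tempfile


def file_size(path: str) -> int:
    return os.path.getsize(path) if os.path.exists(path) else 0


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "blocks.db")
wal_path = db_path + "-wal"  # SQLite keeps the write-ahead log next to the db file

db = sqlite3.connect(db_path)
journal_mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
db.execute("PRAGMA wal_autocheckpoint=0")  # no automatic checkpoints in this demo
db.execute("CREATE TABLE blocks (id INTEGER PRIMARY KEY, payload BLOB)")
db.commit()

for _ in range(200):
    db.execute("INSERT INTO blocks (payload) VALUES (?)", (os.urandom(290),))
    db.commit()  # one transaction per block, so every block passes through the WAL

wal_before, db_before = file_size(wal_path), file_size(db_path)

# Move all WAL content into the db file and truncate the WAL.
db.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
wal_after, db_after = file_size(wal_path), file_size(db_path)

print(f"WAL: {wal_before} -> {wal_after} bytes, db: {db_before} -> {db_after} bytes")
db.close()
shutil.rmtree(workdir)
```

Sampling the two file sizes inside the insert loop, instead of only before and after, would give the kind of growth curves shown in the figures.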

grimadas commented 5 years ago

We can now try our stress testing and scalability on DAS5 and Jenkins.

For example, 9 peers sending 100 blocks per second to one peer (say, the leader node). https://jenkins-ci.tribler.org/job/pers/job/validation_experiment_trustchain_bulat/44/ See the attached histogram of blocks that arrived at the leader node. image image

devos50 commented 5 years ago

Cool!

synctext commented 5 years ago

Overlap with #4471.

Possible new 1st year thesis direction: heavy benchmarking of Trustchain and compare to Holochain, blockbench, and the seminal PeerReview publication. Input for a Usenix submission. We have freeriding detection that scales.

synctext commented 5 years ago

Storing Trustchain records is rational and incentive compatible: prevent being defrauded by malicious actors.

synctext commented 4 years ago

Update: we need performance understanding, "fault attribution", and "fault resolution". Fault is a neutral term which covers both malicious and non-malicious (e.g. Byzantine failure) violations of the protocol, possibly resulting in double spending. Scalability of fault detection, fault attribution, and fault resolution is a worthy thesis chapter. Heavy benchmarking of Trustchain before X-Mas 2019 is required for Usenix.

fault attribution: presenting evidence that could be used to unambiguously convince any observer which actor caused the protocol fault

devos50 commented 4 years ago

I already have some experiments around extensive double-spend detection, which show that if everyone crawls the network and requests the chains of others, even a single double-spend can be detected within seconds. I included this graph in the (rejected) workshop submission of the market paper. The experiment should still be around on GitHub.
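To illustrate the detection step itself, here is a minimal sketch. It assumes that a double-spend shows up as two different blocks from the same public key claiming the same sequence number; the `Block` tuple and `find_double_spends` are hypothetical and not taken from the experiment code.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Block(NamedTuple):
    public_key: bytes
    sequence_number: int
    block_hash: bytes


def find_double_spends(blocks: Iterable[Block]):
    """Return chain positions for which more than one distinct block was seen."""
    seen = defaultdict(set)
    for block in blocks:
        seen[(block.public_key, block.sequence_number)].add(block.block_hash)
    # A position with two different hashes is evidence of a fork in that chain.
    return {slot: hashes for slot, hashes in seen.items() if len(hashes) > 1}


crawled = [
    Block(b"alice", 1, b"h1"),
    Block(b"alice", 2, b"h2"),
    Block(b"alice", 2, b"h2-conflict"),  # same position, different content
    Block(b"bob", 1, b"h3"),
    Block(b"bob", 1, b"h3"),             # the same block seen twice is not a conflict
]
conflicts = find_double_spends(crawled)
```

The pair of conflicting blocks is also the evidence needed for fault attribution: both are signed by the same key, so any observer can verify which actor forked its chain.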

qstokkink commented 3 weeks ago

Trustchain has been removed and, therefore, this issue is no longer relevant.