On a dockerized crux-quorum with 4 nodes.
Surprise: web3 turned out to be a huge bottleneck now!
When using direct RPC calls instead of web3 transaction calls, I see considerable TPS improvements (up from today's previous record of ~273 TPS):
initially over 450 TPS !!!
(But only during the first ~14,000 transactions; then it drops to ~270 TPS, mysteriously. Any ideas, anyone?)
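For reference, the two submission paths being compared look roughly like this. This is only a sketch: the endpoint URL, sender account, and gas value are assumptions, and the web3.py method name differs between versions (sendTransaction vs. send_transaction):

```python
# Sketch only: a web3.py call vs. a hand-built JSON-RPC call to the same node.
import json
import requests
from web3 import Web3

RPC_URL = "http://localhost:22000"      # assumed Quorum node endpoint
w3 = Web3(Web3.HTTPProvider(RPC_URL))
sender = w3.eth.accounts[0]

# Path 1: through the web3 convenience layer
w3.eth.sendTransaction({"from": sender, "to": sender, "gas": 21000, "value": 0})

# Path 2: raw JSON-RPC, skipping web3's formatting and validation overhead
payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_sendTransaction",
           "params": [{"from": sender, "to": sender,
                       "gas": hex(21000), "value": "0x0"}]}
requests.post(RPC_URL, data=json.dumps(payload),
              headers={"Content-Type": "application/json"})
```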
Hey @drandreaskrueger, looks great. As to the drop-off at 14k txns, since you are already tinkering with the CLI options for geth, please look into these as well:
PERFORMANCE TUNING OPTIONS:
--cache value Megabytes of memory allocated to internal caching (default: 1024)
--cache.database value Percentage of cache memory allowance to use for database io (default: 75)
--cache.gc value Percentage of cache memory allowance to use for trie pruning (default: 25)
--trie-cache-gens value Number of trie node generations to keep in memory (default: 120)
These are from https://github.com/ethereum/go-ethereum/wiki/Command-Line-Options. Also, for the report, it might be good to keep track of queued txns as well.
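A rough way to keep track of queued txns during a run is to poll geth's txpool_status RPC method. A sketch (the endpoint URL is an assumption, and the txpool namespace must be exposed on the node's RPC interface):

```python
# Rough sketch: poll txpool_status once per second and print pending/queued counts.
import time
import requests

RPC_URL = "http://localhost:22000"   # assumed node endpoint

def txpool_status():
    payload = {"jsonrpc": "2.0", "id": 1,
               "method": "txpool_status", "params": []}
    result = requests.post(RPC_URL, json=payload).json()["result"]
    return int(result["pending"], 16), int(result["queued"], 16)

while True:
    pending, queued = txpool_status()
    print("pending: %6d   queued: %6d" % (pending, queued))
    time.sleep(1)
```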
Thanks a lot.
I have now tried
--cache 4096 --trie-cache-gens 1000
but no change in behavior. Sudden TPS drop around 14k transactions; look at TPS_current:
block 108 | new #TX 415 / 1000 ms = 415.0 TPS_current | total: #TX 9503 / 22.4 s = 424.9 TPS_average
block 109 | new #TX 437 / 1000 ms = 437.0 TPS_current | total: #TX 9940 / 23.3 s = 426.4 TPS_average
block 110 | new #TX 516 / 1000 ms = 516.0 TPS_current | total: #TX 10456 / 24.6 s = 425.7 TPS_average
block 111 | new #TX 509 / 1000 ms = 509.0 TPS_current | total: #TX 10965 / 25.2 s = 434.6 TPS_average
block 112 | new #TX 411 / 1000 ms = 411.0 TPS_current | total: #TX 11376 / 26.2 s = 434.3 TPS_average
block 113 | new #TX 480 / 1000 ms = 480.0 TPS_current | total: #TX 11856 / 27.4 s = 432.0 TPS_average
block 114 | new #TX 509 / 1000 ms = 509.0 TPS_current | total: #TX 12365 / 28.4 s = 435.4 TPS_average
block 115 | new #TX 381 / 1000 ms = 381.0 TPS_current | total: #TX 12746 / 29.1 s = 438.7 TPS_average
block 116 | new #TX 411 / 1000 ms = 411.0 TPS_current | total: #TX 13157 / 30.3 s = 434.3 TPS_average
block 117 | new #TX 482 / 1000 ms = 482.0 TPS_current | total: #TX 13639 / 31.3 s = 436.1 TPS_average
block 118 | new #TX 507 / 1000 ms = 507.0 TPS_current | total: #TX 14146 / 32.5 s = 434.7 TPS_average
block 119 | new #TX 250 / 1000 ms = 250.0 TPS_current | total: #TX 14396 / 33.2 s = 433.7 TPS_average
block 120 | new #TX 211 / 1000 ms = 211.0 TPS_current | total: #TX 14607 / 34.1 s = 427.9 TPS_average
block 121 | new #TX 282 / 1000 ms = 282.0 TPS_current | total: #TX 14889 / 35.4 s = 420.8 TPS_average
block 122 | new #TX 288 / 1000 ms = 288.0 TPS_current | total: #TX 15177 / 36.3 s = 417.7 TPS_average
block 123 | new #TX 294 / 1000 ms = 294.0 TPS_current | total: #TX 15471 / 37.0 s = 418.1 TPS_average
block 124 | new #TX 280 / 1000 ms = 280.0 TPS_current | total: #TX 15751 / 38.3 s = 411.6 TPS_average
block 125 | new #TX 256 / 1000 ms = 256.0 TPS_current | total: #TX 16007 / 39.2 s = 408.1 TPS_average
block 126 | new #TX 251 / 1000 ms = 251.0 TPS_current | total: #TX 16258 / 40.2 s = 404.4 TPS_average
block 127 | new #TX 282 / 1000 ms = 282.0 TPS_current | total: #TX 16540 / 41.2 s = 401.7 TPS_average
block 128 | new #TX 288 / 1000 ms = 288.0 TPS_current | total: #TX 16828 / 42.4 s = 396.6 TPS_average
block 129 | new #TX 220 / 1000 ms = 220.0 TPS_current | total: #TX 17048 / 43.4 s = 393.1 TPS_average
block 130 | new #TX 277 / 1000 ms = 277.0 TPS_current | total: #TX 17325 / 44.3 s = 391.0 TPS_average
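(For context, numbers like these can be derived by polling the node for new blocks and counting their transactions. The sketch below only illustrates that idea; it is not the chainhammer measurement code itself. The endpoint URL is an assumption, block timestamps are assumed to be whole seconds, and newer web3.py versions rename blockNumber/getBlock to block_number/get_block.)

```python
# Illustrative only: count transactions per new block and print running TPS.
import time
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:22000"))   # assumed endpoint

start = time.time()
total_txs = 0
last_block = w3.eth.blockNumber

while True:
    head = w3.eth.blockNumber
    for num in range(last_block + 1, head + 1):
        block = w3.eth.getBlock(num)
        parent = w3.eth.getBlock(num - 1)
        new_txs = len(block["transactions"])
        # timestamps assumed to be in seconds; guard against a zero interval
        blocktime = max(block["timestamp"] - parent["timestamp"], 1)
        total_txs += new_txs
        elapsed = time.time() - start
        print("block %d | new #TX %d / %d s = %.1f TPS_current | "
              "total: #TX %d / %.1f s = %.1f TPS_average"
              % (num, new_txs, blocktime, new_txs / blocktime,
                 total_txs, elapsed, total_txs / elapsed))
    last_block = head
    time.sleep(0.2)
```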
The same observation also holds in geth v1.8.13 (not only in Quorum).
Any new ideas about that?
You can now super-easily reproduce my results, in less than 10 minutes, with my Amazon AMI image:
https://gitlab.com/electronDLT/chainhammer/blob/master/reproduce.md#readymade-amazon-ami
@drandreaskrueger @fixanoid Any updates on why the TPS drop occurs around 14K?
Thanks :)
@drandreaskrueger Is this result for AWS consistent, or was it a one-time feat?
peak TPS_average is 536 TPS, final TPS_average is 524 TPS.
Last time I checked, the problem was still there.
But it seems to be caused upstream, because look at this:
https://github.com/ethereum/go-ethereum/issues/17447#issuecomment-431629285
It happens in geth too!
Perhaps you can help them to find the cause?
That's a good idea. We'll look into it too after the upgrade to 1.8.18.
Cool, thanks.
There will soon be a whole new version of chainhammer, with much more automation.
Stay tuned ;-)
@drandreaskrueger Is the AWS result with the web3 lib? Did you try with direct RPC calls (as you mentioned that web3 causes a lot of damage to the TPS)? If not, I will give it a try.
I had tried both, via web3 and via direct RPC calls. The latter was usually faster, so I have done all later measurements with RPC calls.
The old code is still there though, and the switch is here, so you can simply try it yourself: https://github.com/drandreaskrueger/chainhammer/blob/223fda085aad53c1cbf4c46c336ad04c2348da82/hammer/config.py#L40-L41
You can also read https://github.com/drandreaskrueger/chainhammer/blob/master/docs/FAQ.md; it links to the relevant code pieces.
@jpmsam
after the upgrade to 1.8.18.
Oh, oops - I have been missing a lot then. But why v1.8.18 - your release page talks about 2.2.1?
Still doing all my benchmarks with a Quorum version that calls itself Geth/v1.7.2-stable-d7e3ff5b/linux-amd64/go1.10.1
...
... because I am benchmarking Quorum via the excellent dockerized 4-node setup created by blk-io (see here), which is less heavyweight than your Vagrant/VirtualBox 7-node setup. I suggest you have a look at that dockerized version; perhaps you can publish something similar. Or do you have a dockerized Quorum setup by now?
For all my benchmarking, I could find dockerized versions of Geth, Parity, and Quorum - and blk-io/crux is the one I am using for Quorum.
I have just published a brand new version v55: https://github.com/drandreaskrueger/chainhammer/#quickstart
Instead of installing everything on your main work computer, it is better to use (a VirtualBox Debian/Ubuntu installation or) my Amazon AMI to spin up a t2.medium machine; see docs/cloud.md#readymade-amazon-ami.
Then all you need to do is:
networks/quorum-configure.sh
CH_TXS=50000 CH_THREADING="threaded2 20" ./run.sh "YourNaming-Quorum" quorum
and afterwards check results/runs/ to find an autogenerated results page with time series diagrams.
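(In case the CH_THREADING setting is unclear: conceptually it means many worker threads submitting transactions concurrently instead of one blocking loop. The sketch below only illustrates that idea and is not the actual chainhammer implementation; the endpoint URL and transaction fields are assumptions.)

```python
# Conceptual sketch of threaded transaction submission via raw JSON-RPC.
import queue
import threading
import requests

RPC_URL = "http://localhost:22000"   # assumed node endpoint
NUM_WORKERS = 20                     # mirrors the "20" in the command above
NUM_TXS = 50000                      # mirrors CH_TXS above

def rpc(method, params):
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    return requests.post(RPC_URL, json=payload).json().get("result")

sender = rpc("eth_accounts", [])[0]
jobs = queue.Queue()

def worker():
    while True:
        tx = jobs.get()
        if tx is None:
            break
        rpc("eth_sendTransaction", [tx])
        jobs.task_done()

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for _ in range(NUM_TXS):
    jobs.put({"from": sender, "to": sender, "gas": hex(21000), "value": "0x0"})

jobs.join()               # wait until all transactions have been submitted
for _ in threads:
    jobs.put(None)        # stop the workers
for t in threads:
    t.join()
```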
Hope that helps! Keep me posted please.
Looks great. What is the performance with 100 nodes?
What is the performance with 100 nodes?
Just try it out.
I am importing the /blk-io_crux/docker/quorum-crux project here:
https://github.com/drandreaskrueger/chainhammer/blob/49a7d78543b9f26e9839286c7f8c73851a18ca52/networks/quorum-configure.sh#L3-L12
If you look into their details, extending this from 4 nodes to 100 nodes looks doable, just tedious: https://github.com/blk-io/crux/blob/eeb63a91b7eda0180c8686f819c0dd29c0bc4d46/docker/quorum-crux/docker-compose-local.yaml
It would have to be a very large machine. And I would not expect huge changes. This type of distributed ledger technology doesn't get faster by plugging in more nodes, no?
Has anyone tried to test the latest geth version? https://www.reddit.com/r/ethereum/comments/fqk8vm/transaction_propagation_optimization_in_geth_1911/
IBFT seems to max out around 200 TPS when run in the 7-node example.
--> see these results
However, the original publication talks about 800 TPS with Istanbul BFT. How did they do it?
Any ideas how to get this faster?
Thanks!