axieinfinity / ronin

A DPoS blockchain.
GNU Lesser General Public License v3.0
67 stars 30 forks source link

High Latency when doing batch calls #442

Closed harshsingh-cs closed 1 month ago

harshsingh-cs commented 5 months ago

I am currently using ethereum-etl tool to do some backfilling but I have observed some high latency when doing so. Mainly all the requests are batch requests. There is no visible bottleneck on the Disk, CPU or RAM but still the response time from the node is significantly high image

Here is my startup command

       - ronin
        - --config=/root/geth-config/config.toml
        - --rpc.evmtimeout=5s
        - --cache=9830
        - --networkid=2020
        - --port=30303
        - --syncmode=full
        - --maxpeers=100
        - --txlookuplimit=2350000
        - --txpool.pricebump=10
        - --txpool.pricelimit=20000000000
        - --txpool.nolocals
        - --http
        - --http.addr=0.0.0.0
        - --http.port=8545
        - --http.vhosts=*
        - --http.corsdomain=null
        - --http.api=net,eth,web3
        - --ws
        - --ws.addr=0.0.0.0
        - --ws.port=8546
        - --ws.api=net,eth,web3
        - --ws.origins=*
        - --rpc.txfeecap=100
        - --rpc.gascap=500000000
        - --metrics
        - --metrics.addr=0.0.0.0
        - --metrics.port=6060
        - --bootnodes=..
        image: ghcr.io/axieinfinity/ronin:v2.7.1-c64ddd3

config.toml

[Eth]
TrieTimeout = 3600000000000

[Node.HTTPTimeouts]
ReadTimeout = 30000000000
WriteTimeout = 30000000000
IdleTimeout = 120000000000

Can you please suggest what is causing this and what can I improve

minh-bq commented 5 months ago

What is the command that you use to run export_blocks_and_transactions.py so that we can try to reproduce on our side?

harshsingh-cs commented 5 months ago

@minh-bq you can use this

ethereumetl export_blocks_and_transactions --start-block 34010000 --end-block 34090000 --provider-uri <node url> --blocks-output blocks-ronin.json --transactions-output txns-ronin.csv
minh-bq commented 5 months ago

I've tried in our node and it worked just fine, the latency was quite low image

harshsingh-cs commented 5 months ago

@minh-bq can you share the output of ethereum-etl specifically how much time it took to complete the blocks For example

root@ethereum-etl# ethereumetl export_blocks_and_transactions --start-block 34010000 --end-block 34090000 --provider-uri https://<node url> --blocks-output ..//blocks-ronin.json --transactions-output ../txns-ronin.csv
2024-05-13 11:16:54,476 - ProgressLogger [INFO] - Started work. Items to process: 80001.
2024-05-13 11:17:28,834 - ProgressLogger [INFO] - 8100 items processed. Progress is 10%.
2024-05-13 11:17:53,816 - ProgressLogger [INFO] - 16100 items processed. Progress is 20%.
2024-05-13 11:19:04,945 - ProgressLogger [INFO] - 24100 items processed. Progress is 30%.
2024-05-13 11:20:16,429 - ProgressLogger [INFO] - 32100 items processed. Progress is 40%.
2024-05-13 11:20:48,081 - ProgressLogger [INFO] - 40100 items processed. Progress is 50%.
2024-05-13 11:21:10,837 - ProgressLogger [INFO] - 48100 items processed. Progress is 60%.
2024-05-13 11:22:42,877 - ProgressLogger [INFO] - 56100 items processed. Progress is 70%.
2024-05-13 11:23:20,189 - ProgressLogger [INFO] - 64100 items processed. Progress is 80%.
2024-05-13 11:23:44,012 - ProgressLogger [INFO] - 72100 items processed. Progress is 90%.
2024-05-13 11:24:31,089 - ProgressLogger [INFO] - 80001 items processed. Progress is 100%.
2024-05-13 11:24:31,090 - ProgressLogger [INFO] - Finished work. Total items processed: 80001. Took 0:07:36.613566.
2024-05-13 11:24:31,090 - CompositeItemExporter [INFO] - block items exported: 80001
2024-05-13 11:24:31,090 - CompositeItemExporter [INFO] - transaction items exported: 4862373

You can see in the above output that it prints the total time taken Finished work. Total items processed: 80001. Took 0:07:36.613566.

minh-bq commented 5 months ago

Okay, I understand the problem now. I see very high CPU usage (> 80%) on the node that I connect when running the above command. This is a thing we can look into for improvement but I think it is not unusual. With the above command, ethereumetl will run 5 workers and batch 100 rpc requests per HTTP request. The command calls eth_getBlockByNumber and includes full transaction information, this consumes a lot of CPU to recover the sender of transactions (at the block range, I see there are a lot of transactions).

I think you try to increase the number of CPUs on the node, use -b to configure the batch size, -w to configure the number of workers when running ethereumetl. More workers means more parallel HTTP requests and AFAIK HTTP requests are handled parallel (however, you need to make sure that the node has enough number of CPUs). The RPC requests in the same batch HTTP request are executed sequentially.

harshsingh-cs commented 5 months ago

@minh-bq we also see the increase in CPU but not so much so that it would cause the reqs to slow down. Can you share the total time it took to process all the blocks

minh-bq commented 5 months ago
2024-05-21 07:02:25,756 - ProgressLogger [INFO] - Started work. Items to process: 80001.
2024-05-21 07:03:10,625 - ProgressLogger [INFO] - 8100 items processed. Progress is 10%.
2024-05-21 07:03:41,731 - ProgressLogger [INFO] - 16100 items processed. Progress is 20%.
2024-05-21 07:05:20,431 - ProgressLogger [INFO] - 24100 items processed. Progress is 30%.
2024-05-21 07:06:36,485 - ProgressLogger [INFO] - 32100 items processed. Progress is 40%.
2024-05-21 07:07:11,415 - ProgressLogger [INFO] - 40100 items processed. Progress is 50%.
2024-05-21 07:07:35,074 - ProgressLogger [INFO] - 48100 items processed. Progress is 60%.
2024-05-21 07:09:48,521 - ProgressLogger [INFO] - 56100 items processed. Progress is 70%.
2024-05-21 07:10:34,355 - ProgressLogger [INFO] - 64100 items processed. Progress is 80%.
2024-05-21 07:11:02,096 - ProgressLogger [INFO] - 72100 items processed. Progress is 90%.
2024-05-21 07:12:00,831 - ProgressLogger [INFO] - 80001 items processed. Progress is 100%.
2024-05-21 07:12:00,838 - ProgressLogger [INFO] - Finished work. Total items processed: 80001. Took 0:09:35.082243.
2024-05-21 07:12:00,839 - CompositeItemExporter [INFO] - block items exported: 80001
2024-05-21 07:12:01,180 - CompositeItemExporter [INFO] - transaction items exported: 4862373

I get the same amount of time. I've just tried on a more CPUs machine and seems like CPU is underutilized.

minh-bq commented 5 months ago
diff --git a/internal/ethapi/api.go b/internal/ethapi/api.go
index 6738b627a..79d93e955 100644
--- a/internal/ethapi/api.go
+++ b/internal/ethapi/api.go
@@ -21,6 +21,8 @@ import (
    "errors"
    "fmt"
    "math/big"
+   "runtime"
+   "sync"
    "time"

    "github.com/davecgh/go-spew/spew"
@@ -52,9 +54,16 @@ import (

 var ErrMethodNotSupport = errors.New("method is not supported")

-// estimateGasErrorRatio is the amount of overestimation eth_estimateGas is
-// allowed to produce in order to speed up calculations.
-const estimateGasErrorRatio = 0.015
+const (
+   // estimateGasErrorRatio is the amount of overestimation eth_estimateGas is
+   // allowed to produce in order to speed up calculations.
+   estimateGasErrorRatio = 0.015
+
+   // minBlockTransactionThreshold is the minimum number of transactions in block
+   // that we start more go routines to marshal transaction in GetBlockByNumber
+   // when full transaction is requested
+   minBlockTransactionThreshold = 20
+)

 // PublicEthereumAPI provides an API to access Ethereum related information.
 // It offers only methods that operate on public data that is freely available to anyone.
@@ -1257,10 +1266,31 @@ func RPCMarshalBlock(block *types.Block, inclTx bool, fullTx bool, config *param
        }
        txs := block.Transactions()
        transactions := make([]interface{}, len(txs))
-       var err error
-       for i, tx := range txs {
-           if transactions[i], err = formatTx(tx); err != nil {
-               return nil, err
+
+       if fullTx && len(txs) > minBlockTransactionThreshold {
+           numRoutines := runtime.NumCPU()
+           numTxPerRoutine := (len(txs) + numRoutines - 1) / numRoutines
+
+           var wg sync.WaitGroup
+           wg.Add(numRoutines)
+           for i := 0; i < numRoutines; i++ {
+               go func(start int) {
+                   defer wg.Done()
+
+                   for j := start; j < start+numTxPerRoutine && j < len(txs); j++ {
+                       transactions[j], _ = formatTx(txs[j])
+                   }
+
+               }(i * numTxPerRoutine)
+           }
+
+           wg.Wait()
+       } else {
+           var err error
+           for i, tx := range txs {
+               if transactions[i], err = formatTx(tx); err != nil {
+                   return nil, err
+               }
            }
        }
        fields["transactions"] = transactions

Can you try this patch to see if there is improvement (it is not well tested though, so please cross check the data if you want to use it)? In my opinion, ethereumetl might do better to improve the overall time. During testing, I cannot make the RPS exceed 300. From our chart, I saw that RPS is not stable and there are drops in the middle of exporting. This leads to the node's CPU is not fully utilized.

minh-bq commented 1 month ago

Close due to inactivity, please feel free to reopen for further discussion.