Sifchain / sifnode

sifnoded RPC denial of service for UI requests #2758

Open jzvikart opened 2 years ago

jzvikart commented 2 years ago

Background

This issue was originally reported in ticket #1460, but was split into its own ticket once we discovered that it concerns a different scenario. Nevertheless, the two issues might be connected, and a fix for one might benefit the other. Anybody working on this ticket should also read the comments on #1460 to get a better understanding of the problem.

The problem

@gzukel found that sifnoded (v0.13.1 at the time) sporadically returns errors for RPC queries issued by the UI. He was able to reproduce the problem by running 4 static RPC queries (HTTP POST for GetLiquidityProviderData, GetRewardParams, GetPools and GetPmtpParams). These queries do not change any state and should return a valid result at any time. However, when they are called in rapid succession (e.g. 100 times) by a number of parallel threads (e.g. 50), they start to return HTTP errors.

With a light load (1 thread, 5 s sleep between requests) we did not see any errors. Any investigation should therefore focus on finding the root cause of the errors that appear under heavier parallel load.
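The load pattern described above (N parallel threads, each issuing M requests, with an optional sleep) can be sketched roughly as follows. This is not the script mentioned in this ticket, only a minimal harness under assumptions: `run_load` and `send_request` are hypothetical names, and `send_request` stands for a caller-supplied function that performs one of the four queries and returns the HTTP status code.

```python
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_load(send_request, threads=50, requests_per_thread=100, sleep_s=0.0):
    """Call send_request() from `threads` workers and tally the outcomes.

    send_request is a hypothetical, caller-supplied callable that performs
    one RPC query and returns an HTTP status code, or raises on failure.
    """
    def worker(_):
        tally = Counter()
        for _ in range(requests_per_thread):
            try:
                status = send_request()
                tally["ok" if status == 200 else f"http_{status}"] += 1
            except Exception as exc:
                # Timeouts, connection resets etc. are counted by exception type.
                tally[type(exc).__name__] += 1
            if sleep_s:
                time.sleep(sleep_s)
        return tally

    total = Counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for tally in pool.map(worker, range(threads)):
            total.update(tally)
    return total
```

Running it with `threads=1, sleep_s=5` would correspond to the light-load case, and the defaults to the heavy case. It should only be pointed at a node you operate yourself.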

What we observed

At the same time as the RPC endpoint starts returning errors, we see a significant increase in these messages in the sifnoded logs:

8:35AM INF Dialing peer address={"id":"fdaa88f2a0bacd93590d6ce8f0a9e584ec306afc","ip":"62.133.229.14","port":36656} module=p2p
8:35AM INF Dialing peer address={"id":"3e3307fe457940a8f5a3a4315401f55fe6c016db","ip":"18.211.58.165","port":26656} module=p2p
8:35AM ERR dialing failed (attempts: 1): dial tcp 62.133.229.14:36656: connect: connection refused addr={"id":"fdaa88f2a0bacd93590d6ce8f0a9e584ec306afc","ip":"62.133.229.14","port":36656} module=pex
8:35AM INF Starting Peer service impl="Peer{MConn{178.63.44.171:26656} 30f2c8299d132d8b10b07b85da6a97271e61bfe0 out}" module=p2p peer={"id":"30f2c8299d132d8b10b07b85da6a97271e61bfe0","ip":"178.63.44.171","port":26656}
8:35AM INF Starting MConnection service impl=MConn{178.63.44.171:26656} module=p2p peer={"id":"30f2c8299d132d8b10b07b85da6a97271e61bfe0","ip":"178.63.44.171","port":26656}
8:35AM ERR dialing failed (attempts: 1): dial tcp 18.211.58.165:26656: i/o timeout addr={"id":"3e3307fe457940a8f5a3a4315401f55fe6c016db","ip":"18.211.58.165","port":26656} module=pex
8:36AM INF minted coins from module account amount=112403081268641695195rowan from=mint module=x/bank
8:36AM INF Timed out dur=3000 height=6764540 module=consensus round=0 step=3
8:36AM INF minted coins from module account amount=225000000000000000000rowan from=dispensation module=x/bank
8:37AM ERR failed to write responses err="write tcp 172.31.26.58:26657->172.31.28.185:37288: i/o timeout" module=rpc-server res=[{"id":125454479185,"jsonrpc":"2.0","result":{"response":{"code":0,"codespace":"","height":"6764542","index":"0","info":"","key":null,"log":"","proofOps":null,"value":"..."}}}]

It should be noted that some of these errors (in particular connection refused) are part of normal/expected behaviour, but their increased frequency strongly correlates with the problem triggered by the test load.
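To put a number on "increased frequency", the log lines can be counted per minute, level and module and compared between an idle run and a run under test load. The following is a rough sketch that assumes the line layout seen in the excerpt above (timestamp, three-letter level, message, `module=...`); `count_by_minute` is a hypothetical helper, not part of sifnoded.

```python
import re
from collections import Counter

# Matches lines such as "8:35AM ERR dialing failed ... module=pex".
LINE_RE = re.compile(r"^(\d{1,2}:\d{2}[AP]M) ([A-Z]{3}) (.*)$")
MODULE_RE = re.compile(r"\bmodule=(\S+)")


def count_by_minute(lines):
    """Return a Counter keyed by (timestamp, level, module)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a log line in this layout
        ts, level, rest = m.groups()
        mod = MODULE_RE.search(rest)
        counts[(ts, level, mod.group(1) if mod else "?")] += 1
    return counts
```

Comparing, for example, the `("…", "ERR", "pex")` and `("…", "ERR", "rpc-server")` counts between the two runs would show whether the dialing failures really rise together with the RPC write timeouts.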

Other than that, we did not see any characteristic error messages in sifnoded logs.

Next things to do

How to get the test load script

For the time being I have not committed the test load script to any public repository due to the risk of abuse. The original script can be obtained from ChainOps, whereas a slightly improved version (with the URL configurable via command-line parameters) is available on request from @jzvikart.

jzvikart commented 2 years ago

@pandaring2you Please reassign.