meltingice closed this issue 6 years ago
Hi,
thanks for this issue. I was actually thinking about this a while ago. Browsing through the node's code, I came to the following conclusion, which might not be fully correct though: each RPC request that is received (rpc.cpp, rai::rpc_connection::read ()) is put into a boost::asio queue through the post() call in node->background(). The queue is processed sequentially, one RPC request after the other. If the node is hit with a huge number of RPC calls per second, the overall time to complete all requests becomes larger, but the actual load does not increase substantially. One thing I could not find out is how big this queue can actually become and whether/how limits are enforced.
I've been hitting the node with as many parallel RPC requests as possible (via localhost) and the node did not bat an eye; it was happily checking blocks and voting in between the RPC calls. The same holds when going through the webserver. The log files show the queueing:
[2018-04-11 20:24:05.488941]: RPC request 0x7e60880e9fb0 completed in: 168 microseconds
[2018-04-11 20:24:05.492857]: {"action": "version"}
[2018-04-11 20:24:05.492953]: RPC request 0x7e6078180760 completed in: 122 microseconds
[2018-04-11 20:24:05.509001]: {"action": "account_info", "account": "xrb_1f56swb9qtpy3yoxiscq9799nerek153w43yjc9atoaeg3e91cc9zfr89ehj"}
[2018-04-11 20:24:05.509158]: RPC request 0x7e6078140df0 completed in: 195 microseconds
[2018-04-11 20:24:05.553504]: {"action": "version"}
[2018-04-11 20:24:05.553638]: RPC request 0x7e60880e9fb0 completed in: 162 microseconds
[2018-04-11 20:24:05.585310]: {"action": "account_info", "account": "xrb_1f56swb9qtpy3yoxiscq9799nerek153w43yjc9atoaeg3e91cc9zfr89ehj"}
[2018-04-11 20:24:05.585456]: RPC request 0x7e607c161ef0 completed in: 177 microseconds
[2018-04-11 20:24:05.618750]: {"action": "version"}
[2018-04-11 20:24:05.618932]: RPC request 0x7e607c1620a0 completed in: 217 microseconds
[2018-04-11 20:24:05.630146]: {"action": "account_info", "account": "xrb_1f56swb9qtpy3yoxiscq9799nerek153w43yjc9atoaeg3e91cc9zfr89ehj"}
[2018-04-11 20:24:05.630298]: RPC request 0x7e6078180760 completed in: 196 microseconds
[2018-04-11 20:24:05.651782]: {"action": "account_info", "account": "xrb_1f56swb9qtpy3yoxiscq9799nerek153w43yjc9atoaeg3e91cc9zfr89ehj"}
[2018-04-11 20:24:05.651935]: RPC request 0x7e60880e9fb0 completed in: 184 microseconds
[2018-04-11 20:24:05.652030]: {"action": "version"}
[2018-04-11 20:24:05.652112]: RPC request 0x7e6088107b60 completed in: 95 microseconds
[2018-04-11 20:24:05.714345]: {"action": "account_info", "account": "xrb_1f56swb9qtpy3yoxiscq9799nerek153w43yjc9atoaeg3e91cc9zfr89ehj"}
[2018-04-11 20:24:05.714516]: RPC request 0x7e6078180760 completed in: 200 microseconds
[2018-04-11 20:24:05.714709]: {"action": "version"}
[2018-04-11 20:24:05.714781]: RPC request 0x7e6078140df0 completed in: 87 microseconds
[2018-04-11 20:24:05.714896]: {"action": "version"}
[2018-04-11 20:24:05.714967]: RPC request 0x7e6078131c30 completed in: 94 microseconds
[2018-04-11 20:24:05.743345]: {"action": "account_info", "account": "xrb_1f56swb9qtpy3yoxiscq9799nerek153w43yjc9atoaeg3e91cc9zfr89ehj"}
[2018-04-11 20:24:05.743470]: RPC request 0x7e60880e9fb0 completed in: 158 microseconds
[2018-04-11 20:24:05.758259]: Block 5613CB3EF7DDEC27C703468952077AA7825A48DC93F3AF3FA57FFB143215E1A3 was confirmed to peers
[2018-04-11 20:24:05.759204]: Block C08B9A6D896AD359999601D2652D49EB8C466755F675D8F9AEA66CAF3762852F was confirmed to peers
[2018-04-11 20:24:05.759848]: Block 155FC0C47CB5F21B550FC109375217620151949BF2266258F29A9C0D979C34D2 was confirmed to peers
[2018-04-11 20:24:05.760502]: Block C401E497585725A63512C477D7C287A77825799F3BAA34C59AAECD2021C223BD was confirmed to peers
[2018-04-11 20:24:05.777335]: Block A12C21EE38F36B18BA2A3974D22AB31ED86124BDE7F15C992CAD9EA825BC5DF2 was confirmed to peers
[2018-04-11 20:24:05.779621]: Block 0589A3198C301049D9E454323309064B7724F400A0FBD8091AD879D34ACF7078 was confirmed to peers
[2018-04-11 20:24:05.780344]: Block 1EAC8BBA1392149AFBE5A2AC507E43EFF701B90E618E952C7F4ACBFB0BF9D150 was confirmed to peers
[2018-04-11 20:24:05.781237]: Block 2E97CD3582282F24621F061BE9A91F7BBFFE0F4EBA07A39E4C9F9195CF3FC477 was confirmed to peers
[2018-04-11 20:24:05.782012]: Block 852BB6BE276E9E5B6FE090734551B55BC5FA88EAD6162F1C5463FAF85E0F0D13 was confirmed to peers
[2018-04-11 20:24:05.782849]: {"action": "version"}
[2018-04-11 20:24:05.782986]: RPC request 0x7e607401da00 completed in: 158 microseconds
[2018-04-11 20:24:05.783030]: {"action": "account_info", "account": "xrb_1f56swb9qtpy3yoxiscq9799nerek153w43yjc9atoaeg3e91cc9zfr89ehj"}
[2018-04-11 20:24:05.783153]: RPC request 0x7e60740ccdf0 completed in: 139 microseconds
[2018-04-11 20:24:05.784591]: Block 02A5AE45B92D3EE6F17A415B0DE5D052B1DA2564518687E094C15E7867117BF4 was confirmed to peers
[2018-04-11 20:24:05.786822]: Block 80CB0EC69DFCE2AFCB59B1D3D0B7D52CA14BFC2F3D01C0E6547619B35A7FEEBF was confirmed to peers
[2018-04-11 20:24:05.811366]: {"action": "version"}
[2018-04-11 20:24:05.811500]: RPC request 0x7e607415a8e0 completed in: 164 microseconds
[2018-04-11 20:24:05.811825]: {"action": "account_info", "account": "xrb_1f56swb9qtpy3yoxiscq9799nerek153w43yjc9atoaeg3e91cc9zfr89ehj"}
[2018-04-11 20:24:05.811953]: RPC request 0x7e607401da00 completed in: 145 microseconds
[2018-04-11 20:24:05.824754]: {"action": "version"}
[2018-04-11 20:24:05.824886]: RPC request 0x7e607c161ef0 completed in: 158 microseconds
If I am missing something here, please correct me and let me know. And please test it personally and post your findings.
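For anyone who wants to reproduce this, below is a minimal sketch of such a load test in PHP. The endpoint (the default local RPC port 7076) and the batch size are my own assumptions, not part of the monitor; adjust them to your node's configuration.

<?php
// Fire a batch of RPC requests at the local node in parallel via curl_multi
// and time how long the whole batch takes to complete.
$endpoint = 'http://127.0.0.1:7076';
$payload  = json_encode(['action' => 'version']);
$parallel = 50;

$mh = curl_multi_init();
$handles = [];
for ($i = 0; $i < $parallel; $i++) {
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 5,
    ]);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

$start = microtime(true);
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
$elapsed = microtime(true) - $start;

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

// If the node queues and handles the requests sequentially, the total time
// grows roughly linearly with $parallel, while the per-request times in the
// node's log stay in the microsecond range.
printf("%d requests completed in %.3f s\n", $parallel, $elapsed);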
EDIT: But of course, a caching layer would be nice - similar to what you've been doing with your dashboard. Could you come up with a pull request for the node monitor?
This issue is compounded by the number of api.php instances. If you open 20 browsers to the node monitor, it is compounded 20x.
I think it is not. The node will process more RPC requests overall, sequentially, i.e. take more time. It will not process more RPC requests per second, as all the requests reach the node from the same IP (localhost). The monitor could run into a timeout, the webserver could crash, but the node should be OK.
There are also several measures (connection/request limits) that can be enforced through nginx, e.g. https://www.nginx.com/blog/mitigating-ddos-attacks-with-nginx-and-nginx-plus/
Still, I'd prefer to have a caching layer in-between. Who can help? I am not really a web developer.
perceptivetek's concerns are right. In the backend there should be something like a cron job which executes the api.php script (which in fact calls the node RPC) at a declared time interval and puts the gathered data into some data store (a file, MySQL, MongoDB, ...). The frontend (user) then only needs to query that store to get and present the data.
I already wrote a plugin for nanoNodeMonitor that does something like that to show network traffic on the VPS. I'm using cron to call the vnstat app every 5 minutes and dump the data to a file, which is then read by your PHP script.
Edit: I've just implemented that solution for my monitor. A cron job calls api.php every 5 minutes and writes the output to a file, which is then read by the updateStats function.
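As a rough illustration of that backend/frontend split, here is a minimal sketch of a cron-driven fetcher. The file paths, the crontab schedule, and the chosen RPC actions are placeholders of mine, not the monitor's actual code.

<?php
// cron_fetch.php -- run via cron, e.g.:
//   */5 * * * * php /path/to/cron_fetch.php
// Queries the node RPC and dumps the results to a JSON file; the frontend
// only ever reads this file and never talks to the RPC directly.
$rpc = 'http://127.0.0.1:7076';
$out = '/var/www/nanoNodeMonitor/cache/stats.json';  // assumed location

$stats = [];
foreach (['version', 'block_count', 'peers'] as $action) {
    $ch = curl_init($rpc);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode(['action' => $action]),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 5,
    ]);
    $stats[$action] = json_decode(curl_exec($ch), true);
    curl_close($ch);
}

// Write atomically so a reader never sees a half-written file.
$tmp = $out . '.tmp';
file_put_contents($tmp, json_encode($stats));
rename($tmp, $out);

The frontend (e.g. the updateStats function) would then read stats.json instead of hitting the RPC directly.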
Just confirmed that RPC curl POSTs are processed through the rai_node IO threads, which could cause issues during bootstrapping if those threads are being saturated by this type of attack.
@perceptivetek Did you try this out with your node? In my setup, I see all RPC requests from the same source IP being processed by a single IO thread strictly sequentially. Do you see something else with your node? Are RPC requests from the same source being processed by ALL IO threads?
Any suggestions on elegant solutions for the node monitor? Having a cron job do the RPC calls and save the output to files works, of course, but adds an additional configuration step when setting up the node monitor. I'd prefer something that could be incorporated directly into the node monitor code, but I am not very familiar with recent web dev frameworks. Any ideas?
@dbachm123 I was talking with plasmapower about I/O thread saturation. With network v8, bootstrapping can now fully saturate all available threads. I noticed this when curl requests directed at my node were randomly timing out. They may reserve a thread for RPC in the future.
OK - thanks for the update. As an increasing number of people seem to be using the output of api.php in several monitoring tools, I'll change the monitor to fetch data via cron once a minute, rather than via direct RPC calls. Hope to find some spare time for that soon...
@dbachm123 I am having the same issue as well. My logs are showing multiple IP addresses continuously fetching api.php
One of those is probably mine. I'm fetching stats for https://nano.meltingice.net/network every 10 seconds.
I'm currently building a file cache that writes to /tmp. How long should the caching time be?
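For reference, one possible shape of such a cache, as a minimal sketch: the 60-second TTL, the file naming, and the fetchFromRpc() helper are hypothetical placeholders, not the actual implementation.

<?php
// Serve cached RPC results from /tmp if they are younger than $ttl seconds;
// otherwise refresh the cache with a single RPC round-trip.
function cachedRpc(string $action, int $ttl = 60)
{
    $file = sys_get_temp_dir() . '/nanoNodeMonitor_' . $action . '.json';

    // Cache hit: file exists and is fresh enough.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return json_decode(file_get_contents($file), true);
    }

    // Cache miss: query the node and rewrite the cache file.
    $result = fetchFromRpc($action);  // hypothetical helper doing the curl POST
    file_put_contents($file, json_encode($result), LOCK_EX);
    return $result;
}

With this pattern, api.php issues at most one RPC round-trip per action per TTL window, no matter how many clients hit it.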
Integrated on branch develop. @all Please give it a try. Running at http://138.197.179.164/
Fixed with v1.4.0
Because api.php doesn't offer any kind of caching layer between the public web and the node RPC, it would be pretty easy to overload the Nano node by DDoS'ing api.php. At a quick glance, each run of api.php makes 5 RPC calls, so even at a mild 200 requests per second to api.php, we're hitting the Nano RPC 1000 times/second. I haven't tested it personally, but that is a substantial load that I'm not sure it could handle. If you're able to gather even a portion of the nodes running nanoNodeMonitor, you could probably take down a nice chunk of the network via simple HTTP DDoS attacks.