vharitonsky closed this issue 6 years ago.
@vharitonsky This is where the middleware comes in. We recommend using it for production use cases; see this section: https://opensource.appbase.io/reactive-manual/getting-started/reactivebase.html#connect-to-elasticsearch, and an example implementation using a proxy server.
The middleware is where you can put logic that checks for DoS-type requests (similar to your example above). You can also pass custom headers, for instance the JWT token of an authenticated user, to get even more control over which requests should hit the ES cluster and which should be classified as DoS attacks.
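As a rough illustration of the kind of check the middleware could run against the case in this issue (many parallel queries with a large output size), here is a minimal Python sketch. It assumes the client sends `_msearch` requests as newline-delimited JSON; `check_msearch_body`, `MAX_QUERIES_PER_REQUEST`, and `MAX_RESULT_SIZE` are hypothetical names, and the limits are arbitrary placeholders.

```python
import json

# Hypothetical limits; tune them for your own cluster and UI.
MAX_QUERIES_PER_REQUEST = 10
MAX_RESULT_SIZE = 100


def check_msearch_body(ndjson_body: str) -> tuple[bool, str]:
    """Return (allowed, reason) for an _msearch payload.

    An _msearch body is newline-delimited JSON: a header line followed by a
    query line, repeated once per search.
    """
    lines = [line for line in ndjson_body.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        return False, "malformed msearch body"
    try:
        # Only the query lines (every second line) are inspected here.
        queries = [json.loads(line) for line in lines[1::2]]
    except json.JSONDecodeError:
        return False, "invalid JSON"
    if len(queries) > MAX_QUERIES_PER_REQUEST:
        return False, f"too many queries: {len(queries)}"
    for query in queries:
        if not isinstance(query, dict):
            return False, "query must be a JSON object"
        # Elasticsearch returns 10 hits when "size" is omitted.
        size = query.get("size", 10)
        if isinstance(size, bool) or not isinstance(size, int) or size > MAX_RESULT_SIZE:
            return False, f"size not allowed: {size}"
    return True, "ok"
```

Rejecting early in the proxy means Elasticsearch never spends time fetching and serializing oversized result sets.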
To do this properly, one needs to fully parse the query, which can contain scripts and complex aggregations. Maybe there is an easier way out of this too?
@vharitonsky In the most general case, yes, you will need to parse at least the top-level fields that make up the request body.
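One way to apply the "parse at least the top-level fields" approach, sketched in Python: reject any search body whose top-level keys fall outside a whitelist and, to cover the scripts mentioned above, reject any body that contains a key with "script" in its name at any depth. The whitelist contents, `is_allowed`, and `has_script` are hypothetical; the substring match is deliberately conservative and may also reject legitimate fields whose names happen to contain "script".

```python
# Hypothetical whitelist: the top-level keys your components actually send.
ALLOWED_TOP_LEVEL_KEYS = {
    "query", "size", "from", "sort", "aggs", "_source", "highlight", "timeout",
}


def has_script(node) -> bool:
    """Recursively look for any key containing "script" (script, map_script, script_fields, ...)."""
    if isinstance(node, dict):
        return any("script" in key or has_script(value) for key, value in node.items())
    if isinstance(node, list):
        return any(has_script(item) for item in node)
    return False


def is_allowed(body) -> bool:
    """Accept a parsed search body only if its top-level keys are whitelisted and it has no scripts."""
    if not isinstance(body, dict):
        return False
    if not set(body) <= ALLOWED_TOP_LEVEL_KEYS:
        return False
    return not has_script(body)
```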
But there are some good heuristics that can be followed here:

- Pass the `timeout` parameter with the requests to automatically limit a complex query or aggregation to return within a timeframe.
- Unless you are customizing ReactiveSearch queries heavily, there is a small whitelist of actions that we perform, which incoming requests can be checked against for sanitizing.
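A minimal sketch of the first heuristic, assuming `_msearch` NDJSON bodies: the middleware overwrites the `timeout` of every query line with a server-chosen value so clients cannot raise it. `add_timeout_to_msearch` and the `"1s"` default are hypothetical. Keep in mind that, per the Elasticsearch search docs, the timeout bounds hit collection on each shard: when it elapses, Elasticsearch returns the hits gathered so far (with `timed_out: true`) rather than failing the request, so it limits the work done without being a hard cap.

```python
import json

# Hypothetical server-side timeout; Elasticsearch accepts time units such as "500ms" or "1s".
DEFAULT_TIMEOUT = "1s"


def add_timeout_to_msearch(ndjson_body: str, timeout: str = DEFAULT_TIMEOUT) -> str:
    """Set `timeout` on every query line of an _msearch body, overriding client values.

    Assumes the body was already validated (query lines are JSON objects).
    """
    lines = [line for line in ndjson_body.splitlines() if line.strip()]
    out = []
    for i, line in enumerate(lines):
        doc = json.loads(line)
        if i % 2 == 1:  # odd-indexed lines are query bodies; even-indexed lines are headers
            doc["timeout"] = timeout
        out.append(json.dumps(doc))
    # The _msearch API requires the body to end with a newline.
    return "\n".join(out) + "\n"
```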
Closing due to inactivity. One interesting thing that could be done here is to provide a boilerplate middleware that implements these heuristics and whitelists allowed actions, which can then be extended for specific use cases. I am opening another issue for that.
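A minimal Python sketch of what such a boilerplate middleware could look like: a list of checks run in order, starting with a whitelist of allowed actions, which a specific deployment can extend with `add_check`. The `Middleware` class, `endpoint_whitelist`, and the assumption that only `POST /<index>/_msearch` is allowed are all hypothetical; adjust the whitelist to whatever requests your components actually send.

```python
import re
from typing import Callable, Optional

# A check inspects (method, path, body) and returns an error message, or None if the request passes.
Check = Callable[[str, str, str], Optional[str]]

# Hypothetical whitelist: only _msearch calls against a named index.
MSEARCH_PATH = re.compile(r"^/[\w.,-]+/_msearch$")


def endpoint_whitelist(method: str, path: str, body: str) -> Optional[str]:
    """Allow only POST requests to /<index>/_msearch."""
    path = path.split("?", 1)[0]  # ignore query-string parameters
    if method == "POST" and MSEARCH_PATH.match(path):
        return None
    return f"endpoint not allowed: {method} {path}"


class Middleware:
    """Runs a list of checks in order; extend it by appending more checks."""

    def __init__(self, checks: Optional[list[Check]] = None):
        self.checks: list[Check] = list(checks or [endpoint_whitelist])

    def add_check(self, check: Check) -> None:
        self.checks.append(check)

    def validate(self, method: str, path: str, body: str) -> Optional[str]:
        """Return the first error found, or None if every check passes."""
        for check in self.checks:
            error = check(method, path, body)
            if error is not None:
                return error
        return None
```

For example, `mw.add_check(lambda method, path, body: "body too large" if len(body) > 100_000 else None)` adds a size limit on top of the whitelist without touching the base class.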
It is very easy to saturate Elasticsearch with parallel queries that request a large output size, which makes Elasticsearch waste time on fetching and serializing results.