Apparently not very easy to secure

vharitonsky commented 6 years ago

It is very easy to saturate Elasticsearch with parallel queries that request a large output size, which makes Elasticsearch waste time on fetch/serialize:

ab -n 100000 -c 1000 -H'Authorization:Basic blk2Tk5UWlo2OjI3Yjc2YjlmLTE4ZWEtNDU2Yy1iYzVlLTNhNTI2M2ViYzYzZA==' 'https://scalr.api.appbase.io/good-books-ds//_search?preference=results&size=10000'

siddharthlatest commented 6 years ago

@vharitonsky This is where the middleware comes in. We recommend that production use-cases use it - see this section https://opensource.appbase.io/reactive-manual/getting-started/reactivebase.html#connect-to-elasticsearch and an example implementation using a proxy server.

The middleware is where you can have logic to check for DoS-type requests (similar to your example above). You can also pass custom headers, for instance a JWT token of an authenticated user, to have even more control over which requests should hit the ES cluster and which should be classified as DoS attacks.
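As a rough illustration of such a check, here is a minimal sketch in Python of a function a proxy could call before forwarding a `_search` URL. The function name `check_search_url` and the cap `MAX_SIZE = 100` are assumptions made for this example, not part of ReactiveSearch or appbase.io, and a real middleware would also need to look at `size` inside the request body.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit

# Assumed cap for illustration; pick a value that matches what your UI renders.
MAX_SIZE = 100


def check_search_url(url: str, max_size: int = MAX_SIZE) -> Tuple[bool, str]:
    """Return (allowed, reason) for a proxied _search URL (hypothetical helper)."""
    query = parse_qs(urlsplit(url).query)
    # Elasticsearch returns 10 hits when no size is given.
    raw_size = query.get("size", ["10"])[-1]
    try:
        size = int(raw_size)
    except ValueError:
        return False, "size is not an integer"
    if size < 0 or size > max_size:
        return False, f"size {size} outside 0..{max_size}"
    return True, "ok"
```

With this cap, the URL from the `ab` example above (`size=10000`) would be rejected before it reaches the cluster, while a request for 20 hits would pass.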

vharitonsky commented 6 years ago

To do it properly, one needs to be able to completely parse the query, which can contain scripts and complex aggregations. Maybe there is also some easy way out of this?

siddharthlatest commented 6 years ago

@vharitonsky In the most general case, yes - you will need to parse at least the top-level fields that the request body is made up of.

But there are some good heuristics that can be followed here:

ReactiveSearch itself doesn't use scripts. Scripting is one thing you need to be careful about in your Elasticsearch settings; for instance, you can specify the contexts under which scripting should be allowed.
Aggregations are generally reducers, and again ReactiveSearch only makes use of terms aggregations. Queries, on the other hand, can bring your cluster down. Keeping sane index sizes and appropriate shard settings is important.
You can also add a timeout parameter to requests to automatically limit how long a complex query / aggregation can run before it returns.
Following some rate-limiting heuristics to prevent a DoS-style attack is also a good idea.
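The last two heuristics can be sketched roughly as follows. This is only an illustration: `with_timeout` and `SlidingWindowLimiter` are hypothetical names, the `"2s"` default and the per-client limits are assumed values, and the only Elasticsearch behaviour relied on is that a search body accepts a top-level `timeout` field.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


def with_timeout(body: dict, timeout: str = "2s") -> dict:
    """Return a copy of the search body with a top-level timeout forced on."""
    limited = dict(body)
    limited["timeout"] = timeout  # overrides any client-supplied value
    return limited


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```

A proxy would typically key `allow()` on something like the client IP or the authenticated user from the JWT, and this in-memory version only works for a single proxy process.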

Unless you are customizing ReactiveSearch queries heavily, there is a small whitelist of actions we perform that requests can be checked against when sanitizing.
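A whitelist check of this kind might look like the sketch below. The sets `ALLOWED_TOP_LEVEL`, `ALLOWED_AGG_TYPES` and `FORBIDDEN_KEYS` are assumptions chosen for illustration, not the actual list of actions ReactiveSearch performs, so they would need to be adjusted to the components you use.

```python
from typing import Any, List

# Assumed whitelists for illustration only; not ReactiveSearch's real list.
ALLOWED_TOP_LEVEL = {"query", "aggs", "size", "from", "sort", "_source",
                     "highlight", "timeout"}
ALLOWED_AGG_TYPES = {"terms"}
FORBIDDEN_KEYS = {"script", "script_fields", "scripted_metric"}


def find_violations(body: dict) -> List[str]:
    """Return human-readable reasons to reject a search body (empty if OK)."""
    violations: List[str] = []

    for key in body:
        if key not in ALLOWED_TOP_LEVEL:
            violations.append(f"top-level field not allowed: {key}")

    def walk(node: Any, path: str) -> None:
        # Conservative: flags any key named like a script construct, even
        # if it happens to be a document field name.
        if isinstance(node, dict):
            for key, value in node.items():
                if key in FORBIDDEN_KEYS:
                    violations.append(f"scripting not allowed at {path}/{key}")
                walk(value, f"{path}/{key}")
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}/{index}")

    def check_aggs(aggs: Any, path: str) -> None:
        if not isinstance(aggs, dict):
            return
        for name, definition in aggs.items():
            if not isinstance(definition, dict):
                continue
            for agg_type, value in definition.items():
                if agg_type == "aggs":
                    check_aggs(value, f"{path}/{name}")  # nested aggregations
                elif agg_type != "meta" and agg_type not in ALLOWED_AGG_TYPES:
                    violations.append(
                        f"aggregation type not allowed: {agg_type} at {path}/{name}")

    walk(body, "")
    check_aggs(body.get("aggs"), "aggs")
    return violations
```

The middleware would forward the request only when `find_violations` returns an empty list. Checking by whitelist rather than trying to enumerate dangerous constructs means unknown fields are rejected by default.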

siddharthlatest commented 6 years ago

Closing due to inactivity. One interesting thing that can be done here is to have a boilerplate middleware that implements these heuristics and whitelists allowed actions, which can then be extended further for specific use-cases. I am opening another issue for that.

appbaseio / reactivesearch

Apparently not very easy to secure #219