RackSec / desdemona

Data-backed security operations
Eclipse Public License 1.0

Implement query language #17

Open lvh opened 8 years ago

lvh commented 8 years ago

This ticket is more of a brain dump and a roadmap/central discussion point for figuring out which subtickets should exist to get the query language we need.

At a very minimum, a query language should allow unification and disunification with literals; e.g. "where the IP is 12.34.56.78" or "where the IP is not 12.34.56.78". It should support that on arbitrary properties.
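As a rough illustration of the minimum described above, here is a small Python sketch (the project itself targets Clojure, so this only shows the shape of the idea). The `(operator, property, literal)` term layout, the `matches` helper, and the sample events are all assumptions made for this example, not part of desdemona.

```python
# Hypothetical term shape: (operator, property, literal).
def matches(term, event):
    """Return True if the event satisfies a single equality/inequality term."""
    op, prop, literal = term
    if prop not in event:
        return False  # assumption: a missing property never matches
    if op == "=":
        return event[prop] == literal
    if op == "!=":
        return event[prop] != literal
    raise ValueError(f"unknown operator: {op!r}")

# Made-up events; any property name works, not just "ip".
events = [
    {"ip": "12.34.56.78", "port": 22},
    {"ip": "10.0.0.1", "port": 443},
]

is_ip = ("=", "ip", "12.34.56.78")    # "where the IP is 12.34.56.78"
not_ip = ("!=", "ip", "12.34.56.78")  # "where the IP is not 12.34.56.78"

hits = [e for e in events if matches(is_ip, e)]
misses = [e for e in events if matches(not_ip, e)]
```

Because the property is just a key, the same two operators work for any field an event happens to carry.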

A query language should also support (arbitrarily nested) conjunctions and disjunctions of terms, e.g. "where (the IP is p.q.r.s AND we've seen lots of traffic for it) OR (the IP is h.i.j.k AND traffic to that IP is unusual)".
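One way to picture arbitrarily nested conjunctions and disjunctions is a recursive evaluator over nested tuples. This is only a Python sketch: the tuple layout, the `evaluate` function, the `traffic` property, and the 192.0.2.x addresses (standing in for p.q.r.s and h.i.j.k) are assumptions for illustration.

```python
def evaluate(query, event):
    """Recursively evaluate a nested query against one event dict."""
    op, *args = query
    if op == "and":
        return all(evaluate(sub, event) for sub in args)
    if op == "or":
        return any(evaluate(sub, event) for sub in args)
    if op == "=":
        prop, literal = args
        return event.get(prop) == literal
    raise ValueError(f"unknown operator: {op!r}")

# (ip is A AND traffic is high) OR (ip is B AND traffic is unusual)
query = ("or",
         ("and", ("=", "ip", "192.0.2.1"), ("=", "traffic", "high")),
         ("and", ("=", "ip", "192.0.2.2"), ("=", "traffic", "unusual")))

busy = {"ip": "192.0.2.1", "traffic": "high"}
odd = {"ip": "192.0.2.2", "traffic": "unusual"}
quiet = {"ip": "192.0.2.1", "traffic": "unusual"}
results = [evaluate(query, e) for e in (busy, odd, quiet)]
```

Nesting depth is unbounded because `and` and `or` simply recurse into their sub-queries.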

Queries must be validated by a schema, so that we don't accidentally expose arbitrary computation.
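A sketch of what such validation could look like, in Python for brevity: the query is walked as plain data and anything outside a fixed whitelist is rejected before evaluation. The whitelists, the tuple layout, and the `validate` function are hypothetical; a real schema would presumably come from a schema library on the Clojure side.

```python
ALLOWED_PROPERTIES = {"ip", "port", "traffic"}  # hypothetical whitelist
COMPARISONS = {"=", "!="}
COMBINATORS = {"and", "or"}

def validate(query):
    """Return a list of problems; an empty list means the query is acceptable."""
    if not isinstance(query, (list, tuple)) or not query:
        return [f"not a query form: {query!r}"]
    op, *args = query
    if not isinstance(op, str):
        return [f"operator must be a string: {op!r}"]
    if op in COMBINATORS:
        if not args:
            return [f"{op!r} needs at least one sub-query"]
        return [p for sub in args for p in validate(sub)]
    if op in COMPARISONS:
        if len(args) != 2:
            return [f"{op!r} takes a property and a literal"]
        prop, literal = args
        problems = []
        if not isinstance(prop, str) or prop not in ALLOWED_PROPERTIES:
            problems.append(f"unknown property: {prop!r}")
        if not isinstance(literal, (str, int)):
            problems.append(f"literal must be a string or integer: {literal!r}")
        return problems
    return [f"operator not allowed: {op!r}"]

good = ("and", ("=", "ip", "12.34.56.78"), ("!=", "port", 22))
bad = ("or", ("eval", "__import__('os')"), ("=", "hostname", "x"))
good_problems = validate(good)
bad_problems = validate(bad)
```

The point of the whitelist approach is that nothing in the query is ever executed; unknown operators are reported, not interpreted.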

A useful query language should support logic variables. After all, as a SOC analyst, I don't just want to check for a particular IP or subnet; I want queries like "where the IP is one of these, and that IP is designated as a potential threat" without having to specify that set literally; I just want to specify that we're talking about the same IPs p, q... This doesn't have to be part of the initial implementation; it is a nice-to-have.
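To make the logic-variable idea concrete, here is a toy Python sketch of pattern matching with shared variables. It is not core.logic and is far less capable; the `Var` class, the `solve` function, and the `seen`/`threat` facts are invented for this example. The same variable appearing in two patterns forces both to refer to the same IP.

```python
class Var:
    """A logic variable, identified by name."""
    def __init__(self, name):
        self.name = name

def unify_fact(pattern, fact, bindings):
    """Match one pattern against one fact; return extended bindings or None."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if isinstance(p, Var):
            if p.name in new and new[p.name] != f:
                return None  # variable is already bound to a different value
            new[p.name] = f
        elif p != f:
            return None
    return new

def solve(patterns, facts, bindings=None):
    """Yield every set of bindings that satisfies all patterns at once."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, *rest = patterns
    for fact in facts:
        extended = unify_fact(first, fact, bindings)
        if extended is not None:
            yield from solve(rest, facts, extended)

# Made-up facts: which IPs were seen, and which are designated threats.
facts = [
    ("seen", "12.34.56.78"),
    ("seen", "10.0.0.1"),
    ("threat", "12.34.56.78"),
    ("threat", "203.0.113.9"),
]
ip = Var("ip")
answers = [b["ip"] for b in solve([("seen", ip), ("threat", ip)], facts)]
```

The analyst never lists the set of IPs literally; the shared variable `ip` expresses "the same IP" across both conditions.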

For training purposes, it would also be very nice to have the option of "explaining" a query, and potentially even making it bidirectionally writable (this is an idea @smogg has floated and made mockups for), where there is a nice editor with intelligent completions. This is obviously not an MVP feature, and a later nice-to-have that should have its own ticket.
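The "explaining" half of this could start as little more than a recursive pretty-printer over the query data. The Python sketch below assumes a hypothetical nested-tuple query shape and an `explain` function invented for this example; the bidirectional editor would need a parser going the other way, which is not shown.

```python
WORDS = {"=": "is", "!=": "is not"}

def explain(query):
    """Render a nested query as a rough English sentence."""
    op, *args = query
    if op in ("and", "or"):
        joiner = f" {op.upper()} "
        return "(" + joiner.join(explain(sub) for sub in args) + ")"
    prop, literal = args
    return f"the {prop} {WORDS[op]} {literal}"

query = ("and", ("=", "ip", "12.34.56.78"), ("!=", "port", 22))
sentence = explain(query)
```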

Wherever possible, this code should be written as cljc files, so that a maximum amount can (at least in theory) run both in the browser and on the server side.

The suggested implementation for this is core.logic. This has some advantages:

lvh commented 8 years ago

So, after writing some sample test cases: correlations are where this gets tricky. You can match on any single segment (in the Onyx sense) pretty easily, but that's boring. What you really want to do is find sets of segments for which something is true, and then you want to be able to expand that to any segment that's related (by time, by machine) to some of the ones that match...
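A naive Python sketch of that expansion step, under loud assumptions: segments are plain dicts with hypothetical `machine` and `time` keys, "related" means same machine within a fixed time window, and the `expand` function is invented for this example. It is quadratic and ignores streaming entirely.

```python
def expand(segments, predicate, window_seconds):
    """Return matching segments plus segments on the same machine within the window."""
    seeds = [s for s in segments if predicate(s)]
    related = []
    for s in segments:
        for seed in seeds:
            same_machine = s["machine"] == seed["machine"]
            close_in_time = abs(s["time"] - seed["time"]) <= window_seconds
            if same_machine and close_in_time:
                related.append(s)
                break  # one related seed is enough to include this segment
    return related

# Made-up segments; times are seconds since an arbitrary start.
segments = [
    {"id": 1, "machine": "web-1", "time": 100, "event": "login-failure"},
    {"id": 2, "machine": "web-1", "time": 130, "event": "sudo"},
    {"id": 3, "machine": "web-1", "time": 900, "event": "sudo"},
    {"id": 4, "machine": "db-1", "time": 110, "event": "sudo"},
]
ids = [s["id"] for s in expand(segments, lambda s: s["event"] == "login-failure", 60)]
```

Segment 2 is pulled in only because it is near a matching segment on the same machine, which is the kind of relationship a per-segment match cannot express.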

As a consequence, queries should probably also be representable as plain data structures, because you're hella going to want to write a simple query planner for that.
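To show why data-shaped queries make planning easy, here is a deliberately simple Python sketch of a planner that only reorders sub-queries so cheaper checks run first. The nested-tuple query shape, the `COST` table, and the `plan` function are all assumptions for illustration; real cost estimates would have to come from the data.

```python
# Hypothetical per-property cost estimates; lower means cheaper to check.
COST = {"ip": 1, "port": 1, "traffic": 50}

def cost(query):
    """Estimate the cost of a query as the sum of the costs of its terms."""
    op, *args = query
    if op in ("and", "or"):
        return sum(cost(sub) for sub in args)
    return COST.get(args[0], 10)  # assumption: unknown properties cost 10

def plan(query):
    """Return an equivalent query with each and/or's sub-queries sorted by cost."""
    op, *args = query
    if op in ("and", "or"):
        return (op, *sorted((plan(sub) for sub in args), key=cost))
    return query

query = ("and", ("=", "traffic", "unusual"), ("=", "ip", "12.34.56.78"))
planned = plan(query)
```

Reordering is safe here only because the terms are pure predicates, so `and` and `or` are commutative; the planner never needs to parse anything because the query is already a data structure.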