I've noticed that sequences tend to time out if one of the subqueries in the sequence yields too many results. This suggests that the most expensive subquery is what drives the query. In some cases, we could update our sequencing algorithm to have the least expensive subquery drive the performance.
few -> few
These sequences work great and need no improvements.
few -> many
Here's an example query that uses a sequence to join certutil.exe with its child process. Ideally, process.parent.name == "certutil.exe" would be added to the child subquery to solve this problem, but it still works as an example.
sequence by agent.id with maxspan=1m
// matches 6 results
[process where process.name == "certutil.exe"] by process.entity_id
// matches millions of results
[process where event.type in ("start", "process_start")] by process.parent.entity_id
There are at most six sequences that could be returned from this query. There are a few ways we could skip over large swaths of results for the second subquery to improve performance. I think we need to use search_after: <certutil1.@timestamp> to skip straight ahead: at most one minute of data is relevant for each candidate sequence. Implementing a search_after optimization while maintaining correctness is nontrivial, but probably worth the change. At most, you should need to consume one minute of data for each of the six possible sequences, and less if the windows overlap.
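As a rough sketch, assuming an index pattern of logs-* and event.ingested as the tiebreaker field (both illustrative, not from the original query), the per-sequence fetch for the second subquery could look like the following console request. <certutil1.@timestamp> and <certutil1.event.ingested> stand in for the driver event's sort values.

GET logs-*/_search
{
  "size": 1000,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "event.type": ["start", "process_start"] } },
        // maxspan=1m bounds the window from above
        { "range": { "@timestamp": {
          "gte": "<certutil1.@timestamp>",
          "lte": "<certutil1.@timestamp>||+1m"
        } } }
      ]
    }
  },
  "sort": [
    { "@timestamp": "asc" },
    { "event.ingested": "asc" }
  ],
  // skip straight ahead to events after the certutil.exe event
  "search_after": [ "<certutil1.@timestamp>", "<certutil1.event.ingested>" ]
}

Repeating this once per certutil.exe event caps the scan at roughly six one-minute windows instead of millions of hits.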
many -> few
A similar example query could use a sequence to join certutil.exe to its parent process. In most scenarios this exact query wouldn't be necessary, because you can usually find the fields you need under process.parent, but it's a good illustration of the point. Same query as before, but flipped:
sequence by agent.id with maxspan=1m
// matches millions of results
[process where event.type in ("start", "process_start")] by process.parent.entity_id
// matches 6 results
[process where process.name == "certutil.exe"] by process.entity_id
Now the second subquery is the driving factor, but unfortunately there's no search_before. We could, however, switch the sort order and still use search_after, and we can also use a range query to constrain the window from both directions.
At a minimum you have timestamp + tiebreaker on every hit, so you can switch back to your original order afterwards. Again, this optimization is tricky to keep correct, especially for sequences more than two events long.
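A minimal sketch under the same assumptions as above (logs-* index, event.ingested tiebreaker, placeholder sort values taken from the certutil.exe event): sorting descending makes search_after walk backwards in time, effectively a search_before in the original order, while the range query constrains the window from both directions.

GET logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "event.type": ["start", "process_start"] } },
        // constrain from both directions: [certutil1.@timestamp - 1m, certutil1.@timestamp]
        { "range": { "@timestamp": {
          "gte": "<certutil1.@timestamp>||-1m",
          "lte": "<certutil1.@timestamp>"
        } } }
      ]
    }
  },
  // descending sort + search_after walks strictly older events first
  "sort": [
    { "@timestamp": "desc" },
    { "event.ingested": "desc" }
  ],
  "search_after": [ "<certutil1.@timestamp>", "<certutil1.event.ingested>" ]
}

The hits come back newest-first; since every hit carries timestamp + tiebreaker, re-sorting them into the original ascending order afterwards is cheap.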
many -> many
These optimizations are more complex, and I think the best ways to scale will involve more query planning, using aggregations or histograms to see where the results are distributed. For now, I think it's reasonable to have users adjust expectations until these cases are better optimized.
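For instance, a cheap date_histogram over each subquery (sketched below with an assumed logs-* index and a one-minute interval to match maxspan) would let a planner see which subquery is sparse and where its matches cluster before any sequencing work starts:

GET logs-*/_search
{
  "size": 0,
  "query": {
    "terms": { "event.type": ["start", "process_start"] }
  },
  "aggs": {
    "matches_over_time": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" }
    }
  }
}

Buckets with zero matches could be skipped entirely, and the densest subquery could be demoted from driving the sequence.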