Open tylerperk opened 1 month ago
Pinging @elastic/es-analytical-engine (Team:Analytics)
This is related to the scheduled work for ST_DISTANCE, which covers at least the distance calculation part. However, calculating speed is a separate concern. At its simplest, this could be distance/duration, which does not require a new function, so it could be considered complete once ST_DISTANCE is done. However, there are two further considerations:
- Getting the current and previous document into the same row.
- The geo_line aggregations collect sequences of locations grouped by TSID into LineString geometries, ordered by time, including a feature for line simplification for very large geometries. There was a request to filter out outliers that deviate too much from the line, and this request sounds related, since here we want speed outliers to be detected and highlighted. If the users of this feature are likely to use TSDB features, since they are working with time-ordered event data, perhaps we should consider a TSDB feature around outlier detection (spatial, temporal and spatiotemporal/speed)?

To get this to work in ES|QL we would need to support inline stats. But it would be even more efficient to use some time-ordering or event-ordering approach and look at windowing functions. @alex-spies pointed out the SQL functions LEAD and LAG as a good approach to this. They also seem generally useful for event data, log data and the security use cases.
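To illustrate what a LAG-style window would provide, here is a minimal Python sketch that pairs each event with the previous event for the same key, ordered by time. It is not ES|QL and says nothing about how such a feature would be implemented; the field names `user`, `ts` and `loc` and the helper `with_previous` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def with_previous(events, key="user", order="ts"):
    """Yield (previous, current) pairs per key, ordered by time.

    This mimics SQL's LAG with an offset of 1, partitioned by `key`
    and ordered by `order`. The first event of each key has no
    predecessor, so it produces no pair.
    """
    # Sort by (key, order) so groupby sees each key as one contiguous run.
    ordered = sorted(events, key=itemgetter(key, order))
    for _, group in groupby(ordered, key=itemgetter(key)):
        previous = None
        for current in group:
            if previous is not None:
                yield previous, current
            previous = current


# Hypothetical events, deliberately out of order.
events = [
    {"user": "a", "ts": 30, "loc": (0.0, 0.0)},
    {"user": "b", "ts": 10, "loc": (5.0, 5.0)},
    {"user": "a", "ts": 10, "loc": (1.0, 1.0)},
    {"user": "a", "ts": 20, "loc": (2.0, 2.0)},
]

# Timestamps of each (previous, current) pair.
pairs = [(p["ts"], c["ts"]) for p, c in with_previous(events)]
```

Once both documents are in the same row, speed reduces to distance/duration over each pair.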
Description
For security use cases it is common to calculate the distance between two points (typically based on source IP addresses) and the speed required to travel from one to the other. If the movement is "impossible", that is a factor used to raise suspicion of malicious activity such as IP spoofing. In other query languages this can be calculated using multiple complex statements involving several math functions and magic numbers. At minimum we should make that possible, but ideally we should encapsulate the math into a function that calculates this for you.
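As a rough illustration of the math such a function would encapsulate, the Python sketch below computes the great-circle distance with the haversine formula and derives the speed between two events. This is an assumption about one common approach, not a description of ST_DISTANCE or any existing function; the function names and the 1000 km/h threshold are hypothetical.

```python
import math

# Mean Earth radius in km: the kind of "magic number" users must supply today.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_kmh(prev, curr):
    """Speed needed to travel from prev to curr.

    Each argument is a (lat, lon, epoch_seconds) tuple.
    """
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]) / 3600.0
    if hours <= 0:
        # No elapsed time: any movement at all implies infinite speed.
        return math.inf if distance > 0 else 0.0
    return distance / hours


# Hypothetical threshold, roughly the cruising speed of an airliner.
MAX_PLAUSIBLE_KMH = 1000.0


def is_impossible_travel(prev, curr, limit=MAX_PLAUSIBLE_KMH):
    """True when the implied speed exceeds the plausibility limit."""
    return speed_kmh(prev, curr) > limit
```

For example, a login from London followed one hour later by a login from New York implies a speed of several thousand km/h and would be flagged, while the same pair ten hours apart would not.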