elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ESQL: LIMIT per group key #112918

Open nik9000 opened 2 months ago

nik9000 commented 2 months ago

Description

We'd like some ability to do a top-n, but per group key. This is similar to _search's collapse. Something like:

FROM foo
| SORT @timestamp DESC
| LIMIT 10 PER hostname

Would get you the 10 latest documents for each hostname.
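
To make the intended semantics concrete, here is a minimal Python sketch of "sort, then keep at most n rows per key". It only illustrates the expected result and is not how ES|QL would implement it. The function name `limit_per_key` and the sample documents are hypothetical.

```python
from collections import defaultdict


def limit_per_key(rows, key, sort_field, n):
    """Sort rows by sort_field descending, then keep at most n rows per key."""
    ordered = sorted(rows, key=lambda r: r[sort_field], reverse=True)
    counts = defaultdict(int)
    kept = []
    for row in ordered:
        # Keep the row only while its group is still under the limit.
        if counts[row[key]] < n:
            counts[row[key]] += 1
            kept.append(row)
    return kept


# Hypothetical sample data standing in for the `foo` index.
docs = [
    {"hostname": "a", "@timestamp": 1},
    {"hostname": "a", "@timestamp": 3},
    {"hostname": "a", "@timestamp": 2},
    {"hostname": "b", "@timestamp": 5},
]

# Rough analogue of: FROM foo | SORT @timestamp DESC | LIMIT 2 PER hostname
latest = limit_per_key(docs, "hostname", "@timestamp", 2)
```

The output keeps the global sort order, so rows from different hostnames are interleaved rather than grouped together.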

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

bpintea commented 2 months ago

Wondering if, since we now have STATS-specific WHERE, a STATS-specific LIMIT wouldn't fit better.

nik9000 commented 2 months ago

> Wondering if, since we now have STATS-specific WHERE, a STATS-specific LIMIT wouldn't fit better.

It's also quite similar to TOP. Right now you can get the last ten timestamps:

| STATS TOP(@timestamp, 10, DESC) BY hostname

We think we can add extra data alongside the TOP data, which would produce the value associated with each timestamp:

| STATS TOP(@timestamp, 10, DESC, bytes_out) BY hostname
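
As a rough illustration of what carrying a value alongside TOP could return, here is a Python sketch that keeps the n largest timestamps per group together with the field value from the same row. The four-argument TOP above is only a proposal, so the function name `top_with_value` and the output shape are assumptions, not an existing ES|QL behavior.

```python
import heapq
from collections import defaultdict


def top_with_value(rows, group_key, sort_field, n, carry_field):
    """Per group, return the n largest sort_field values paired with carry_field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append((row[sort_field], row[carry_field]))
    # nlargest returns the pairs in descending order of the sort value.
    return {
        group: heapq.nlargest(n, pairs, key=lambda pair: pair[0])
        for group, pairs in groups.items()
    }


# Hypothetical sample rows.
rows = [
    {"hostname": "a", "@timestamp": 1, "bytes_out": 10},
    {"hostname": "a", "@timestamp": 3, "bytes_out": 30},
    {"hostname": "a", "@timestamp": 2, "bytes_out": 20},
    {"hostname": "b", "@timestamp": 5, "bytes_out": 50},
]

# Rough analogue of: STATS TOP(@timestamp, 2, DESC, bytes_out) BY hostname
result = top_with_value(rows, "hostname", "@timestamp", 2, "bytes_out")
```

Unlike a per-key LIMIT, this yields one entry per group, which matches the shape of a STATS result.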

It's quite possible that, if we had that, we could buffer the _doc information to get the documents associated with those timestamps. And then we'd be able to load fields after the STATS. Sort of. Maybe. I'm not sure how you'd write that.

teresaalvarezsoler commented 1 month ago

This is very important for us in converting Lens DSL to ES|QL, because all charts are built with this feature ON by default. cc @tylerperk