elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Implement TopOne metrics aggregations #62801

Open stfaun opened 4 years ago

stfaun commented 4 years ago

Relates to #35639

Initially, I thought new first/last metrics aggregations should be implemented as SingleValue metrics aggregations to support this feature.

But separate first/last metrics aggregations seem redundant; they could be combined into a single top metrics aggregation.

I found that the top_metrics aggregation may fit my requirement. However, when the first sorted document has no value for the target field, top_metrics returns a null value as the result rather than skipping that document.

I understand why top_metrics behaves this way: it may return several fields at the same time, so it should not skip any document in the bucket.

stfaun commented 4 years ago

Considering the above problem, implementing a top_one metrics aggregation as a SingleValue metrics aggregation may be an alternative solution.

Like the top_hits and top_metrics aggregations, the new top_one aggregation needs a sort parameter to determine how the docs in a bucket are sorted. Unlike top_hits and top_metrics, top_one should be a SingleValue metrics aggregation. It can therefore extract only one field, taken from the first doc in the order given by the specified sort fields.

In addition, top_one would provide an ignore_null parameter to determine whether docs with a null value for the target field should be skipped.

The new top_one metrics aggregation might be used as follows:

GET /exams/_search
{
  "size": 0,
  "aggs": {
    "first_grade": {
      "top_one": {
        "value": {
          "field": "grade"
        },
        "sort": {
          "timestamp": "asc"
        },
        "ignore_null": true
      }
    }
  }
}

This would yield a response like:

{
  ...
  "aggregations": {
    "first_grade": {
      "value": 70.0
    }
  }
}

Because the new top_one aggregation would be a SingleValue metrics aggregation, its result could be used in the buckets_path of bucket_script/bucket_selector/bucket_sort.

stfaun commented 4 years ago

I would like to implement this feature, but I'm not sure whether a top_one metrics aggregation is a good design. Or would the previous first/last metrics aggregations be better?

elasticmachine commented 4 years ago

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

polyfractal commented 4 years ago

Hiya @stfaun, thanks for opening this issue! I'm going to mark this as team-discuss, so that the analytics team can chat about it. I think it's an interesting use-case, but I'm personally not sure if it'd be better served as a modification to top-metrics (some kind of flag or mode when only one field is needed?) or as a whole new agg as you suggest. Both approaches have pros/cons.

Will write back once we've discussed!

polyfractal commented 4 years ago

Hiya @stfaun, we chatted about this and were curious if a filter aggregation + exists query would solve your needs?

The filter aggregation will ensure that all documents inside the bucket match the provided query/filter, and the exists query can be used to ensure that all documents have the desired field (so that the "top one" document doesn't have a null value). You can then use the top-metrics agg specifying a single field, and you should get the "top" doc that has a non-null value.
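A minimal sketch of that suggestion, reusing the exams index and the grade and timestamp fields from the proposal above. The aggregation names with_grade and first_grade are placeholders chosen for illustration, not names from this thread:

```console
# Sketch only: "with_grade" and "first_grade" are hypothetical aggregation names.
# The filter keeps only docs that have a value for "grade", so the doc that
# top_metrics picks after sorting cannot have a null grade.
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "with_grade": {
      "filter": {
        "exists": { "field": "grade" }
      },
      "aggs": {
        "first_grade": {
          "top_metrics": {
            "metrics": { "field": "grade" },
            "sort": { "timestamp": "asc" }
          }
        }
      }
    }
  }
}
```

Placing the filter inside the aggregation tree, rather than in the top-level query, means sibling aggregations in the same request still see documents without a grade.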

stfaun commented 4 years ago

Hi @polyfractal, the solution you suggested does work for me. I have confirmed that it can be used in the buckets_path of bucket_script/bucket_selector/bucket_sort.

Thanks for your replies.