elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.12k stars 24.83k forks source link

Inconsistent behavior/documentation with Date Agg Intervals #50091

Open niemyjski opened 4 years ago

niemyjski commented 4 years ago

Elasticsearch version (bin/elasticsearch --version): 7.5

Plugins installed: ["mapper-size"]

JVM version (java -version): docker-image

OS version (uname -a if on a Unix-like system): docker-image

Description of the problem including expected versus actual behavior:

We noticed that aggregations don't behave how we expected them to as per the docs. Please see this issue we logged on nest for more context: https://github.com/elastic/elasticsearch-net/issues/4251

Steps to reproduce:

POST /test1/_doc/
{
  "date": "2019-10-17T07:01:26.3488489+00:00"
}

GET /test1/_search 
{
  "aggs": {
    "date_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      }
    },
    "date_m": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "M"
      }
    },
    "date_onem": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "1M"
      }
    }
  }, 
  "size": 0
}

The docs state that 1M, month and M are valid calendar intervals. However not all of them work correctly when running in ES. This translates to pretty much all the values listed in the documentation (on two different pages, one talks about nanos and micro seconds but the main documentation doesn't)

Also of note, it appears when you use kibana it creates a date_histogram via a snippet using interval (legacy).

Provide logs (if relevant):

I doing a date_histogram aggregation with calendar_interval values of:

Originally posted by @ejsmith in https://github.com/elastic/elasticsearch-net/issues/4251#issuecomment-564328065


But the docs at least online don't say anything about time units smaller than 1m

The Date Histogram docs have fixed intervals to milliseconds. The time values docs have values down to nanos.

but I could even see us supporting 1ms. What do you recommend we use in the mean time (We went through QA and are about to release but are blocked as we have saved agg queries that customers need to run, from seconds to year time buckets).

I'm not sure what the smallest value supported for buckets is, but looking at DateHistogramInterval, it looks like it may be 1s, which can also be supplied as 1000ms input:

https://github.com/elastic/elasticsearch/blob/6ae6f57d39f473e4968700a28a582b93fe3a3bf4/server/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/DateHistogramInterval.java#L37-L66

  • M
    • Fails directly in ES with The supplied interval [M] could not be parsed as a calendar interval., but the documentation page says that it is a valid value.
    • Fails to parse if I specify the string M to DateHistogramAggregationDescriptor<T>.CalendarInterval with Expression 'M' is invalid

There's something not right; either the documentation should be clearer, or the implementation should also support the single character time units, which it looks like it doesn't

https://github.com/elastic/elasticsearch/blob/23bf310c849c77e50ab87dbb59ea42d9412fe187/server/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/DateHistogramAggregationBuilder.java#L79-L98

Originally posted by @russcam in https://github.com/elastic/elasticsearch-net/issues/4251#issuecomment-564332921

elasticmachine commented 4 years ago

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

polyfractal commented 4 years ago

Heya, thanks for the detailed ticket... and apologies for the confusion/troubles :( I'll start with the easy ones first:

Kibana

Also of note, it appears when you use kibana it creates a date_histogram via a snippet using interval (legacy).

Correct, although hopefully only temporarily. Kibana, like many user applications, is still migrating off the old interval. Since date histograms are so widely used, we expect to have a long deprecation period on interval.

Nanoseconds

one talks about nanos and micro seconds

Nanoseconds are in a tricky situation right now. They can be indexed and searched (at cost of reduced dynamic range), but aggregations currently only work in millisecond range regardless of the underlying data type. This is noted in the date_nano docs at the bottom. That's why nano support isn't mentioned on the date histogram page.

DateHistogramInterval helpers

I'm not sure what the smallest value supported for buckets is, but looking at DateHistogramInterval, it looks like it may be 1s, which can also be supplied as 1000ms input:

These are technically just "helper" methods, and were more useful with the old interval. Today they just append the correct suffix to the provided numeric value. It does look like we're missing the "milliseconds" helper, but you can manually specify it with:

new DateHistogramInterval("5ms")

Intervals

1M

This should work for calendar_interval and interval, but not fixed_interval

month

This should work for calendar_interval and interval, but not fixed_interval

M

This should not work, and I don't think it's ever been supported. E.g. neither of the calendar-aware parsers in 6.0 or 2.0 support just the unit by itself, and fixed intervals don't support calendar concepts like months. I just tested on a 6.0 build and it does indeed throw an exception:

failed to parse setting [DateHistogramAggregationBuilder.interval] with value [M] as a time value: unit is missing or unrecognized

I tested the fixed units too and they also don't work, although from a parsing issue and an ugly exception:

"reason": "failed to parse [s]",
  "caused_by": {
    "type": "number_format_exception",
    "reason": "For input string: \"\""
}

Conclusions

So in conclusion it seems we (ES) has two action items:

I hope this helped clear up the situation!

elasticmachine commented 3 years ago

Pinging @elastic/es-docs (Team:Docs)