elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Support calendar offsets with calendar_intervals in date histograms #93418

Open craigtaverner opened 1 year ago

craigtaverner commented 1 year ago

Description

In the issue at https://github.com/elastic/elasticsearch/issues/93180 it was noted that using offset with a calendar_interval can give surprising results when the offset is longer than the interval. This is because the offset is a fixed interval. Calendar months differ in length, so adding the same fixed duration to each calendar bucket does not keep the buckets starting on the same day. For example, adding "offset": "+35d" to "calendar_interval": "month" moves the bucket starting at 2022-01-01 to 2022-02-05, but moves the bucket starting at 2022-02-01 to 2022-03-08. Note that the starting day of the month is different. Before the offset, the buckets all started on the same day of the month. After the offset, they do not.
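To make the drift concrete, here is a small Java sketch that uses only java.time. The class and method names are illustrative, and this reproduces only the date arithmetic, not Elasticsearch's rounding code. It adds a fixed 35-day offset to the first day of each month of early 2022 and prints the resulting start dates:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class FixedOffsetDemo {
    // Shift the first day of each calendar month by a fixed number of days,
    // mimicking a fixed-length offset applied to monthly buckets.
    static List<LocalDate> shiftedMonthStarts(int year, int months, long offsetDays) {
        List<LocalDate> starts = new ArrayList<>();
        for (int i = 0; i < months; i++) {
            LocalDate monthStart = LocalDate.of(year, 1, 1).plusMonths(i);
            starts.add(monthStart.plusDays(offsetDays));
        }
        return starts;
    }

    public static void main(String[] args) {
        for (LocalDate d : shiftedMonthStarts(2022, 4, 35)) {
            System.out.println(d + " (day " + d.getDayOfMonth() + ")");
        }
    }
}
```

Running it prints 2022-02-05 (day 5), 2022-03-08 (day 8), 2022-04-05 (day 5) and 2022-05-06 (day 6). January, February, March and April have 31, 28, 31 and 30 days, so the same fixed shift lands on a different day of the month each time.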

The use case in the original issue was to define financial years and financial quarters as date histogram buckets. Elasticsearch's quarter calendar_interval starts each quarter on the 1st of January, April, July and October. If the financial year starts on the 4th of February (and all quarters therefore start on the 4th of the respective months), it would be great to specify this with an offset. But a fixed offset such as +35d will, as described above, not work.
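As a sketch of the desired semantics, the hypothetical helper below assigns a date to its financial quarter when quarters start on the 4th of February, May, August and November. It assumes the offset is applied as one calendar month plus three days: it undoes that offset, rounds down to a calendar quarter start, then re-applies it. The names are illustrative and not an Elasticsearch API.

```java
import java.time.LocalDate;

class FinancialQuarterDemo {
    // Round a date down to the first day of its calendar quarter
    // (the 1st of January, April, July or October).
    static LocalDate calendarQuarterStart(LocalDate date) {
        int firstMonth = ((date.getMonthValue() - 1) / 3) * 3 + 1;
        return LocalDate.of(date.getYear(), firstMonth, 1);
    }

    // Hypothetical calendar offset of one month plus three days.
    // Undo the offset, round to the calendar quarter, then re-apply it.
    // minusMonths may clamp the day (e.g. Mar 30 -> Feb 28), but it never
    // changes the resulting month, and only the month matters for rounding.
    static LocalDate financialQuarterStart(LocalDate date) {
        LocalDate unshifted = date.minusDays(3).minusMonths(1);
        return calendarQuarterStart(unshifted).plusMonths(1).plusDays(3);
    }

    public static void main(String[] args) {
        LocalDate[] samples = {
            LocalDate.of(2022, 2, 3),   // one day before the financial year starts
            LocalDate.of(2022, 2, 4),   // first day of the financial year
            LocalDate.of(2022, 8, 20),
            LocalDate.of(2022, 12, 31)
        };
        for (LocalDate d : samples) {
            System.out.println(d + " -> quarter starting " + financialQuarterStart(d));
        }
    }
}
```

With this rule 2022-02-03 falls in the quarter starting 2021-11-04, while 2022-02-04 starts a new quarter, and every bucket starts on the 4th regardless of month length.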

We need a concept of a calendar offset. One approach would be to extend the offset field to accept calendar offset specifications. For example, the above use case could be supported with "offset": "+1m+3d", with the m meaning calendar month. (In Elasticsearch's existing time units m means minutes, so the final syntax may need a different unit letter.)
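One way to model the proposal is to parse the offset string into a java.time.Period, which LocalDate.plus applies as calendar months first and then days. The sketch below is purely hypothetical: the grammar (signed terms using y, m, w or d, where m means calendar month as in the proposal) and all names are assumptions, not an existing Elasticsearch parser.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CalendarOffsetDemo {
    // Hypothetical grammar: one or more signed terms such as "+1m+3d",
    // where y = years, m = calendar months, w = weeks and d = days.
    private static final Pattern TERM = Pattern.compile("([+-])(\\d+)([ymwd])");

    static Period parseCalendarOffset(String spec) {
        Matcher matcher = TERM.matcher(spec);
        Period result = Period.ZERO;
        int consumed = 0;
        while (matcher.find()) {
            // Reject any characters between consecutive terms.
            if (matcher.start() != consumed) {
                throw new IllegalArgumentException("Unexpected text in offset: " + spec);
            }
            consumed = matcher.end();
            int amount = Integer.parseInt(matcher.group(2));
            if (matcher.group(1).equals("-")) {
                amount = -amount;
            }
            switch (matcher.group(3)) {
                case "y": result = result.plusYears(amount); break;
                case "m": result = result.plusMonths(amount); break;
                case "w": result = result.plusDays(7L * amount); break;
                default:  result = result.plusDays(amount); break;
            }
        }
        // Reject empty input and trailing garbage.
        if (consumed == 0 || consumed != spec.length()) {
            throw new IllegalArgumentException("Invalid calendar offset: " + spec);
        }
        return result;
    }

    public static void main(String[] args) {
        Period offset = parseCalendarOffset("+1m+3d");
        System.out.println("offset = " + offset);
        for (int month = 1; month <= 4; month++) {
            // Apply the calendar offset to each monthly bucket start.
            System.out.println(LocalDate.of(2022, month, 1).plus(offset));
        }
    }
}
```

Running it prints offset = P1M3D followed by 2022-02-04, 2022-03-04, 2022-04-04 and 2022-05-04: unlike a fixed +35d, the shifted buckets keep the same day of the month.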

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

craigtaverner commented 1 year ago

This could be related to https://github.com/elastic/elasticsearch/issues/50139. Also worth reviewing: https://davecturner.github.io/2019/04/14/timezone-rounding.html