elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Date type does not have enough precision for the logging use case. #10005

Closed: jordansissel closed this issue 5 years ago

jordansissel commented 9 years ago

At present, the 'date' type has millisecond precision. For many log use cases, higher-precision time is valuable: microsecond, nanosecond, etc.

The biggest impact of this is during sorting of search results. If you sort chronologically, newest-first, by a date field, documents with the same date value end up in an undefined relative order (their sort keys tie). Users often report events appearing "out of order" when they share a timestamp: a specific example is sorting by date and seeing events in newest-first order, unless there is a tie, in which case oldest-first (or first-written?) order appears. This causes a bit of confusion for the ELK use case.
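
A minimal Java sketch of the tie (an illustration, not Elasticsearch code): two instants that differ only at microsecond granularity collapse to the same epoch-millisecond sort key, so a millisecond-precision date field cannot order them.

```java
import java.time.Instant;

public class MillisTieDemo {
    public static void main(String[] args) {
        Instant a = Instant.parse("2015-03-05T12:00:00.000123Z");
        Instant b = Instant.parse("2015-03-05T12:00:00.000456Z");

        // Both values truncate to the same millisecond, so a date-field
        // sort has no information left to order them correctly.
        System.out.println(a.toEpochMilli()); // 1425556800000
        System.out.println(b.toEpochMilli()); // 1425556800000
    }
}
```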

Related: https://github.com/logstash-plugins/logstash-filter-date/pull/8

I don't have any firm proposals, but I have two different implementation ideas:

mbullock1986 commented 7 years ago

Hi All,

I know this is difficult, but I wondered whether this issue has moved on with the advent of version 6?

Thanks!

jpountz commented 7 years ago

Sorry, it hasn't moved.

jchannon commented 7 years ago

Any idea when it will be? ELK was recommended to me, and I've hit this.

jpountz commented 7 years ago

No idea. The only thing I can tell you is that it won't be fixed in the short term.

jchannon commented 7 years ago

Am I right that the issue, put simply for people like me, is this: when I send in "ts": "2017-08-30T14:26:30.9157480Z", ES converts that to 1504103190915, chopping off the last four digits, and parses that as a date; since the sub-millisecond digits are gone, sorting/searching is not as accurate as expected?
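
A minimal java.time sketch of that truncation (illustrative; Elasticsearch itself used Joda-Time at the time, but the arithmetic is the same):

```java
import java.time.Instant;

public class TruncationDemo {
    public static void main(String[] args) {
        // java.time parses all seven fractional digits...
        Instant ts = Instant.parse("2017-08-30T14:26:30.9157480Z");
        System.out.println(ts.getNano());      // 915748000 nanoseconds

        // ...but an epoch-millisecond representation (what a
        // millisecond-precision date field effectively stores) keeps three.
        System.out.println(ts.toEpochMilli()); // 1504103190915 -- .7480 is gone
    }
}
```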

synhershko commented 7 years ago

@jchannon what is your use case that requires that precision?

@jpountz maybe 6.x sorted indexes can be the answer to this, or are they using the same precision as the indexed values?

jchannon commented 7 years ago

My use case? I have always logged to 6 decimal places and want to keep it that way. I'm astounded that this highly recommended piece of software is so poor at storing/converting dates.

StephanX commented 6 years ago

Our use case is that we ingest logs via kubernetes => fluentd (0.14) => elasticsearch, and logs that are emitted rapidly (anything under a millisecond apart, which is easily done) obviously have no way of being kept in order when displayed in kibana.

varas commented 6 years ago

Same issue; we are tracking events that happen at nanosecond precision.

Is there any plan to increase it?

clintongormley commented 6 years ago

Yes, but we need to move from Joda to Java.time in order to do so. See https://github.com/elastic/elasticsearch/issues/27330
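
A minimal sketch of why the migration matters (illustrative; assumes Joda-Time and Java 8+ on the classpath): Joda's DateTime, like java.util.Date, is backed by a single epoch-millisecond long, while java.time.Instant carries a separate nanosecond-of-second component.

```java
import java.time.Instant;
import org.joda.time.DateTime;

public class PrecisionDemo {
    public static void main(String[] args) {
        // Joda parses the extra fractional digits but can only keep millis.
        DateTime joda = DateTime.parse("2017-08-30T14:26:30.9157480Z");
        System.out.println(joda.getMillis());   // 1504103190915 -- millis only

        // java.time retains the full sub-millisecond component.
        Instant javaTime = Instant.parse("2017-08-30T14:26:30.9157480Z");
        System.out.println(javaTime.getNano()); // 915748000 -- full precision
    }
}
```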

gavenkoa commented 6 years ago

I opened a bug against Logback, as its core interface also stores timestamps at millisecond resolution, so precision is lost even earlier, before ES: https://jira.qos.ch/browse/LOGBACK-1374

It seems that the historical java.util.Date type is the cause of these problems in the Java world.

shekharoracle commented 6 years ago

Same use case: using the kubernetes/filebeat/elasticsearch stack for log collection, but not having nanosecond precision leads to incorrect ordering of logs.

portante commented 6 years ago

Seems like we need to consider having the collectors provide a monotonically increasing counter that records the order in which the logs were collected. Nanosecond precision does not necessarily solve the problem, because the clock's resolution might not actually be nanoseconds.
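
A minimal sketch of such a collector-side sequence number (illustrative; the class and the log.sequence field name are hypothetical, not an ES convention):

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequencedCollector {
    private final AtomicLong sequence = new AtomicLong();

    // Attach a monotonically increasing counter to each event, giving a
    // total order even when timestamps tie or the clock is coarse.
    public String enrich(long epochMillis, String message) {
        return String.format(
            "{\"@timestamp\": %d, \"log.sequence\": %d, \"message\": \"%s\"}",
            epochMillis, sequence.incrementAndGet(), message);
    }
}
```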

lgogolin commented 6 years ago

Seriously, guys? This bug is almost 3 years old...

matthid commented 6 years ago

The problem is also that if you try to find a workaround, you run into a series of other bugs, so there is not even a viable, acceptable workaround.

So the only viable workaround seems to be to store the epoch plus two additional digits, which are incremented in Logstash when the timestamp matches the previous one.
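
A minimal sketch of that scheme (illustrative; a real deployment would implement this as a Logstash filter, and the class name here is hypothetical):

```java
public class TieBreakingClock {
    private long lastMillis = -1;
    private int tieBreak = 0; // the two extra decimal digits: 0..99

    // Returns epochMillis * 100 plus a counter that increments on
    // collisions, so equal-millisecond events still sort in arrival order.
    public synchronized long next(long epochMillis) {
        if (epochMillis == lastMillis) {
            tieBreak = Math.min(tieBreak + 1, 99); // saturates after 100 ties
        } else {
            lastMillis = epochMillis;
            tieBreak = 0;
        }
        return epochMillis * 100 + tieBreak;
    }
}
```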

Has anyone found a better approach?

jraby commented 6 years ago

I've been storing microseconds since epoch in a number field for 2 years now. Suits our needs, but YMMV.
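
A minimal sketch of that conversion (illustrative): derive microseconds-since-epoch from an Instant and index it as a plain numeric (long) field.

```java
import java.time.Instant;

public class EpochMicros {
    // Combine the whole-second and nanosecond components into one long.
    public static long toEpochMicros(Instant instant) {
        return instant.getEpochSecond() * 1_000_000L + instant.getNano() / 1_000;
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2017-08-30T14:26:30.9157480Z");
        System.out.println(toEpochMicros(ts)); // 1504103190915748
    }
}
```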

jpountz commented 6 years ago

cc @elastic/es-search-aggs

tlhampton13 commented 6 years ago

Not all time data is collected using commodity hardware. There is plenty of specialty equipment that collects nanosecond-resolution data. Think about applications beyond log analysis: sorting by time is critical, but aggregations over small timeframes are also important. For example, maybe I just want to aggregate some scientific data over a one-second window, or even a millisecond window.

I have nanosecond resolution data and would love to be able to use ES aggregations to analyze it.

jimczi commented 5 years ago

Elasticsearch 7.0 will include a date_nanos field type that supports nanosecond sorting precision: https://github.com/elastic/elasticsearch/pull/37755. Nanosecond-precision fields are now first-class citizens that don't require two fields to retain precision, so I will close this issue. Please open new issues if you find bugs or have enhancements to suggest for this new field type.
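
A minimal sketch of the new type in use via the low-level Java REST client (illustrative; the logs index name and localhost address are assumptions):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class DateNanosExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {

            // Map @timestamp as date_nanos instead of date.
            Request mapping = new Request("PUT", "/logs");
            mapping.setJsonEntity(
                "{\"mappings\": {\"properties\": "
                + "{\"@timestamp\": {\"type\": \"date_nanos\"}}}}");
            client.performRequest(mapping);

            // Sub-millisecond digits are now preserved for sorting.
            Request doc = new Request("POST", "/logs/_doc");
            doc.setJsonEntity(
                "{\"@timestamp\": \"2017-08-30T14:26:30.915748000Z\"}");
            client.performRequest(doc);
        }
    }
}
```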

gavenkoa commented 2 years ago

https://jira.qos.ch/browse/LOGBACK-1374 added Instant getInstant() to the ILoggingEvent interface, allowing event times to be captured at nanosecond resolution!

It landed in 1.3.0-alpha12. I expect to see it used in new appenders.
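
A minimal sketch of an appender reading the new accessor (illustrative; assumes Logback >= 1.3.0-alpha12, and the output format is arbitrary):

```java
import java.time.format.DateTimeFormatter;

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;

public class NanoTimestampAppender extends AppenderBase<ILoggingEvent> {
    @Override
    protected void append(ILoggingEvent event) {
        // getInstant() exposes the event time as java.time.Instant, so any
        // nanosecond component the clock provides survives to this point.
        String ts = DateTimeFormatter.ISO_INSTANT.format(event.getInstant());
        System.out.println(ts + " " + event.getFormattedMessage());
    }
}
```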