elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

The date type does not have enough precision for the logging use case. #10005

Closed: jordansissel closed this issue 5 years ago

jordansissel commented 9 years ago

At present, the 'date' type is millisecond precision. For many log use cases, higher precision time is valuable - microsecond, nanosecond, etc.

The biggest impact of this is during sorting of search results. If you sort chronologically, newest-first, by a date field, documents with the same date will probably be ordered arbitrarily relative to each other (because their sort keys are equal). Users often report this as events appearing "out of order" when they share a timestamp. A specific example: sorting by date shows events in newest-first order, unless there is a tie, in which case oldest-first (or first-written?) order appears. This causes a bit of confusion for the ELK use case.
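To make the tie concrete, here is a tiny illustration (the timestamps are made up): two events that differ only below the millisecond end up with identical millisecond date values, so a sort on the date field alone has nothing to break the tie with.

# Two events a few hundred microseconds apart collapse to the same millisecond
# value, so sorting on the date field alone cannot order them deterministically.
t1 = "2015-03-05T12:00:00.123456Z"
t2 = "2015-03-05T12:00:00.123789Z"

def truncate_to_millis(ts):
    # roughly what a millisecond-precision date field keeps of the string
    return ts[:23] + "Z"

print(truncate_to_millis(t1) == truncate_to_millis(t2))   # True: identical sort keys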

Related: https://github.com/logstash-plugins/logstash-filter-date/pull/8

I don't have any firm proposals, but I have two different implementation ideas:

jordansissel commented 9 years ago

I know Joda's got a precision limit (the Instant class is millisecond precision) and a year limit ("year must be in the range [-292275054,292278993]"). I'm open to helping explore solutions in this area.

synhershko commented 9 years ago

What about the consequences for fielddata size? Even with doc values in place, cardinality will be ridiculously high. Even for the scenarios which need this, it could be overkill, no?

nikonyrh commented 9 years ago

Couldn't you just store the decimal part of the second in a secondary field (as a float or long) and sort by these two fields when needed? You could still aggregate based on the standard date field but not at a microsecond resolution.
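For illustration, a minimal sketch of that split, assuming ISO-8601 UTC input with nanosecond digits (the field names here are hypothetical): the date field keeps millisecond precision, and the remaining digits go into a numeric companion field used only as a sort tie-breaker.

def split_timestamp(ts):
    # e.g. ts = "2015-09-30T19:30:42.123456789Z" (UTC, nanosecond digits)
    base, frac = ts.rstrip("Z").split(".")
    millis = frac[:3].ljust(3, "0")            # millisecond part for the regular date field
    sub_ms = int(frac[3:].ljust(6, "0"))       # sub-millisecond remainder, as an integer
    return {
        "@timestamp": f"{base}.{millis}Z",     # indexed as a normal millisecond date
        "timestamp_sub_ms": sub_ms,            # secondary field, sorted on when needed
    }

print(split_timestamp("2015-09-30T19:30:42.123456789Z"))
# {'@timestamp': '2015-09-30T19:30:42.123Z', 'timestamp_sub_ms': 456789}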

markwalkom commented 9 years ago

I've been speaking to a few networking firms lately and it's dawned on me that microsecond-level precision is going to be critical for IDS/network analytics.

markwalkom commented 9 years ago

Another request from the community - https://discuss.elastic.co/t/increase-the-resolution-of-elastic-timestamps-to-nanoseconds/24227

anoinoz commented 9 years ago

That last request is mine, I believe. I would add that, to monitor networking (and other) activities in our field, nanosecond support is paramount.

abrisse commented 8 years ago

:+1: for this feature

jack-pappas commented 8 years ago

What about switching from Joda Time to date4j? It supports higher-precision timestamps compared to Joda and supposedly the performance is better as well.

dadoonet commented 8 years ago

Before looking at the technical side: is the BSD license compatible with the Apache 2 license?

dadoonet commented 8 years ago

So BSD is compatible with Apache2.

clintongormley commented 8 years ago

I'd like to hear @jpountz's thoughts on this comment https://github.com/elastic/elasticsearch/issues/10005#issuecomment-77479544 about high cardinality with regards to index size and performance.

I could imagine adding a precision parameter to date fields which defaults to ms, but also accepts s, us, ns.

We would need to move away from Joda, but I wouldn't be in favour of replacing Joda with a different dependency. Instead, we have this issue discussing replacing Joda with java.time: https://github.com/elastic/elasticsearch/issues/12829

jpountz commented 8 years ago

@clintongormley It's hard to predict because it depends so much on the data, so I ran an experiment for an application that ingests 1M messages at a rate of 2,000 messages per second per shard.

| Precision | Terms dict (kB) | Doc values (kB) |
| --- | --- | --- |
| milliseconds | 3348 | 2448 |
| microseconds | 10424 | 3912 |

Millisecond precision is much more space-efficient, in particular because at 2k docs per second several messages fall into the same millisecond. But even if we go with 1M messages at a rate of 200 messages per second, so that sharing the same millisecond is much less likely, there are still significant differences between millisecond and microsecond precision.

| Precision | Terms dict (kB) | Doc values (kB) |
| --- | --- | --- |
| milliseconds | 7604 | 2936 |
| microseconds | 10680 | 4888 |

That said, these numbers are for a single field, the overall difference would be much lower if you include _source storage, indexes and doc values for other fields, etc.

Regarding performance, it should be pretty similar.

clintongormley commented 8 years ago

From what users have told me, by far the most important reason for storing microseconds is the sorting of results. It makes no sense to aggregate on buckets smaller than a millisecond.

This can be achieved very efficiently with the two-field approach: one for the date (in milliseconds) and one for the microseconds. The microseconds field would not need to be indexed (unless you really need to run a range query with finer precision than one millisecond), so all that would be required is doc_values. Microseconds can have a maximum of 1,000 values, so doc_values for this field would require just 12 bits per document.

For the above example, that would be only an extra 11kB.

A logstash filter could make adding the separate microsecond field easy.
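As a rough sketch of what that could look like (the index and field names, and the exact mapping parameters, are hypothetical and vary by Elasticsearch version), expressed as request bodies in Python dict form:

# Millisecond date field plus a doc_values-only microsecond offset (0-999).
mapping_body = {
    "properties": {
        "@timestamp":   {"type": "date"},       # millisecond precision, indexed as usual
        "timestamp_us": {"type": "integer",     # microseconds within the millisecond
                         "index": False,        # no range queries needed on this field
                         "doc_values": True},   # kept only for sorting
    }
}

# Sort newest-first, breaking millisecond ties with the microsecond offset.
search_body = {
    "sort": [
        {"@timestamp": "desc"},
        {"timestamp_us": "desc"},
    ]
}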

clintongormley commented 8 years ago

Meh - there aren't 1000 bits in a byte. /me hangs his head in shame.

It would require 1,500kB

pfennema commented 8 years ago

If we want to use the ELK framework properly for analyzing network latency, we really need nanosecond resolution. Are there any firm plans/roadmap to change the timestamps?

portante commented 8 years ago

Let's say I index the following JSON document with nanosecond precision timestamps:

{ "@timestamp": "2015-09-30T12:30:42.123456789-07:00", "message": "time is running out" }

So the internal date representation will be 2015-09-30T19:30:42.123 UTC, right?

But if I issue a query matching that document, and ask for either the _source document or the @timestamp field explicitly, won't I get back the original string? If so, then in cases where the original time string lexicographically sorts the same as the converted time value, would that be sufficient for a client to further sort to get what they need?

Or is there a requirement that internal date manipulations in ES need such nanosecond precision? I am imagining that if one has records with nanosecond precision, only being able to query for a date range with millisecond precision could potentially result in more document matches than wanted. Is that the major concern?

pfennema commented 8 years ago

I think the latter: internal date manipulations probably need nanosecond precision. The reason is that when monitoring latency on 10Gb networks we get pcap records (or packets directly from the switch via UDP) which include multiple fields with nanosecond timestamps in the record. We would like to find the differences between those timestamps in order to optimize our network/software and find correlations. To do this we would like to zoom in on every single record and not aggregate records.

abierbaum commented 8 years ago

:+1: for solving this. It is causing major issues for us now in our logging infrastructure.

clintongormley commented 8 years ago

@pfennema @abierbaum What problems are you having that can't be solved with the two-field solution?

pfennema commented 8 years ago

What we would like is a timescale in the display (Kibana) where we can zoom in on individual measurements that have timestamps with nanosecond resolution. A record in our case has multiple fields (NICTimestamp, TransactionTimestamp, etc.) which we would like to correlate with each other on an individual basis, hence not aggregated. We need to see where spikes occur in order to optimize our environment. If we can have the time on the x-axis at micro/nanosecond resolution, we should be able to zoom in on individual measurements.

abierbaum commented 8 years ago

@clintongormley Our use case is using ELK to analyze logs from the backend processes in our application. The place we noticed it was postgresql logs. With the current ELK code base, even though the logs coming from the database server have the commands in order, once they end up in Elasticsearch and are visualized in Kibana, the order of items happening in the same millisecond is lost. We can add a secondary sequence number field, but that doesn't work well in Kibana queries (since you can't sort on multiple fields) and causes quite a bit of confusion on the team, because they just expect the data in Kibana to be sorted in the same order as it came in from postgresql and logstash.

gigi81 commented 8 years ago

We have the same problem as @abierbaum described. When events happen in the same millisecond, the order of the messages is lost. Any workaround or suggestion on how to fix this would be really appreciated.

dtr2 commented 8 years ago

You don't need to increase the timestamp accuracy: instead, the time sorting should be based on both timestamp and ID: message IDs are monotonically increasing, and specifically, they are monotonically increasing for a set of messages with the same timestamp...

jcollie commented 8 years ago

@dtr2 That may be true for IDs that are automatically assigned, but only if the messages are indexed in the correct order in the first place, and only if the application isn't supplying its own IDs. There's definitely no way that I could depend on that behavior. Also, are monotonically increasing IDs guaranteed, or are they an implementation artifact, especially when considering clusters of more than one node?

dtr2 commented 8 years ago

I believe the original intent was to see a bulk of log messages originating from the same source. If a cluster is involved, then the timestamp is probably the only clustering item (unless, of course, there is a special "context" or "session" field). For that purpose, we can rely on the ID (assuming, of course, it's monotonically increasing, at least per source).

jcollie commented 8 years ago

@dtr2 that's a lot of "ifs" to be relying on ElasticSearch's autogenerated IDs, nearly all of which are violated in my cluster:

1) Some messages supply their own ID if the source has a unique ID already associated with it (systemd journal messages in my case).
2) All of my data runs through a RabbitMQ server (sometimes passing through multiple queues depending on the amount of processing that needs to be done) with multiple consumers per queue, so there's no way that I can expect documents to be indexed in any specific order, much less by the same ElasticSearch node.

In any case, ElasticSearch does not guarantee the behavior of autogenerated IDs. The docs only guarantee that the IDs are unique:

https://www.elastic.co/guide/en/elasticsearch/guide/current/index-doc.html

So I hope you can see that imposing an order based upon the ID cannot be relied upon in the general case. Yes, there may be certain specific instances where that would work, but you'd be relying on undocumented implementation details.

dtr2 commented 8 years ago

@jcollie , In that case, trying to find a "context" is impossible - unless your data source provides it. The idea was to find a context and filter "related" lines together.

bobrik commented 8 years ago

@jcollie "IDs" (in fact they are counters per log source) have to be generated before ingestion, outside of elasticsearch.

Instead of sorting on time alone, you sort on (time, counter).
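A minimal sketch of that approach, assuming the counter is attached per source before ingestion (the field names are hypothetical):

import itertools
import time

# One monotonically increasing counter per log source, assigned before ingestion.
_counters = {}

def annotate(source, message):
    counter = _counters.setdefault(source, itertools.count())
    return {
        "@timestamp": int(time.time() * 1000),  # millisecond epoch, as today
        "source": source,
        "seq": next(counter),                   # breaks ties within the same millisecond
        "message": message,
    }

# Query side: sort on time first, then on the per-source counter.
sort_clause = [{"@timestamp": "asc"}, {"seq": "asc"}]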

jcollie commented 8 years ago

I don't get it - what is the resistance to extending timestamps to nanosecond accuracy? I realize that would take a lot of work and would likely be a "3.x" feature, but anything else is just a workaround until nanosecond timestamps are available.

Having a counter per log source only really helps correlating messages from the same source, but is really not very useful in correlating messages across sources/systems.

As an example, let's say that I implement this counter in each of my log sources (for example as a logstash plugin). Then let's say that I have one source that generates 1000 messages per millisecond and another source that generates 100000 messages per millisecond. There's no way that I could reliably tell what order those messages should be in relative to each source. That may be an extreme example but I think that it illustrates the point.

bobrik commented 8 years ago

Having a counter per log source only really helps correlating messages from the same source, but is really not very useful in correlating messages across sources/systems.

@jcollie can you tell me how you keep clocks on your machines in perfect sync so nanosecond accuracy starts making sense? Even Google struggles to do so:

Then let's say that I have one source that generates 1000 messages per millisecond and another source that generates 100000 messages per millisecond. There's no way that I could reliably tell what order those messages should be in relative to each source.

You are right here. There is no way. You can easily see jitter of a few ms in ntp sync between machines in the same rack:

19 Jan 09:27:40 ntpdate[11731]: adjust time server 10.36.14.18 offset -0.002199 sec
19 Jan 09:27:50 ntpdate[11828]: adjust time server 10.36.14.18 offset 0.004238 sec

On the other hand, you can reliably say in which order messages were processed by a single source:

  • Single thread of your program.
  • Single queue in logstash.
  • Some other strictly ordered sequence (ex: kafka partition).

I'd be happy to be proven wrong, though.

pfennema commented 8 years ago

Clock synchronisation between machines is done by PTP if you need accurate synchronisation, which is needed when measuring low-latency/high-frequency trading networks. The PTP source is usually the switch in the network (Arista, Cisco).

jordansissel commented 8 years ago

I don't get it - what is the resistance to extending timestamps to nanosecond accuracy

@jcollie My understanding of the discussion in this issue is that there is no specific resistance. We have been discussing the costs/benefits of various implementation options. Hopefully this helps clarify things.

clintongormley commented 7 years ago

It requires replacing Joda-Time with java.time, which is something we would like to do, but it is a big project.

jtcooke commented 7 years ago

+1 for this feature

mbullock1986 commented 7 years ago

Epoch nanoseconds would be great. We provide our customer logs in epoch nanoseconds, and we have to sed these down to milliseconds, which adds processing time.

ghost commented 7 years ago

We also suffer from the low resolution of time fields. Nanoseconds would be great. Looking forward to this feature.

Exocomp commented 7 years ago

I'm in favor of the higher precision; allow the user to choose the precision they want.

bamb00 commented 7 years ago

I wonder if this is the same issue as the time field in Kibana losing its nanosecond precision: "? time 2017-03-23T18:34:15.233983675Z". When the index is reset in Kibana, the time field loses some of its precision: "time March 23rd 2017, 13:35:21.246".

If so, is there a workaround to display the complete precision?

phanimahesh commented 7 years ago

I've recently encountered this when attempting to log events from an MQTT broker for IoT applications. I had to trim timestamps down to milliseconds because having a fractional millisecond part doesn't play well with ES.

jmwilkinson commented 7 years ago

This would be nice to have. We encounter a loss of precision when logging from a source generating at the microsecond level. We can get around this by using sequence numbers, but we would prefer not to have the loss of precision. When you're measuring hardware functions, you want timestamps as precise as possible.

portante commented 7 years ago

@jmwilkinson, at that frequency, don't we always need to record the total order since we can get multiple logs at the microsecond level these days?

jmwilkinson commented 7 years ago

@portante I'm afraid I didn't quite grok your question.

portante commented 7 years ago

Do folks believe that by having nanosecond precision in a time value each log entry emitted in sequence is guaranteed to get a different time value recorded for its timestamp? @jmwilkinson?

To be sure we are able to reconstruct the original order of logs from a source we need to record a monotonically increasing sequence for them. We can approximate with a ns precision timestamp, but I don't think we are guaranteed that sufficiently fast log generation will get separate timestamps.

import time

# Quick check: how often do consecutive calls to time.time() return the exact
# same value? time.time() has limited resolution, so a tight loop produces many
# duplicates - the same collision risk applies to sufficiently fast log sources.
val = 0
same = 0
for i in range(10000):
    prev = val
    val = time.time()
    if prev == val:
        same += 1
print(same)

jmwilkinson commented 7 years ago

@portante That's a good point. And I think I was mistaken about the loss of precision - elasticsearch retains the precision, it just isn't used in sorting. Which is fine if we need to use sequence numbers anyway.

saffroy commented 7 years ago

@portante Actual timestamps convey much more information than just sequence numbers, can be much easier to generate (eg. across multiple processes) and situations exist where a certain precision (microsecond, nanosecond) gives correct ordering sufficiently often to be useful.

I have a use case where we collect performance events from modules inside a running process and across distributed processes (like Google Dapper): millisecond precision is insufficient to measure differences between closely related events, but we still need some absolute time to relate that to other events. The occasional clock glitch breaks perfect ordering, but it isn't an issue in practice, because it's performance data and we have many samples. So we worked around the loss of precision in ES by storing timestamps twice, in a date field (millisecond precision, good enough for search, and for humans to understand) and in a double field (microseconds since epoch, good for computations). Not exactly optimal though.
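For reference, a small sketch of that dual-storage workaround (the field names are hypothetical). Microseconds since the epoch are on the order of 1.5e15, well below 2^53, so a 64-bit double still represents them exactly:

import time

def timed_event(payload):
    now_us = time.time_ns() // 1000  # integer microseconds since the epoch (Python 3.7+)
    return {
        "@timestamp":   now_us // 1000,   # date field: millisecond epoch, good for search
        "timestamp_us": float(now_us),    # double field: microseconds, good for computations
        "payload":      payload,
    }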

portante commented 7 years ago

@saffroy, you are correct. I was not trying to convey otherwise, just stating that you cannot rely on timestamps for ordering logs flowing from a source. That requires a monotonically increasing sequence number. This is an old but good paper on the topic.

ghost commented 7 years ago

@portante Why not have both tools in the arsenal? I mean, in my case nanosecond precision would be sufficient to reconstruct the order of logs. Somebody else will have logs so dense that even nanoseconds won't be enough; no problem, they will do it with sequence numbers, and will use both nanoseconds AND sequence numbers when required.

My impression is that it's better to have one field rather than two fields for reconstructing the sequence of events. Even now in ES you have "offset", which can be used in some situations, but again, this looks like a workaround.

Not to mention that for proper visualisations in Kibana it should be one field, as field grouping is not supported.

So nanoseconds are needed anyway and could be used in most situations; those who need strict order can add sequence numbers to their logs.

jordansissel commented 7 years ago

The occasional clock glitch breaks perfect ordering, but it isn't an issue in practice, because it's performance data and we have many samples

Broad declarations like "occasional clock glitch ... not an issue in practice" are not very helpful - while this may be your experience, it has not been mine.

Time is an amazing and interesting topic, but let's stay focused on the issue of date type precision in Elasticsearch, and not about measuring time or properties of time itself.

clintongormley commented 7 years ago

Stalled on #12829

saffroy commented 7 years ago

@jordansissel The entire paragraph was only meant to give another example of a situation where we would benefit from increased precision when storing timestamps in Elasticsearch, and the remark about clock glitch ("not an issue in practice, because it's performance data") was to be understood within this specific example. Sorry if I wasn't clear about that.