elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Add dedicated field types for durations and byte sizes #31244

Open jpountz opened 6 years ago

jpountz commented 6 years ago

I'm opening this feature request as a follow-up to a conversation with @ruflin. Today users typically use numeric types (e.g. long, float, scaled_float) with a convention regarding units (sometimes made explicit in the name of the field, e.g. transferred_bytes or duration_ms) in order to store durations or byte sizes, but we could make the experience better by having native support for these fields in Elasticsearch:

One risk is that we end up with lots of feature requests to support distances, weights, etc. Where do we draw the line? It's been suggested that we have a single field type that is configured with the kind of quantity it stores, but that might not be practical given that some units have their own specifics, e.g. k means 1024 for byte sizes and 1000 for weights, and some durations are not fixed (months, years, etc.). At first sight it looks cleaner to have one type per unit, which doesn't mean they can't share code internally.
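To illustrate the "one type per unit, shared code internally" idea, here is a minimal Python sketch. It is not Elasticsearch code: `parse_quantity`, `parse_bytes`, `parse_weight_grams` and both unit tables are hypothetical names made up for this example. The only thing it is meant to show is that the same parsing logic can serve several unit families while each family keeps its own meaning of `k`.

```python
import re

# Hypothetical unit tables (assumptions for illustration only).
# "k" means 1024 for byte sizes but 1000 for weights.
BYTE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}
WEIGHT_UNITS = {"g": 1, "kg": 1000, "t": 1000**2}

_QUANTITY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def parse_quantity(text, units):
    """Shared parser: split '<number><suffix>' and scale by the unit table."""
    match = _QUANTITY.match(text)
    if match is None:
        raise ValueError(f"not a quantity: {text!r}")
    number, suffix = match.groups()
    try:
        factor = units[suffix.lower()]
    except KeyError:
        raise ValueError(f"unknown unit: {suffix!r}") from None
    return float(number) * factor


def parse_bytes(text):
    """Byte-size 'type': returns a number of bytes."""
    return parse_quantity(text, BYTE_UNITS)


def parse_weight_grams(text):
    """Weight 'type': returns a number of grams."""
    return parse_quantity(text, WEIGHT_UNITS)
```

With this split, `parse_bytes("1kb")` and `parse_weight_grams("1kg")` go through the same code path but scale by 1024 and 1000 respectively.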

elasticmachine commented 6 years ago

Pinging @elastic/es-search-aggs

ddorian commented 6 years ago

Hmm, no other database/search engine has this type of field, correct?

jpountz commented 6 years ago

Good question. I don't know of any, but since I have limited knowledge of what field types other datastores provide, I could easily miss (even a major) one.

jpountz commented 6 years ago

Discussed in FixitFriday: we want to do it. We will start with durations and byte sizes, which are commonly stored in Elasticsearch. There might be asks for distances and temperatures coming next; we will handle such requests as they come, depending on how much usage we expect from them.

timroes commented 6 years ago

@elastic/kibana-visualizations This will mean changes in supported types (e.g. do they actually return numeric values we can use in charts? will they include scaled extensions?), and might also require some changes in how values need to be handled or how we can show them.

@elastic/kibana-discovery This might mean changes to the filtering UI. Also this might mean changes to KQL to query for those fields.

@elastic/kibana-management This means new field types (if that affects index patterns somehow); it might also mean some changes to field formatters for those types.

@jpountz Please mention the above teams in case you are creating a PR or further tickets related to this feature.

timroes commented 6 years ago

/cc @epixa /cc @alexfrancoeur (I think you know about that topic already, but it looks like you are not [yet] following that issue)

Bargs commented 6 years ago

These seem similar to the range types, which afaik we don't do anything special for in Kibana. Is there something different about these that would imply we need to support them at launch, or is it a similar level of priority as other field types that we don't currently support?

timroes commented 6 years ago

I think we should at least be involved from the very beginning to highlight potential issues. For example, I talked to Adrien yesterday, and right now the API is planned to return strings for those units, which would make it impossible to use any of those values inside charts as metrics (like drawing the traffic usage over time, or the duration an API call took per endpoint). Since imho people want to visualize especially those metric values quickly in Kibana, we should at least stay involved in that, and not start thinking about it only after ES has built that feature and possibly can't easily change any API around it anymore. At what point we actually want to put this on our roadmap is, I think, a different discussion we need to have :-)

alexfrancoeur commented 6 years ago

@Bargs as more people begin to use auto complete, is there anything we'll have to do to support this in KQL/Kuery?

Bargs commented 6 years ago

KQL queries get turned into regular query DSL queries like range, match, exists and query_string, so assuming these field types don't need any special treatment in order to be used in those queries we should be fine.

ruflin commented 5 years ago

I want to bump this thread as I still see quite a few use cases especially for the duration type.

dagguh commented 5 years ago

Note that there's an ISO standard (ISO 8601) for durations and time intervals, including syntax and semantics. We should respect that standard for maximum reuse and the principle of least astonishment.
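For readers unfamiliar with the ISO duration syntax, the following is a minimal Python sketch of parsing it. It is an illustration only and covers just a subset: days, hours, minutes and seconds. Years and months are deliberately rejected because, as noted earlier in the thread, they are not fixed-length. The function name `parse_iso_duration` is a hypothetical helper, not an existing API.

```python
import re
from datetime import timedelta

# Subset of the ISO 8601 duration syntax: P[nD][T[nH][nM][nS]].
# Years, months and weeks are intentionally not supported in this sketch.
_ISO_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_iso_duration(text):
    """Parse a fixed-length ISO 8601 duration such as 'PT1H30M'."""
    match = _ISO_DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {k: v for k, v in match.groupdict().items() if v is not None}
    # Reject degenerate forms such as 'P', 'PT' or 'P1DT'.
    if not parts or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(
        days=int(parts.get("days", 0)),
        hours=int(parts.get("hours", 0)),
        minutes=int(parts.get("minutes", 0)),
        seconds=float(parts.get("seconds", 0)),
    )
```

Note that `PT1M` is one minute, while `P1M` (one month) is rejected by this sketch because its length depends on the calendar.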

ruflin commented 5 years ago

@jasontedor I wanted to bump this issue here as we started to discuss again around bytes and duration fields in Elasticsearch in the context of ECS and adding metrics: https://github.com/elastic/ecs/pull/480

felixbarny commented 8 months ago

I was about to file a feature request for a duration type and found this issue. I think dedicated types for duration and byte sizes would be really cool for more natural queries, especially for observability use cases. Now that ES|QL makes writing queries nicer and more concise, I think that these types would add another layer of sugar that makes interacting with your data more intuitive, expressive, and sweet.

When implementing the field types, I think we should design them with backwards compatibility with numeric types in mind so that we can use them as a drop-in replacement that provides strictly additive functionality. One aspect of that is that by default, we should return a numerical value instead of a string representation of the duration or byte size. We can use the formatter functionality that exists for the date field type to optionally return the values in a string representation.
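A rough Python sketch of that "numeric by default, string on request" behavior, purely for illustration: `render_bytes` and its `human_readable` flag are hypothetical names, and the suffixes and rounding are assumptions rather than anything Elasticsearch defines.

```python
def render_bytes(value, human_readable=False):
    """Return the raw number by default; format it only when asked to."""
    if not human_readable:
        # Default path: behaves exactly like a plain numeric field.
        return value
    size = float(value)
    for suffix in ("b", "kb", "mb", "gb", "tb"):
        # Stop at the first unit where the value is below 1024,
        # or at the largest unit this sketch knows about.
        if size < 1024 or suffix == "tb":
            return f"{size:.1f}{suffix}"
        size /= 1024
```

Existing consumers that expect a number keep working unchanged, while a caller that opts in gets a display string, e.g. `render_bytes(1536, human_readable=True)` yields `"1.5kb"`.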

I realize that it's difficult to prioritize this as it's not really an essential thing and it potentially requires changes in a lot of different areas. But maybe we can restrict the amount of effort and coordination by adhering to the principle of strict backwards compatibility with numeric field types. This may also be a good issue to pick up for a spacetime project.

felixbarny commented 8 months ago

After I wrote my previous message, I saw an internal discussion where the decision was to not add dedicated field types but add support for arbitrary metadata on field types: https://github.com/elastic/elasticsearch/pull/49419.

I don't disagree with that decision and I don't think this is an either/or kind of thing. In fact, IMHO, this issue is as relevant as ever as one of the things that's not supported using https://github.com/elastic/elasticsearch/pull/49419 is doing queries like http.request.body.bytes > 1MiB. In that discussion, it has been mentioned that we could support something like that in KQL. I don't think that's an adequate alternative to directly supporting this in Elasticsearch as all other query languages, most notably ES|QL, wouldn't benefit from that. Another concern that was mentioned is the backwards compatibility with numeric field types. But as mentioned in my previous message, I think we can make it fully compatible. This will be a requisite anyway for us to be able to adopt these field types for existing use cases.
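To make the example concrete, here is a hedged Python sketch of what a client or query language has to do today to express `http.request.body.bytes > 1MiB` against a plain numeric field: convert the unit itself and emit an ordinary `range` query. The helper `to_range_query`, its small grammar and its unit table are assumptions for illustration; only the `range` query shape with `gt`/`gte`/`lt`/`lte` is standard query DSL.

```python
import re

# Hypothetical unit table and operator mapping for this sketch.
_BINARY_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}
_OPERATORS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}

_EXPR = re.compile(r"^\s*([\w.]+)\s*(>=|<=|>|<)\s*(\d+)\s*([A-Za-z]+)\s*$")


def to_range_query(expression):
    """Turn '<field> <op> <number><unit>' into a range query on raw bytes."""
    match = _EXPR.match(expression)
    if match is None:
        raise ValueError(f"cannot parse: {expression!r}")
    field, op, number, unit = match.groups()
    if unit not in _BINARY_UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    # The unit is resolved on the client; the index only ever sees a number.
    return {"range": {field: {_OPERATORS[op]: int(number) * _BINARY_UNITS[unit]}}}
```

For the expression above this produces `{"range": {"http.request.body.bytes": {"gt": 1048576}}}`. Every query language would need its own copy of this conversion, which is the duplication that native support would avoid.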

felixbarny commented 8 months ago

I played around with this a bit and created a PR: https://github.com/elastic/elasticsearch/pull/104037.

Instead of creating dedicated field types, I leveraged the unit metadata field which seemed more appropriate. That way, the choice of the unit is orthogonal to the numeric field type used and it also integrates well with OpenTelemetry metric units.
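As a sketch of what "unit metadata on a numeric field" can look like, the mapping below is written as a Python dict that could be sent as the body of an index-creation request. The field name is taken from the example earlier in this thread; the exact accepted unit values and how the linked PR consumes them are assumptions here, so treat this as an illustration rather than the PR's actual behavior.

```python
# Illustrative mapping body: the field stays a plain `long`, and the unit
# is attached through the field's `meta` section (unit value assumed).
mapping_body = {
    "mappings": {
        "properties": {
            "http.request.body.bytes": {
                "type": "long",
                "meta": {"unit": "byte"},
            }
        }
    }
}


def unit_of(mapping, field):
    """Look up the declared unit of a field, or None if none is set."""
    props = mapping["mappings"]["properties"]
    return props.get(field, {}).get("meta", {}).get("unit")
```

Because the unit lives in metadata, switching the numeric type (say, from `long` to `float`) does not affect the unit, which is the orthogonality mentioned above.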

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)