jimczi opened 4 years ago
Pinging @elastic/es-search (:Search/Mapping)
Once we have this field, I guess the next question will be how to deal with objects that have a mix of strings and numbers. This makes me wonder whether we should try to fold this functionality into the existing `flattened` field, or start thinking about whether we could have a sort of wrapper that redirects fields to either `flattened` or its numeric variant at both index and search time, e.g. something like this:
```json
{
  "foo": {
    "type": "flattened",
    "numeric_field_pattern": [ "*.count" ]
  }
}
```
so that an object like
```json
{
  "foo": {
    "tags": [ "x", "y" ],
    "count": 42
  },
  "bar": {
    "tags": [ "x" ],
    "count": 100
  }
}
```
would have its `foo.tags`/`bar.tags` fields indexed and searched with `flattened`, while the `foo.count`/`bar.count` fields would be indexed and searched with the numeric variant.
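To make the proposed routing concrete, here is a minimal Python sketch of how such a wrapper could split the leaves of an object between the keyword (`flattened`) path and a numeric path. It assumes the hypothetical `numeric_field_pattern` parameter from the mapping above, shell-style glob matching on the dotted key, and that matching values are parsed as numbers. None of this reflects an existing Elasticsearch implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical: mirrors the proposed "numeric_field_pattern" mapping parameter.
NUMERIC_FIELD_PATTERNS = ["*.count"]


def flatten(obj, prefix=""):
    """Yield (dotted_key, leaf_value) pairs, expanding nested objects and arrays."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                yield from flatten(item, path)
            else:
                yield path, item


def route(obj, patterns=NUMERIC_FIELD_PATTERNS):
    """Send keys matching a numeric pattern to the numeric variant, the rest to flattened."""
    keyword, numeric = [], []
    for path, value in flatten(obj):
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            numeric.append((path, float(value)))  # must parse as a number
        else:
            keyword.append((path, str(value)))
    return keyword, numeric


doc = {
    "foo": {"tags": ["x", "y"], "count": 42},
    "bar": {"tags": ["x"], "count": 100},
}
keyword_leaves, numeric_leaves = route(doc)
```

In this sketch, a value that matches a numeric pattern but cannot be parsed raises `ValueError`. Any real wrapper would still have to choose a policy for that case, such as rejecting the document or ignoring the value.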
@polyfractal brought up the good point that in some telemetry use cases, all values represent counts. This type of data is similar to a histogram, but with labeled buckets. For example, we could be tracking the usage of every aggregation:
```json
{
  "agg_usage": {
    "terms": 101,
    "date_histogram": 2450,
    ...
  }
}
```
It would be natural to perform a histogram-like aggregation on `agg_usage` to sum up the counts for each entry (`terms`, `date_histogram`, etc.). When designing the feature, it'd be good to keep this case in mind; for example, it could affect whether we want to distinguish `long` counts vs. arbitrary numerics.
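For the labeled-bucket case, the aggregation being described comes down to summing each label's count across documents. Below is a minimal Python sketch. It assumes documents are plain dicts shaped like the `agg_usage` example, and the integer check shows what restricting the field to `long` counts would mean. The function name and the second document are hypothetical.

```python
from collections import Counter


def sum_labeled_counts(docs, field="agg_usage"):
    """Sum each label's count across documents (a histogram with labeled buckets)."""
    totals = Counter()
    for doc in docs:
        for label, count in doc.get(field, {}).items():
            # Treat every value as an integer count (the `long` variant).
            if isinstance(count, bool) or not isinstance(count, int):
                raise TypeError(f"{field}.{label} is not an integer count: {count!r}")
            totals[label] += count
    return dict(totals)


docs = [
    {"agg_usage": {"terms": 101, "date_histogram": 2450}},
    {"agg_usage": {"terms": 9}},  # hypothetical second document
]
usage_totals = sum_labeled_counts(docs)
```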
> it could affect whether we want to distinguish `long` counts vs. arbitrary numerics
In a similar fashion, this feature might be useful for ML use cases. It seems to me that being able to specify the sub-type (`long`, `float`, `double`, ...) would be good. For ML, these vectors can become huge, but on the other hand they don't necessarily require a `double`. Being able to define the sub-type (e.g. `float`) would be a way to choose between precision and space.
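You can see the trade-off by encoding the same vector as 32-bit (`float`) and 64-bit (`double`) IEEE 754 values. This Python sketch only illustrates storage size and rounding. It makes no assumption about how Elasticsearch would actually encode such a sub-type, and the `encode`/`decode` helpers are hypothetical.

```python
import struct

# Struct format codes for the two sub-types: 4-byte float, 8-byte double.
FORMATS = {"float": "f", "double": "d"}


def encode(values, subtype):
    """Pack values little-endian using the chosen sub-type."""
    code = FORMATS[subtype]
    return struct.pack(f"<{len(values)}{code}", *values)


def decode(buf, subtype):
    """Unpack a buffer produced by encode()."""
    code = FORMATS[subtype]
    count = len(buf) // struct.calcsize(f"<{code}")
    return list(struct.unpack(f"<{count}{code}", buf))


vector = [0.1, 0.2, 0.3]
as_float = encode(vector, "float")    # 12 bytes, values rounded to float32
as_double = encode(vector, "double")  # 24 bytes, values round-trip exactly
```

Halving the size costs precision: after a round trip through `float`, 0.1 comes back as a nearby float32 value rather than the original double.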
Does this issue cover support for `histogram` and `aggregate_metric_double` fields? For the APM/Metrics use case of https://github.com/elastic/elasticsearch/issues/63530, we will need to store basic numbers, histograms, and at some point probably aggregate metrics.
+1, following. This feature will unblock the ability to remove nested fields in a use case I have 😁
+1, following. I need numeric (float) flattened fields so that I can use `field_value_factor` functions on thousands of unique field names. For now, I have had to increase the default limit on the number of mapped fields, which the docs describe as bad practice.
+1, following!
+1. It would really help in storing lots of financial information without a mapping explosion.
+1
+1
While this is being worked on, I am able to work around the lack of numeric range queries on the `flattened` type by leveraging runtime fields at query time ("query time" because, in my case, the numeric field names are not known in advance).
Example:
Index Mappings
```json
{
  "flattened_test": {
    "mappings": {
      "properties": {
        "host": {
          "type": "flattened"
        }
      }
    }
  }
}
```
Sample documents
```json
{ "host": { "hostname": "bionic_1", "name": "bionic_1", "num_one": 1323 } }
{ "host": { "hostname": "bionic_2", "name": "bionic_2", "num_one": 2323 } }
{ "host": { "hostname": "bionic_3", "name": "bionic_3", "num_one": 3323 } }
```
Sample Range Query
```
GET flattened_test/_search
{
  "runtime_mappings": {
    "host.num_one": {
      "type": "long"
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "host.num_one": {
              "gte": 4000,
              "lte": 7000
            }
          }
        }
      ]
    }
  }
}
```
This serves me well for the use case at hand. I understand the performance implications of query-time runtime fields, and the trade-off is acceptable in my case.
However, being new to ES, I wanted to check here: am I overlooking anything obvious, and is there any other feedback?
Thanks,
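To see why the runtime field matters, compare the two ways a range can be evaluated. As far as I understand, leaf values inside a `flattened` field are indexed as keywords, so a range on `host.num_one` without a runtime field compares strings lexicographically. A runtime field declared without a script reads its value from `_source` under its own name, and when it is defined in the search request it takes precedence over a mapped field with the same name. The Python sketch below mimics both behaviors on the sample documents plus one hypothetical extra document (`bionic_x`). `get_path`, `long_range` and `keyword_range` are illustrative helpers, not Elasticsearch APIs.

```python
def get_path(source, dotted):
    """Resolve a dotted path such as 'host.num_one' in a _source-like dict."""
    value = source
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def long_range(values, gte, lte):
    """Runtime `long` field behavior: numeric comparison."""
    return [v for v in values if v is not None and gte <= int(v) <= lte]


def keyword_range(values, gte, lte):
    """Keyword behavior: lexicographic comparison of the string forms."""
    return [v for v in values if v is not None and str(gte) <= str(v) <= str(lte)]


docs = [
    {"host": {"hostname": "bionic_1", "num_one": 1323}},
    {"host": {"hostname": "bionic_2", "num_one": 2323}},
    {"host": {"hostname": "bionic_3", "num_one": 3323}},
    {"host": {"hostname": "bionic_x", "num_one": 10000}},  # hypothetical extra document
]
values = [get_path(d, "host.num_one") for d in docs]
```

When numbers have different digit counts, the lexicographic comparison lets `10000` fall inside a range of 1000 to 3000, while the numeric comparison correctly excludes it.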
To follow up and update on the use case in Elastic APM (https://github.com/elastic/elasticsearch/issues/61550#issuecomment-772117004):
We're not planning to use `flattened`. Instead, we'll use `subobjects: false` at the root of the metric mappings. This will allow ingesting metrics such as `connections` and `connections.idle` in the same index without causing a mapping conflict. Currently, this requires all incoming documents to be flat, but the ES team is also working on supporting nested object notation in documents where `subobjects` is disabled in the mapping: #97972. This makes adding the `subobjects: false` flag backwards compatible.
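The conflict arises because `connections` would need to be both a numeric leaf and the parent object of `connections.idle`. This Python sketch models the problem conceptually and is not how Elasticsearch's mapper is implemented. Expanding dotted keys into nested objects fails on such a document but works when the names don't overlap. With `subobjects: false`, dotted names stay literal leaf names, so there is nothing to collide. The metric values and helper names are hypothetical.

```python
def expand_dots(flat):
    """Expand dotted keys into nested objects, as happens when subobjects are enabled."""
    root = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = root
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"'{part}' is already a leaf value, cannot hold '{key}'")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"'{key}' is already an object")
        node[leaf] = value
    return root


def has_mapping_conflict(flat):
    """True if the dotted keys cannot be expanded into one consistent object tree."""
    try:
        expand_dots(flat)
    except ValueError:
        return True
    return False


metric = {"connections": 10, "connections.idle": 3}
```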
I'm sure there are other valid use cases for numeric flattened fields, though, such as avoiding field explosions.
Having said that, we're also working on a new way of dealing with field explosions by ignoring fields that exceed the limit instead of rejecting documents: https://github.com/elastic/elasticsearch/pull/96235
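The difference between the two policies can be sketched as follows. Under the current behavior, a document that would push the mapping past the total field limit is rejected. Under the proposed one, the excess fields are left unmapped and the document is still indexed. This is a hypothetical Python model of the policy only; the PR's actual setting names and details may differ.

```python
def apply_dynamic_fields(mapped, new_fields, limit, ignore_beyond_limit):
    """Return (mapped_fields, ignored_fields) after adding new_fields up to `limit`."""
    mapped = list(mapped)  # copy, so a rejected document leaves the input untouched
    ignored = []
    for name in new_fields:
        if name in mapped:
            continue  # already mapped, nothing to add
        if len(mapped) < limit:
            mapped.append(name)
        elif ignore_beyond_limit:
            ignored.append(name)  # left out of the mapping; the document is still accepted
        else:
            raise ValueError(f"total field limit of {limit} exceeded by {name!r}")
    return mapped, ignored


def try_index(mapped, new_fields, limit, ignore_beyond_limit):
    """Return the outcome, or None if the document is rejected."""
    try:
        return apply_dynamic_fields(mapped, new_fields, limit, ignore_beyond_limit)
    except ValueError:
        return None
```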
+1, we need numeric fields in flattened types to be fully supported for range queries.
Pinging @elastic/es-search-foundations (Team:Search Foundations)
This issue is a spinoff of #43805 that focuses on a specific use case: supporting numeric fields in the flattened field. We've discussed this internally and agreed that it is something we'd like to provide. This new field could be considered the numeric version of the `flattened` field, where all values should be parseable as numbers. The details of the implementation are still unclear, but multiple ideas were shared internally:
- We could reuse the framework added for the `rank_feature` query, where field names could be indexed as terms and values as frequencies.
- We could use points with multiple dimensions and/or prefixes/suffixes to index the (field name, value) pair.
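The prefix variant of the second idea can be illustrated as follows. Prepend the field name and a separator to a byte encoding of the number whose unsigned byte order matches numeric order. The encoding flips the sign bit of a big-endian 64-bit integer, similar in spirit to how Lucene makes longs byte-sortable. A range query on one key then becomes a single contiguous range over the sorted keys. This Python sketch uses a sorted list in place of Lucene's points/BKD structures and assumes integer values and field names without NUL bytes. All helper names are hypothetical.

```python
import struct

SEP = b"\x00"  # assumes field names never contain a NUL byte


def encode_long(value):
    """Map a signed 64-bit int to 8 bytes whose unsigned order matches numeric order."""
    return struct.pack(">Q", value + (1 << 63))  # same as flipping the sign bit


def encode_pair(field, value):
    """Index key for one (field name, value) pair: field prefix, separator, number."""
    return field.encode("utf-8") + SEP + encode_long(value)


def decode_pair(term):
    field, _, raw = term.partition(SEP)  # first NUL is the separator
    return field.decode("utf-8"), struct.unpack(">Q", raw)[0] - (1 << 63)


def range_bounds(field, gte, lte):
    """A numeric range on one field becomes a contiguous range of byte keys."""
    return encode_pair(field, gte), encode_pair(field, lte)


pairs = [("a.count", 5), ("a.count", -3), ("b.count", 1), ("a.count", 100)]
terms = sorted(encode_pair(f, v) for f, v in pairs)
lo, hi = range_bounds("a.count", 0, 50)
matches = [decode_pair(t) for t in terms if lo <= t <= hi]
```

Because every field-name byte is greater than the NUL separator, keys for a different field (even one that shares a prefix, such as `a.countx`) cannot fall between the bounds.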
This issue is a placeholder to provide feedback and updates on the overall plan (supporting a fully numeric flattened field).