elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.74k stars 8.14k forks source link

[telemetry] Fix Maps telemetry with dynamic fields #38394

Closed Bamieh closed 1 year ago

Bamieh commented 5 years ago

Hello maps team :wave:

I was updating the telemetry index mapping yesterday for maps, the telemetry sent for maps has 2 dynamic fields. We do not accept dynamic fields so these fields will not be used by telemetry

"layerTypesCount": {
   "dynamic": "true",
   "properties": {}
},
"emsVectorLayersCount": {
   "dynamic": "true",
   "properties": {}
}

I was hoping to understand what gets reported under those fields so we can map them differently to avoid dynamic fields.

Context: https://github.com/elastic/telemetry/pull/108

elasticmachine commented 5 years ago

Pinging @elastic/kibana-stack-services

elasticmachine commented 5 years ago

Pinging @elastic/kibana-gis

Bamieh commented 5 years ago

After discussing this with @aaronjcaldwell @pickypg we agreed to restructure emsVectorLayersCount and layerTypesCount into an array instead of dynamic fields.

"emsVectorLayersCount": {
   "properties" {
       "key":  { "type": "keyword"},
       "min": { type: "long" },
       "max": { type: "long" },
       "avg": { type: "long" },
   }
},
"layerTypesCount": {
   "properties" {
       "key":  { "type": "keyword"},
       "min": { type: "long" },
       "max": { type: "long" },
       "avg": { type: "long" },
   }
}

We avoid dynamic fields in the telemetry mapping. Additionally frequently adding / changing fieldnames in ES is an anti-pattern.

We'd be using these fields to check overall usage, which specific layers being adopted and which layers are getting used the most.

To achieve a more granular aggregation we'll have to reindex the data, but usually the basic use-cases are enough. We can also automate the reindexing if that is needed to capture some of those aggregations.

The largest benefit is that we can add new layers and types of layers and there’s absolutely no work that needs to get done by anyone to accept it.


An action item for this is to refactor to the above data structure properties: { key, max, min, avg} for the two dynamic telemetry fields the next time we make updates.

alexfrancoeur commented 5 years ago

I just ran into this when trying to get some telemetry on our EMS usage 😄 . Any idea when we might be able to fix this? 7.3 doesn't seem to be possible at this point (but correct me if I'm wrong), what about 7.4?

kindsun commented 5 years ago

Restructuring this on the code side shouldn't be too difficult. I'm happy to coordinate my work with the telemetry team's whenever makes sense!

bmcconaghy commented 5 years ago

@aaronjcaldwell please coordinate with @Bamieh for whatever you need re: telemetry.

elasticmachine commented 4 years ago

Pinging @elastic/pulse (Team:Pulse)

Bamieh commented 4 years ago

Meeting summary about solving the dynamic: true fields issues for maps (@aaronjcaldwell @afharo @TinaHeiligers )

Overview

We will aim to find a root solution to this issue with pulse. For now we dont have a go to solution for fields that have variables provided by the users.

Since the custom layers can be added by the users we dont have a pre-defined set of all the fields to map them.

Allowing dynamic fields of any kind in telemetry is not possible.

Solution

The current proposal is to implement a 3-step strategy of discovery, exploration and mapping.

  1. (Discovery) Add an array to store the layers provided by each cluster

This will be used strictly for discovering the layers provided by the users. layer_key is a combination of ${name}__${type}.

{
  "mappings": {
    "dynamic": false,
    "properties": {
      "maps": {
        "properties": {
          "layers": {
            "properties": {
              "layer_key": {
                "type": "keyword"
              },
              "layer_name": {
                "type": "keyword"
              },
              "layer_type": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}
  1. (Exploration) Run aggregation on one of the layer_key, layer_name, or layer_type to discover the layers are provided by all our users.
{
  "size": 0,
  "aggs": {
    "layers": {
      "terms": {
        "field": "maps.layers.layer_key"
      }
    }
  }
}

The Maps team will need to use this data to decide which user-defined layer types and names to gather telemetry from and request an update to the mapping through the pulse team.

  1. (Mapping Update) Layers that seem interesting (from the exploratory phase) can be explicitly mapped under layerTypesCount and emsVectorLayersCount. This way we can control the fields that are mapped, while still keeping the dynamic: false property of telemetry.
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "maps": {
        "properties": {
          "layers": {
            "properties": {
              "layer_key": {
                "type": "keyword"
              },
              "layer_name": {
                "type": "keyword"
              },
              "layer_type": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}

Mapping these new fields requires the requesting team (in this case, the Maps team) to open an issue in the telemetry repo for the pulse team to handle. Although this will probably require some back and forth to map new layers to analyze them, it is the only plausible solution we could come up with for now.

Using an analyzer (for exploration of new fields)

It is worth exploring adding an analyzer for the layer_key field to be able to search on the layer_name, or type while still maintaining the relationship between the two since non-nested arrays cannot be queries in that manner.

Other explorations (dynamic_templates)

We have played around with using dynamic templates. The approach is great as it allows us to map fields that can follow a strict pattern. This does not work in the maps situation since users can be sending arbitrary layer names, hence re-creating the main issue why dynamic mapping is set to false in the first place.

{
  "mappings": {
    "dynamic": false,
    "properties": {
      "maps": {
        "type": "object",
        "dynamic": true
      }
    },
    "dynamic_templates": [
        {
          "mapsFields": {
            "path_match": "maps.count_*",
            "mapping": {
                "key": {
                  "type": "keyword"
              },
              "min": {
                  "type": "long"
              },
              "max": {
                  "type": "long"
              },
              "avg": {
                  "type": "long"
              }
            }
          }
        }
      }
    }
  ]
}
afharo commented 3 years ago

@Bamieh do you think we can close this issue now that we have the schema definitions?

Maybe @mindbat wants to provide some input regarding this piece of telemetry?

elasticmachine commented 3 years ago

Pinging @elastic/kibana-core (Team:Core)

mindbat commented 3 years ago

@afharo Perhaps with the push for moving to Telemetry-Next, we can re-open the discussion of dynamic fields for the Maps team, via an issue on the Infra repo?

@alexfrancoeur What do you think?

afharo commented 3 years ago

@mindbat sure! My point is that the current schema regarding maps provides the DYNAMIC_KEY keyword: https://github.com/elastic/kibana/blob/master/x-pack/plugins/maps/server/maps_telemetry/collectors/register.ts#L44-L50

If that's good enough for Telemetry-Next, I think we can close this issue.

alexfrancoeur commented 3 years ago

Before closing, we probably want the Maps team to weigh in. While I'm fortunate to have a history with this amazing team, I'm no longer part of it. @kmartastic @thomasneirynck thoughts on the Telemetry thread here?

kmartastic commented 3 years ago

I'm not quite sure I understand what's being discussed. Actually, I'm quite sure I don't.

What is Telemetry-Next? Is that a project? schema? Is there a solution for working with dynamic fields within Telemetry-Next? We have a growing need for capturing this data as we are adding new layer types and want to track usage.

@alexfrancoeur @thomasneirynck @afharo

afharo commented 3 years ago

Hi @kmartastic,

Telemetry-Next is a new take to the way we store the data in the Remote Telemetry Service. Allowing dynamic keys, or even array structures, if preferred. For more details, reach out on the #telemetry-next Slack channel.

The schema is how Kibana defines the "contract" of the usage reported by each collector so the Remote Telemetry Service knows what to expect and what's changed in each version.

Re this specific issue, Maps have 2 fields that are objects where the keys cannot be known beforehand. For now, we've worked around them when defining their schema by setting the key name DYNAMIC_KEY. My question is whether we think that's enough for this use case or if we want to work further into a better structure/definition.

teresaalvarezsoler commented 2 years ago

@alexfrancoeur can we close this issue?

elasticmachine commented 1 year ago

Pinging @elastic/kibana-presentation (Team:Presentation)

afharo commented 1 year ago

I'll go ahead and close this issue. From the last few comments, it seems that both teams are asking each other if we could close it 😅

Feel free to reopen if you think that any additional work is needed.