elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Observability] Implement telemetry via UI Metric plugin #40438

Closed jasonrhodes closed 5 years ago

jasonrhodes commented 5 years ago

Another thrilling round of "understanding our telemetry"! Coming from the discussion that started in elastic/kibana#39507 and earlier, I've since discovered the ui_metric plugin that allows Kibana apps to log on-demand telemetry metrics without having to track them between telemetry fetches.

Telemetry is pull, event logging is push

The telemetry server pulls data from Kibana at a regular interval. This means that when events happen in the browser, you can't push that data to the telemetry server. Instead, you have to store it somewhere so that when the pull happens, your fetch method can dig it up and return it.

APM does this via its own saved object, while infra/logs and uptime both keep track of time-bucketed data in their Kibana server memory. This adds a lot of complexity to the telemetry tracking code we maintain in observability for only a little gain over using UI Metric.
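To make the pull model concrete, here is a minimal sketch of the bookkeeping it forces on each app. `EventCounterStore`, `record`, and `fetch` are hypothetical names used only for illustration; this is not the actual APM, infra/logs, or uptime collector code.

```typescript
// Hypothetical sketch: events are pushed into a local store as they happen,
// and the telemetry pull later reads whatever has accumulated.
class EventCounterStore {
  private counts: Record<string, number> = {};

  // Called whenever an event happens (the "push" side).
  record(eventName: string): void {
    this.counts[eventName] = (this.counts[eventName] || 0) + 1;
  }

  // Called on the collector's interval (the "pull" side).
  // Returns a copy so callers cannot mutate the stored counts.
  fetch(): Record<string, number> {
    const snapshot: Record<string, number> = {};
    for (const name of Object.keys(this.counts)) {
      snapshot[name] = this.counts[name];
    }
    return snapshot;
  }
}

const store = new EventCounterStore();
store.record('page_load');
store.record('page_load');
store.record('filter_applied');
const snapshot = store.fetch();
```

Every app that wants telemetry has to own something like this store (in memory or in a saved object), which is the complexity in question.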

UI Metric

The UI Metric plugin solves this by providing a standard way to store event data that will be collected and sent to telemetry whenever the telemetry server asks for it.

Proof of concept

Using trackUiMetric() from the UI Metric plugin, we can easily do some simple "page visit" style telemetry tracking with very little code. I got this simple proof of concept running in less than 30 minutes:

import { useEffect } from 'react';
import { trackUiMetric } from '../../../../../../src/legacy/core_plugins/ui_metric/public';

interface Props {
  app: 'infra_metrics' | 'infra_logs' | 'apm' | 'uptime';
  path: string;
  delay?: number;
}

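// Tracks a single "visit" metric when the component mounts. With a delay, the
// metric is only recorded if the component stays mounted for that long.
export function useTrackVisit({ app, path, delay = 0 }: Props) {
  useEffect(() => {
    const prefix = delay ? `visit_delay_${delay}ms` : 'visit';
    const id = setTimeout(() => trackUiMetric(app, `${prefix}__${path}`), delay);
    // Cancel the pending metric if the user navigates away before the delay elapses.
    return () => clearTimeout(id);
    // Empty dependency list: run once on mount, clean up on unmount.
  }, []);
}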

Then, in two different page components (the outer wrapper for a route/page), I added the following:

// in the metrics explorer page
useTrackVisit({ app: 'infra_metrics', path: 'metrics_explorer' });
useTrackVisit({ app: 'infra_metrics', path: 'metrics_explorer', delay: 15000 });

// in the inventory page
useTrackVisit({ app: 'infra_metrics', path: 'inventory' });
useTrackVisit({ app: 'infra_metrics', path: 'inventory', delay: 15000 });

With that much code in place, I jumped into the UI and did the following:

  1. Loaded the metrics explorer and immediately navigated away to the inventory page
  2. Stayed on the inventory page for >15s
  3. Clicked back to the metrics explorer page and stayed for >15s

That produced the following telemetry:

[Screenshot of the resulting telemetry, 2019-07-05]
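The metric names in that telemetry follow directly from the prefix logic in the hook. As a minimal sketch, the naming can be pulled out into a pure function; `visitMetricName` is a hypothetical helper for illustration and is not part of the UI Metric plugin.

```typescript
// Mirrors the prefix logic inside useTrackVisit as a pure function.
// visitMetricName is a hypothetical helper, not a UI Metric plugin API.
function visitMetricName(path: string, delay: number = 0): string {
  const prefix = delay ? `visit_delay_${delay}ms` : 'visit';
  return `${prefix}__${path}`;
}

const immediate = visitMetricName('metrics_explorer');
// immediate === 'visit__metrics_explorer'
const delayed = visitMetricName('metrics_explorer', 15000);
// delayed === 'visit_delay_15000ms__metrics_explorer'
```

An immediate visit and a "stayed for 15 seconds" visit therefore end up as two separate counters for the same page.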

Limitations

The only downside to doing our telemetry this way is that we lose the ability to easily attach additional metadata to an event. For instance, APM currently tracks whether the user has any services at all. That would still be easy: we could create another hook/function that only tracks a metric if the request returns service data. APM also tracks how many agents/services a user has, and that would no longer be possible without creating a separate metric for each service and aggregating them later.
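A rough sketch of both ideas, under loud assumptions: `track` stands in for trackUiMetric(app, metric), and the metric names (`has_services`, `service__<agent>__<name>`) as well as the helpers `trackHasServices`, `trackEachService`, and `servicesPerAgent` are all made up for illustration.

```typescript
// Hypothetical sketch; `track` stands in for trackUiMetric(app, metric).
type Track = (app: string, metric: string) => void;

interface Service {
  name: string;
  agent: string;
}

// Only record the metric when the request actually returned services.
function trackHasServices(services: Service[], track: Track): void {
  if (services.length > 0) {
    track('apm', 'has_services');
  }
}

// Workaround for counts: one metric per service, encoded in the metric name.
function trackEachService(services: Service[], track: Track): void {
  for (const service of services) {
    track('apm', `service__${service.agent}__${service.name}`);
  }
}

// Later aggregation: count distinct per-service metric names for each agent.
// Breaks if an agent or service name itself contains "__".
function servicesPerAgent(metricNames: string[]): Record<string, number> {
  const result: Record<string, number> = {};
  const seen: Record<string, boolean> = {};
  for (const metric of metricNames) {
    const parts = metric.split('__');
    if (parts[0] !== 'service' || parts.length !== 3 || seen[metric]) continue;
    seen[metric] = true;
    result[parts[1]] = (result[parts[1]] || 0) + 1;
  }
  return result;
}

// Made-up example data.
const recorded: string[] = [];
const track: Track = (_app, metric) => {
  recorded.push(metric);
};
const services: Service[] = [
  { name: 'checkout', agent: 'java' },
  { name: 'cart', agent: 'java' },
  { name: 'frontend', agent: 'nodejs' },
];
trackHasServices(services, track);
trackEachService(services, track);
const perAgent = servicesPerAgent(recorded);
```

The aggregation step is the part that gets awkward: the counts have to be reconstructed from metric names after the fact instead of being reported directly.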

Things we can easily track now across all observability solutions:

Things I think we could somewhat easily add in the near future:

elasticmachine commented 5 years ago

Pinging @elastic/infra-logs-ui

sorenlouv commented 5 years ago

For instance, APM currently tracks whether the user has any services at all. This would still be easy by simply creating another hook/function that only tracked a metric if the request returned with service data. APM also tracks how many agents/services a user has, and this would no longer be possible without creating separate metrics for each service and then aggregating them later.

We currently track the number of services per agent (this is probably what you mean, just making sure we are on the same page). This is very useful for tracking the adoption of agents. It makes it possible to query for different stack configurations like "how many users have Java, Node.js, and .NET services":

stack_stats.kibana.plugins.apm.services_per_agent.java > 0 and 
stack_stats.kibana.plugins.apm.services_per_agent.nodejs > 0 and 
stack_stats.kibana.plugins.apm.services_per_agent.dotnet > 0
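For illustration, the same condition can be written as a predicate over a telemetry document. This is only a sketch: the `TelemetryDoc` shape is inferred from the field paths in the query, and `hasAllAgents` plus the sample values are hypothetical.

```typescript
// Assumed document shape, inferred from the field paths in the query.
interface TelemetryDoc {
  stack_stats: {
    kibana: {
      plugins: {
        apm: { services_per_agent: Record<string, number> };
      };
    };
  };
}

// True when the document reports at least one service for every listed agent.
function hasAllAgents(doc: TelemetryDoc, agents: string[]): boolean {
  const perAgent = doc.stack_stats.kibana.plugins.apm.services_per_agent;
  return agents.every(agent => (perAgent[agent] || 0) > 0);
}

// Made-up sample documents.
const makeDoc = (perAgent: Record<string, number>): TelemetryDoc => ({
  stack_stats: { kibana: { plugins: { apm: { services_per_agent: perAgent } } } },
});
const docs = [
  makeDoc({ java: 3, nodejs: 1, dotnet: 2 }),
  makeDoc({ java: 5, nodejs: 0, dotnet: 1 }),
  makeDoc({ java: 1 }),
];
const matching = docs.filter(doc => hasAllAgents(doc, ['java', 'nodejs', 'dotnet']));
```

This kind of question depends on having a per-agent count in each document, which is the metadata that plain event counters would not carry.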

weltenwort commented 5 years ago

It's great that Kibana offers a way to track user actions. From the name I assume that the ui_metric service only runs on the browser side, though. I'm sure we can come up with some useful metrics based on that. As soon as we have any scheduled tasks (or similar) on the server side we might run into problems reporting telemetry about those.