Closed by makwarth 5 years ago
Pinging @elastic/apm-ui
Pinging @elastic/infra-logs-ui
Pinging @elastic/uptime
The Infra and Logs telemetry is bundled as "infraops" (for legacy reasons). It'd be nice to separate it out as "logs" and "infra", as "infraops" is a bit confusing, especially to newcomers.
Agreed. This depends on us splitting the current plugin into two (see https://github.com/elastic/kibana/issues/36680), because with telemetry we can't report outside of our namespace, and that's currently infraops.
I think initially for Uptime we'd define two fields (which is what our current PR does):
stack_stats.kibana.plugins.apm.past_week.uptime.monitors.hits: <int>
stack_stats.kibana.plugins.apm.past_week.uptime.monitors.detail.hits: <int>
The only change will be the field name used and modifying the tracking logic to line up with the proposed improvements on this issue.
As https://github.com/elastic/kibana/issues/36680 has been put on hold (we won't split into separate InfraUI and LogsUI plugins), we're stuck with the infraops namespace for both for the time being. Switching from a last_24_hours to a past_week interval is not affected by this.
This issue is now replaced by the implementation issue #39507
Motivation for this issue
We currently track Observability telemetry in various ways across the solution UIs, which makes the data hard to compare across solutions. This issue proposes to streamline the Observability telemetry.
How telemetry is currently implemented
Logs UI
Implemented in 6.5. Every click in the Logs UI app initiates a request to fetch data for the active time period. Each request increments the telemetry event counter regardless of the request response. If a unique cluster has ever had more than 5 telemetry events within 24 hours in a given month, it is included in the telemetry count in the above table.
What the data looks like today:
Infra UI
Implemented in 6.5. Every click in the Infra UI app initiates a request to fetch data for the active time period. Each request increments the telemetry event counter regardless of the request response. If a unique cluster has ever had more than 5 telemetry events within 24 hours in a given month, it is included in the telemetry count in the above table. There are separate telemetry events for hosts, docker and kubernetes. As long as any one of them has more than 5, the cluster is included in the count.
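The inclusion rule described for the Logs and Infra UIs can be sketched roughly as follows. This is an illustration only: the `DailyEventCounts` shape and the function names are hypothetical and are not taken from the Kibana collector code. For Logs, the same check would apply to a single counter instead of three.

```typescript
// Hypothetical shapes; the real Kibana collector types are not shown in this issue.
type InfraEventKind = 'hosts' | 'docker' | 'kubernetes';

// Telemetry events recorded per kind in one 24-hour window.
interface DailyEventCounts {
  hosts: number;
  docker: number;
  kubernetes: number;
}

// "More than 5 telemetry events within 24 hours".
const INCLUSION_THRESHOLD = 5;

// True if any single kind exceeded the threshold in this 24-hour window.
function dayQualifies(day: DailyEventCounts): boolean {
  const kinds: InfraEventKind[] = ['hosts', 'docker', 'kubernetes'];
  return kinds.some((kind) => day[kind] > INCLUSION_THRESHOLD);
}

// A cluster is counted for the month if at least one daily report qualifies.
function clusterCountsForMonth(days: DailyEventCounts[]): boolean {
  return days.some(dayQualifies);
}
```

Note that this check says nothing about whether the requests returned any data, which is the gap the streamlined proposal addresses.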
What the data looks like today:
APM UI
Implemented in 6.6. Every visit to the Services list page is monitored by telemetry. If there are any services in the list within the past 24 hours, the telemetry event is set to true. This means only clusters with installed agents are included in the APM telemetry count.
Uptime UI
None yet. Scheduled for ~7.2~ 7.3. PR: https://github.com/elastic/kibana/pull/34437
Streamlined implementation
Here are some areas for improvement and streamlining:
The telemetry data is sent up once per day per Kibana instance. That's probably why the telemetry data time range is 24 hours. However, for any data sent up on e.g. a Monday, we'd get a bunch of zeros from the weekend. Looking at a month, that's not a problem, but if we want to go more granular, it is. Therefore I propose we change the time range from 24 hours to 1 week.
The telemetry (besides APM) doesn't take into account the response of the queries. I don't think it's very useful to see the count of queries performed in <plugin>, as it doesn't say much about actual adoption. For example, a team could be clicking the Logs plugin multiple times during a day without actually using the product. I propose we look at the query response before deciding to increment the counter. The telemetry counter should increment only if there's actual data (hosts, logs, etc.) in the response. This will give us telemetry data on users who definitely consumed real data in our products. We can use this same data to see whether they continue to consume data in the product going forward (is the product valuable to them or not?). Later, we can add the new event tracking as well, so that we can tell whether users are using core functionality of the products.
The Infra and Logs telemetry is bundled as "infraops" (for legacy reasons). It'd be nice to separate it out as "logs" and "infra", as "infraops" is a bit confusing, especially to newcomers.
It'd be nice to streamline the naming, e.g. stack_stats.kibana.plugins.logs.past_week.hits. "Hits" isn't very explicit, but since what we count differs per solution, we might just want to go with a common name, like "hits". We'd have to document what exactly "hits" means per product.
Proposal
What the updated telemetry could look like:
It'd be great to get this out in 7.3, as that's when Uptime adds telemetry.
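The streamlined behaviour proposed in this issue (a past-week window, counting only responses that contain data, and a common field name) could be sketched as below. All names here (`PastWeekHitCounter`, `recordResponse`, `pastWeekHitsField`) are hypothetical and exist only for illustration; they are not Kibana APIs, and the real implementation may differ.

```typescript
// Hypothetical names throughout; none of these are Kibana APIs.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

class PastWeekHitCounter {
  private hitTimestamps: number[] = [];

  // Record a hit only when the query response contained data
  // (hosts, log entries, etc.); empty responses are ignored.
  recordResponse(resultCount: number, nowMs: number): void {
    if (resultCount > 0) {
      this.hitTimestamps.push(nowMs);
    }
  }

  // Number of data-bearing responses within the week ending at nowMs.
  // Older timestamps are dropped as a side effect.
  hitsInPastWeek(nowMs: number): number {
    const cutoff = nowMs - WEEK_MS;
    this.hitTimestamps = this.hitTimestamps.filter((t) => t > cutoff);
    return this.hitTimestamps.length;
  }
}

// Builds the streamlined field name suggested in the issue for a given plugin.
function pastWeekHitsField(plugin: string): string {
  return `stack_stats.kibana.plugins.${plugin}.past_week.hits`;
}
```

With a one-week window, a report sent on a Monday still reflects usage from the previous Friday, which is the weekend-zeros problem the 24-hour range runs into.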