Closed machadoum closed 1 year ago
@machadoum updated the ticket body ^^ with telemetry instrumentation ideas.
Pinging @elastic/security-threat-hunting (Team:Threat Hunting)
Pinging @elastic/security-solution (Team: SecuritySolution)
I added tags to classify which teams are responsible for implementing each metric. We need to reach out to these teams to check which metrics they have already implemented and how we could add the ones they haven't.
@machadoum can you please update this ticket with your findings? Thanks!
Hi @SourinPaul, This ticket is still in our backlog, but I haven't found the time to work on it. I should have some time for it after 8.7 FF.
The Kibana usage_collection plugin automatically collects Application usage telemetry. It contains the number of clicks and minutes on screen aggregated by 7, 30, and 90 days:
"appId": "security-solutions",
"viewId": "entity-analytics",
"clicks_7_days": 10,
"clicks_30_days": 20,
"clicks_90_days": 100,
"clicks_total": 140,
"minutes_on_screen_7_days": 1.5,
"minutes_on_screen_30_days": 10.0,
"minutes_on_screen_90_days": 11.5,
"minutes_on_screen_total": 32.5
We send UI counter events when the user enables or disables an ML job using Security Solutions UI (it includes the ML widget). Counter events are a key/value pair.
"key":"siem_job_enabled",
"value":1
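To illustrate what "a key/value pair" means in practice, here is a minimal sketch of how counter events could be accumulated per key before being shipped. The `UiCounterBuffer` class and the `siem_job_disabled` key are hypothetical and only for illustration; this is not the usage_collection plugin's implementation.

```typescript
// Hypothetical shape mirroring the key/value pair shown above.
interface UiCounterEvent {
  key: string;
  value: number;
}

// Hypothetical in-memory buffer: occurrences are summed per key,
// so only the aggregated count leaves the browser.
class UiCounterBuffer {
  private counts = new Map<string, number>();

  increment(key: string, by: number = 1): void {
    this.counts.set(key, (this.counts.get(key) ?? 0) + by);
  }

  // Returns the aggregated events and resets the buffer.
  flush(): UiCounterEvent[] {
    const events = Array.from(this.counts.entries()).map(
      ([key, value]): UiCounterEvent => ({ key, value })
    );
    this.counts.clear();
    return events;
  }
}

const counterBuffer = new UiCounterBuffer();
counterBuffer.increment('siem_job_enabled');
counterBuffer.increment('siem_job_enabled');
counterBuffer.increment('siem_job_disabled'); // hypothetical key
const flushedCounters = counterBuffer.flush();
```

Because only the summed value per key is kept, details such as who clicked or in which session are lost at this point.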
We send UI counter events when an error happens during the job installation:
Application usage and counter events are enriched with a set of properties such as app version, cloud.account-id, cluster-name, license.type, etc.
This extra data can be used as a filter or to plot visualizations:
Links: Dashboard with entity analytics telemetry, Usage collection plugin doc, A spreadsheet summarizing security solutions telemetry
- Daily user action to filter Risk Score panels with Risk Classification filters per cluster
We can send UI counter events when users click on "Risk Classification filters". But currently, UI counters are aggregated by 7, 30, and 90 days.
- User action to launch new investigations by a logged-in user (anonymize the user)
We can send UI counter events when users click launch new investigations. But as far as I am aware, we don't have user data to aggregate by.
- User action applying 'Host/ User Names' as global filters. Normalize by a logged-in user.
We can send UI counter events when the user clicks on the Hover filter for a 'Host/ User Names' field. Should we send the event on every page? Or only when the user is on the entity analytics page? We can't filter it afterwards.
- Time to enable the feature by key features
What is a key feature? Is it the ML job?
When does the time start ticking? When the entity analytics page loads?
The following metrics are generic and could be collected by the Application usage event for every page, not only EA. So let's create feature requests for the @elastic/platform-analytics team.
- Avg. time spent on the page before key user action is taken
- Daily engaged users on the EA page (time spent > 5 mins) && 1+ key user action
What is a key user interaction?
- Daily impressions on the EA page (time spent > 1 mins & < 5 mins)
Application usage already collects page impression and screen time. Is it enough?
- Total # of unique user sessions (unique logged-in users) per hour
- Total time spent on the page by each logged-in user per session
Application usage doesn't collect user and session data. It also doesn't aggregate per hour. We should create a feature request to @elastic/platform-analytics.
- Daily count of anomaly jobs by status
- Count of anomaly results by job_id per cluster
Those sound like generic ML telemetry, not specific to the EA page. Let's double-check whether the ML team already collects this data. Otherwise, we can create a task that runs every hour/day and sends counter metrics from the backend for each installed job.
@ajosh0504 shared with me a dashboard that includes "Daily count of anomaly jobs by status" https://stack-telemetry.elastic.dev/s/machine-learning/app/dashboards#/view/39c6e0b8-8d23-572f-91f8-01f9a80d1b66?_g=(filters%3A!())
Thanks, @machadoum. I have updated the ticket body based on your research findings.
Some additional questions:
User action applying 'Host/ User Names' as global filters. Normalize by a logged-in user. We can send UI counter events when the user clicks on the Hover filter for a 'Host/ User Names' field. Should we send the event on every page? Or only when the user is on the entity analytics page? We can't filter it afterwards.
Are you concerned that we cannot segment user-action counts by the kibana source if we collect on every page?
Application usage doesn't collect user and session data. It also doesn't aggregate per hour. We should create a feature request to https://github.com/orgs/elastic/teams/platform-analytics.
Did you get an ETA from the platform telemetry team on when such user/session telemetry may be available? I wonder if they are already working towards this, given that user assignment to cases has been introduced in the solution. Otherwise I can follow up.
Note I appended a new KPI, Drill downs of anomalies, under the Feature Usage section.
Let me correct a mistake on my side. If we use ui-counter events, we can segment them by day and hour. We are only limited to the 7-30-90 day aggregations for application-usage events. I updated the ticket description to remove this limitation.
Update:
Due to the many limitations of ui-counter events, we decided to move in a different direction and use Event-Based Telemetry (EBT) instead.
The main difference between ui-counter events and EBT is that EBT allows sending a series of events without the need to pre-aggregate those events on Kibana's side before sending. That capability will enable analysts to slice and dice the usage during analysis.
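As a rough sketch of why unaggregated events help, the snippet below groups the same raw events along two different dimensions after the fact. The `RawEvent` shape, the event names, and the `countBy` helper are assumptions made for illustration and do not reflect the actual EBT client.

```typescript
// Hypothetical raw event: one record per user interaction, nothing pre-summed.
interface RawEvent {
  eventType: string;
  timestamp: string; // ISO 8601, UTC
  sessionId: string;
}

// Generic helper: count items by an arbitrary key chosen at analysis time.
function countBy<T>(items: T[], keyOf: (item: T) => string): Record<string, number> {
  const result: Record<string, number> = {};
  for (const item of items) {
    const key = keyOf(item);
    result[key] = (result[key] ?? 0) + 1;
  }
  return result;
}

const rawEvents: RawEvent[] = [
  { eventType: 'Risk Filter Clicked', timestamp: '2023-03-01T10:05:00Z', sessionId: 's1' },
  { eventType: 'Risk Filter Clicked', timestamp: '2023-03-01T10:40:00Z', sessionId: 's2' },
  { eventType: 'Risk Filter Clicked', timestamp: '2023-03-01T11:15:00Z', sessionId: 's1' },
];

// The same events can be sliced per session or per hour without re-instrumenting.
const clicksPerSession = countBy(rawEvents, (e) => e.sessionId);
const clicksPerHour = countBy(rawEvents, (e) => e.timestamp.slice(0, 13));
```

With a pre-aggregated counter, only the total (3) would be available, and neither breakdown could be recovered later.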
I have added three events to the Entity Analytics page as a POC (https://github.com/elastic/kibana/pull/152338). The data is shipped to the ebt-kibana-browser index and can be accessed on staging here: https://telemetry-v2-staging.elastic.dev/.
I created a Lens visualization to exemplify how we can query the data.
If you explore the index, you will notice that event-specific data is stored inside the properties.* field, and general data is available inside the context.* field. We can extend both fields and add any properties we need. Some of the properties that are available by default are context.cloudId, context.license_type, context.session_id, context.userId, context.version, context.viewport_height and more.
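For readers who prefer types, here is a hedged sketch of how a document in that index could be modelled and queried client-side. The context field names come from the list above; the `event_type` field, the `selectedSeverity` property, the sample values, and the `distinctUsers` helper are assumptions for illustration only.

```typescript
// Context fields named in this thread; all optional since availability may vary.
interface EbtContext {
  cloudId?: string;
  license_type?: string;
  session_id?: string;
  userId?: string;
  version?: string;
  viewport_height?: number;
}

// Assumed document shape: event-specific data under `properties`,
// general data under `context`.
interface EbtDocument<P> {
  event_type: string; // assumed field name
  properties: P;
  context: EbtContext;
}

// Counts distinct users among the documents matching a predicate.
function distinctUsers<P>(
  docs: EbtDocument<P>[],
  predicate: (doc: EbtDocument<P>) => boolean
): number {
  const users = new Set<string>();
  for (const doc of docs) {
    const userId = doc.context.userId;
    if (predicate(doc) && userId !== undefined) {
      users.add(userId);
    }
  }
  return users.size;
}

// Hypothetical sample documents.
const ebtDocs: EbtDocument<{ selectedSeverity: string }>[] = [
  { event_type: 'Risk Filter Clicked', properties: { selectedSeverity: 'Critical' }, context: { userId: 'u1', license_type: 'trial' } },
  { event_type: 'Risk Filter Clicked', properties: { selectedSeverity: 'High' }, context: { userId: 'u1', license_type: 'trial' } },
  { event_type: 'Risk Filter Clicked', properties: { selectedSeverity: 'High' }, context: { userId: 'u2', license_type: 'platinum' } },
];

const trialUsers = distinctUsers(ebtDocs, (d) => d.context.license_type === 'trial');
const highSeverityUsers = distinctUsers(ebtDocs, (d) => d.properties.selectedSeverity === 'High');
```

The same idea applies when querying the index through Lens: filter on a `context.*` or `properties.*` field, then aggregate on another.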
I am moving it to "done" because we implemented all telemetry that could be collected in the Explore area. @SourinPaul Please validate they satisfy product needs.
These items were NOT implemented:
Goal: Research
To understand better the user behavior in Security Solutions, we need to add telemetry to the entity analytics page. This ticket collects a list of metrics relevant to Entity Analytics.
Appended Feb 24 @SourinPaul: I'm assuming we can collect cloud adoption telemetry from FullStory. To quantify the user value, let's prioritize collecting a few important usage metrics. This will also help drive subsequent roadmap enhancements.
Index:
Tags:
[core-telemetry] - Telemetry that should be tracked across all Kibana pages
[job-metric] - Job metrics that the ML team should collect
Key user actions:
Anomaly job status:
Key features:
UI assets: (as of 8.7)
EA Dashboards
Alert Triage
All Hosts view
All Users view
Risk Explainability (New UI component: ETA ~8.10)
?Missing?
Feature Usage:
[x] Users prioritizing Entity Risk panels/views with Risk Classification: EA Dashboards, Alert Triage, All Hosts, All Users
[x] Users launch New investigations from Entity Risk panels (anonymize the user): EA Dashboards
[x] Users apply 'Host/ User Names' as global filters from the below UI assets. Normalize by a logged-in user.
Risk Explainability component (New: TBD)
[ ] Avg. time spent on the page before the user takes a key user action [research]: EA Dashboards
[ ] Anomaly Jobs by Status [job-metric]: EA Dashboards
[x] Click count of anomalies (EA anomalies panel): EA Dashboards (Notable Anomalies)
Feature Adoption:
[x] Error events from feature enablement per cluster https://github.com/elastic/kibana/pull/155233
[ ] Daily engaged users on the EA Dashboard [research] Key user action ) > 1 [core-telemetry]
[x] Count of anomalies by job_id per cluster [job-metric]
Descoped:
key features
References: