elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Graph - grouping functionality #17866

Closed elasticmachine closed 5 months ago

elasticmachine commented 7 years ago

Original comment by @markharwood:

Grouping functions

A useful addition to our current term-linking capability is the option to spot links that represent hierarchies. Hierarchies can be found across fields (product_code and department) but also across values from a single field. The key to identifying a hierarchical link is where a term always (or nearly always) appears alongside another term which is more popular. This can be seen in this example visualization of tags from StackOverflow data:

!LINK REDACTED

The highlighted connection in this diagram is suggested as a candidate for grouping - we can roll up the term "logstash-grok" into the "logstash" vertex. The grouping suggestion is made because logstash-grok always appears to be used in the wider context of docs talking about "logstash". We see these relationships in other data, e.g. LastFM music listeners' habits clearly show the relationships between popular bands and the band members who have had solo careers.
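As a rough illustration of the "nearly always appears alongside a more popular term" test, here is a minimal Python sketch. It is not the Kibana client-side code: the function name `grouping_candidates`, the `min_containment` threshold and the sample docs are all assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations


def grouping_candidates(docs, min_containment=0.9):
    """Return (child, parent, containment) tuples for likely hierarchical links.

    docs: iterable of term collections, one per document.
    A pair qualifies when the child almost always co-occurs with the parent
    and the parent appears in more documents than the child.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in docs:
        unique = sorted(set(terms))
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    candidates = []
    for (a, b), together in pair_counts.items():
        # Treat the rarer term as the potential child of the more popular one.
        child, parent = (a, b) if term_counts[a] < term_counts[b] else (b, a)
        if term_counts[child] == term_counts[parent]:
            continue  # equally popular terms are not treated as a hierarchy
        containment = together / term_counts[child]
        if containment >= min_containment:
            candidates.append((child, parent, containment))
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


# Made-up sample data: each list is the set of tags on one document.
docs = [
    ["logstash", "logstash-grok"],
    ["logstash", "logstash-grok", "elasticsearch"],
    ["logstash", "elasticsearch"],
    ["logstash", "kibana"],
    ["elasticsearch", "kibana"],
]
candidates = grouping_candidates(docs)
```

With this sample, only logstash-grok is suggested for roll-up into logstash, because every doc tagged logstash-grok is also tagged logstash.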

Grouping is a useful function because it allows us to:

1) de-clutter the diagram
2) perform single actions on the group (blacklist, look for more connected terms, ...)
3) use the group as an accounting unit in analytics (see below)

Having grouped logstash and logstash-grok in the diagram below, we use the combined terms as a single unit for the aggregations subsequently shown when we click connections, e.g. here we are showing agg results for (logstash OR logstash-grok) AND elasticsearch:

!LINK REDACTED
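One way to express "(logstash OR logstash-grok) AND elasticsearch" is a bool query with one `terms` clause per vertex, since a `terms` query matches docs containing any of the listed values. The sketch below only builds the request body. The field name `tags`, the helper name and the `top_tags` aggregation are assumptions for illustration, not what the Graph UI actually sends.

```python
def group_connection_query(field, group_a, group_b):
    """Build a bool query matching docs that contain any term from each group."""
    return {
        "bool": {
            "filter": [
                # Each `terms` clause is an OR over the members of one vertex.
                {"terms": {field: sorted(group_a)}},
                {"terms": {field: sorted(group_b)}},
            ]
        }
    }


query = group_connection_query(
    "tags", {"logstash", "logstash-grok"}, {"elasticsearch"}
)
# Hypothetical search body: aggregate over the docs behind the clicked connection.
body = {
    "size": 0,
    "query": query,
    "aggs": {"top_tags": {"terms": {"field": "tags"}}},
}
```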

Currently this grouping functionality is achieved using client-side functions embedded in Kibana but there are potentially two grouping functions that could be shifted to the server-side:

Graph API for automated hierarchical grouping

This would be a recursive function that identifies initial candidates for grouping from single-term co-occurrences and then merges them. Having merged initial terms, it would then look for further merge candidates, this time considering both the "primitive" single-term vertices and their co-occurrences with the new "grouped" multi-term vertices.
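To make the merge-then-rescan idea concrete, here is a small in-memory sketch. It uses a loop rather than recursion, runs over plain Python sets instead of an index, and its name `auto_group`, its threshold and its sample docs are assumptions for the example. A real Graph API implementation would look quite different.

```python
from itertools import combinations


def auto_group(docs, min_containment=0.9):
    """Repeatedly merge vertices whose docs are nearly a subset of another's."""
    doc_sets = [set(d) for d in docs]
    # Start with one "primitive" vertex per term.
    vertices = {frozenset([t]) for d in doc_sets for t in d}

    def matching_docs(vertex):
        # A grouped vertex matches a doc if any of its member terms appears.
        return {i for i, d in enumerate(doc_sets) if vertex & d}

    while True:
        hits = {v: matching_docs(v) for v in vertices}
        best = None
        for a, b in combinations(sorted(vertices, key=sorted), 2):
            child, parent = (a, b) if len(hits[a]) < len(hits[b]) else (b, a)
            if len(hits[child]) == len(hits[parent]):
                continue  # no clear "more popular" side
            containment = len(hits[child] & hits[parent]) / len(hits[child])
            if containment >= min_containment and (
                best is None or containment > best[0]
            ):
                best = (containment, child, parent)
        if best is None:
            return vertices  # no further merge candidates
        _, child, parent = best
        # Replace the pair with one grouped vertex, then rescan everything.
        vertices = (vertices - {child, parent}) | {child | parent}


# Made-up sample data: each list is the set of tags on one document.
docs = [
    ["logstash", "logstash-grok"],
    ["logstash", "logstash-grok", "elasticsearch"],
    ["logstash", "elasticsearch"],
    ["logstash", "kibana"],
    ["elasticsearch", "kibana"],
]
groups = auto_group(docs)
```

Because each pass rebuilds the doc matches for grouped vertices too, a later pass can merge a primitive term into a group that an earlier pass created.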

Terms/Significant terms aggs support for multi-term 'include' buckets

We can use the filters aggregation to group related terms into a single bucket for collection, but significant_terms only considers the frequencies of primitive terms, not groups of terms, when identifying statistical anomalies. We may want to consider a significant_term_groups agg or similar. This is akin to offering run-time synonym support - the ability to perform statistical analysis on selected groups of terms rather than individual terms.
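The filters-aggregation workaround mentioned above might look like the following sketch, which builds one named bucket per term group. The field name `tags`, the bucket names and the helper name are assumptions. The proposed significant_term_groups agg does not exist, so nothing here attempts it.

```python
def grouped_filters_agg(field, groups):
    """Build a `filters` aggregation with one named bucket per term group."""
    return {
        "filters": {
            "filters": {
                # A `terms` query matches docs containing any member of the group.
                name: {"terms": {field: sorted(terms)}}
                for name, terms in groups.items()
            }
        }
    }


agg = grouped_filters_agg(
    "tags",
    {
        "logstash_group": {"logstash", "logstash-grok"},
        "elasticsearch": {"elasticsearch"},
    },
)
body = {"size": 0, "aggs": {"term_groups": agg}}
```

This gives doc counts per group, but it does not compare them against background frequencies, which is the gap the paragraph above describes.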

(this issue replaces the original comment in LINK REDACTED)

elasticmachine commented 7 years ago

Original comment by @markharwood:

Flagging here another form of grouping currently in development: the LINK REDACTED. Rather than spotting terms that are related through co-occurrence in the same doc (e.g. #kibana and #elasticsearch), it spots similarly-labelled terms that may never have appeared together in the same document.

Consider this example from Dallas crime reports where we can suggest crime MOs that look very similar: !LINK REDACTED

The Graph UI then allows these strongly related crime descriptions to be merged into a single grouped vertex (the Dallas crime data has 30+ varying descriptions of this crime type).

!LINK REDACTED
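The linked technique is not reproduced here. As a stand-in, this sketch groups similarly-labelled terms using plain string similarity from Python's `difflib`, which is a swapped-in approach and an assumption. The labels are invented for the example and are not taken from the Dallas data.

```python
from difflib import SequenceMatcher


def similar_label_groups(labels, threshold=0.8):
    """Greedily cluster labels that resemble the first member of a group."""
    groups = []
    for label in labels:
        for group in groups:
            ratio = SequenceMatcher(None, label.lower(), group[0].lower()).ratio()
            if ratio >= threshold:
                group.append(label)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([label])
    return groups


# Invented labels for illustration only.
labels = ["BURGLARY OF VEHICLE", "BURGLARY OF A VEHICLE", "ASSAULT"]
label_groups = similar_label_groups(labels)
```

The two burglary variants land in one group, and the unrelated label stays on its own.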

Having grouped the items in this way, it would be useful if calls to the graph API or other parts of Kibana could treat them as a single term for accounting purposes, e.g. when finding significant people with at least 3 connections to this term-group.
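A sketch of what "treating the group as a single term for accounting" could mean: a record counts as one connection if it mentions any member of the group. This only counts connections and does no significance scoring. The record shape, names and sample values are all assumptions for illustration.

```python
from collections import Counter


def people_connected_to_group(records, term_group, min_connections=3):
    """Return people with at least min_connections records touching the group."""
    counts = Counter(
        # One connection per record that mentions any member of the group.
        person
        for person, terms in records
        if term_group & set(terms)
    )
    return sorted(p for p, n in counts.items() if n >= min_connections)


# Invented records: (person, terms seen on one report).
term_group = {"variant-1", "variant-2", "variant-3"}
records = [
    ("alice", ["variant-1"]),
    ("alice", ["variant-2"]),
    ("alice", ["variant-3", "other"]),
    ("bob", ["variant-1"]),
    ("bob", ["other"]),
]
connected = people_connected_to_group(records, term_group)
```

Counted per individual term, alice has only one connection to each variant. Counted against the group, she reaches the threshold of 3.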

elasticmachine commented 3 years ago

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

elasticmachine commented 1 year ago

Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)

timductive commented 5 months ago

Closing this because it's not planned to be resolved in the foreseeable future. It will be tracked in our Icebox and will be re-opened if our priorities change. Feel free to re-open if you think it should be melted sooner.