elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Data View: cross-field metadata and their relationship to data visualization #97278

Closed monfera closed 4 weeks ago

monfera commented 3 years ago

Keywords: metadata, field, recommender, data view, shared visual attributes, datavis best practices

Examples in response to Vijay's request in moving from index patterns to Data View.

Cross-field metadata

Not all metadata neatly belong to a specific field or an entire index. Sometimes it's about the relationship between two or more fields within an index, or even across indices. Examples of metadata across fields, and their utility for visual exploration:

Fields whose contents relate to one another

Hierarchical relationship between fields

One field breaks down another. Examples:

It's good to know if the subunit can even be used on its own. Eg. "Paris" can be "France/Île-de-France/Paris" or "US/Texas/Paris", so, on its own, it's ambiguous, unless the City field is a unique code.
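A minimal sketch of the ambiguity check described above, assuming a hypothetical document shape with `country`, `region` and `city` fields (these names are illustrative, not part of any Kibana API). It flags subunit values that occur under more than one parent, and builds a fully qualified key that stays unique:

```typescript
// Hypothetical document shape; the field names are illustrative only.
interface PlaceDoc {
  country: string;
  region: string;
  city: string;
}

// Join the full hierarchy path so that same-named subunits stay distinct.
function qualifiedKey(doc: PlaceDoc): string {
  return [doc.country, doc.region, doc.city].join("/");
}

// A city value is ambiguous when it occurs under more than one parent path.
function ambiguousCities(docs: PlaceDoc[]): string[] {
  const parentsByCity: Record<string, string[]> = {};
  for (const doc of docs) {
    const parent = doc.country + "/" + doc.region;
    const parents = parentsByCity[doc.city] || (parentsByCity[doc.city] = []);
    if (parents.indexOf(parent) === -1) {
      parents.push(parent);
    }
  }
  return Object.keys(parentsByCity).filter(
    (city) => parentsByCity[city].length > 1
  );
}

const places: PlaceDoc[] = [
  { country: "France", region: "Île-de-France", city: "Paris" },
  { country: "US", region: "Texas", city: "Paris" },
  { country: "US", region: "Texas", city: "Austin" },
];

const ambiguous = ambiguousCities(places);
const parisTexas = qualifiedKey(places[1]);
```

A recommender could use such a flag to decide whether the City field may be offered on its own or only together with its parent fields.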

Visualizations that work well across the hierarchy:

image

Styling of hierarchical data might follow a primary breakdown, eg. also projected to color, while the deeper nodes inherit that (or fade out, like the sunburst): image

Multidimensional variables

Usually, there are several discrete (categorical or ordinal) variables associated with documents. They collectively represent slicing and dicing ability (explorability, drilldown, drill-through etc.). In a given chart, usually only one (very rarely, two) can utilize a color mapping.

Functional dependency: independent variables vs dependent variables

Knowledge or inference of which field(s) determine the value of other field(s).

Examples:

Often, exploratory interaction is about filtering or navigating in the realm of independent variables / dimensions, while the quantities and categories of dependent variables are aggregated (or in contrast, disaggregated) and visualized.
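Inference of a functional dependency can be sketched as a simple check over sampled documents: field A determines field B if no value of A is ever seen with two different values of B. The field names below are hypothetical, and a sample can only refute a dependency, never prove it for the whole index:

```typescript
// Hypothetical flat document; values are limited to strings and numbers.
type LogDoc = Record<string, string | number>;

// True when every distinct value of `determinant` maps to exactly one
// value of `dependent` in the given sample.
function determines(
  docs: LogDoc[],
  determinant: string,
  dependent: string
): boolean {
  const seen: Record<string, string | number> = {};
  for (const doc of docs) {
    const key = String(doc[determinant]);
    const value = doc[dependent];
    if (Object.prototype.hasOwnProperty.call(seen, key) && seen[key] !== value) {
      return false; // same determinant value, two different dependent values
    }
    seen[key] = value;
  }
  return true;
}

const sample: LogDoc[] = [
  { country_code: "FR", country_name: "France", bytes: 120 },
  { country_code: "FR", country_name: "France", bytes: 80 },
  { country_code: "US", country_name: "United States", bytes: 80 },
];

const codeDeterminesName = determines(sample, "country_code", "country_name");
const codeDeterminesBytes = determines(sample, "country_code", "bytes");
```

In this sample, `country_code` behaves like a dimension that determines `country_name`, while `bytes` varies within a code and is a candidate for aggregation.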

Time and space dependency

Most metrics in an index may vary over time and/or over spatial dimensions, where available. It's useful to default to eg. a time series view or map view (recommender) and offer suitable visualization choices, eg. lines, if the time series is reasonably continuous.
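A rough sketch of such a default, assuming a simplified field description (`date` and `geo_point` mirror Elasticsearch type names, while `number` and `keyword` stand in for whole families of types). Preferring the time series over the map when both are possible is an arbitrary choice made for this illustration:

```typescript
// Simplified, hypothetical field kinds for the purpose of this sketch.
type FieldKind = "date" | "geo_point" | "number" | "keyword";

interface FieldInfo {
  name: string;
  kind: FieldKind;
}

type ViewKind = "time_series" | "map" | "table";

// Pick a default view from the kinds of fields that are available.
function recommendDefaultView(fields: FieldInfo[]): ViewKind {
  const has = (kind: FieldKind): boolean =>
    fields.some((f) => f.kind === kind);
  const hasMetric = has("number");
  if (hasMetric && has("date")) {
    return "time_series";
  }
  if (hasMetric && has("geo_point")) {
    return "map";
  }
  return "table";
}

const timeBased = recommendDefaultView([
  { name: "@timestamp", kind: "date" },
  { name: "bytes", kind: "number" },
]);

const spatial = recommendDefaultView([
  { name: "location", kind: "geo_point" },
  { name: "population", kind: "number" },
]);

const neither = recommendDefaultView([{ name: "host", kind: "keyword" }]);
```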

Explanatory relationship

Key and text field pair:

Visualization and data exploration impact:

Redundant metrics

Certain metrics may redundantly encode the same information (eg. same phenomenon, different unit) or may contain precomputed values (eg. elapsed time, MB, MB/s).
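One way to detect a precomputed value is to test whether a metric can be reproduced from others. The sketch below, with hypothetical field names, checks whether an MB/s field equals MB divided by elapsed seconds in every sampled document, within a relative tolerance:

```typescript
// Hypothetical document with a possibly redundant, precomputed rate.
interface TransferDoc {
  megabytes: number;
  elapsedSeconds: number;
  megabytesPerSecond: number;
}

// True when the stored rate matches megabytes / elapsedSeconds everywhere.
function isDerivedRate(docs: TransferDoc[], tolerance: number = 1e-9): boolean {
  return docs.every((d) => {
    if (d.elapsedSeconds <= 0) {
      return false; // the rate is undefined, so it cannot be verified
    }
    const expected = d.megabytes / d.elapsedSeconds;
    const scale = Math.max(1, Math.abs(d.megabytesPerSecond));
    return Math.abs(expected - d.megabytesPerSecond) <= tolerance * scale;
  });
}

const consistent: TransferDoc[] = [
  { megabytes: 10, elapsedSeconds: 2, megabytesPerSecond: 5 },
  { megabytes: 9, elapsedSeconds: 3, megabytesPerSecond: 3 },
];

const inconsistent: TransferDoc[] = [
  { megabytes: 10, elapsedSeconds: 2, megabytesPerSecond: 6 },
];

const rateIsRedundant = isDerivedRate(consistent);
const rateIsIndependent = !isDerivedRate(inconsistent);
```

Metadata that records such a relationship would let a recommender avoid plotting the same information twice, or aggregate the rate correctly from its components.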

Physical data representation changes over time

For example, the user name of a given user changes; the name of a country changes; or an upstream logging system gets fixed. The new values may be in another field. A Data View may make the change disappear by abstracting over it. Benefits:

Independence of metrics

If there's no established relationship among certain fields, they can be assumed independent of one another. This doesn't mean no correlation, and showing correlations is probably a good idea, eg. via scatterplot, SPLOM, parcoords. image

Shared attributes

Here, multiple fields relate to one another through common properties. This can happen across fields within the same index, or among fields that are in disparate indices.

Shared nominal types (semantic domains)

While field types are present in Elasticsearch, they represent physical domains.

For example, a part to whole ratio may be represented

A "megabytes transferred" metric may be represented

The physical type doesn't give much useful information about which transforms and visualizations may even be legitimate. Nominal (semantic) types are required for

Nominal typing may include these, and more:

Note: such typing information may eventually enable more compact representation in Elasticsearch.

Several fields that reference a shared semantic type are meaningfully related. Example: both buildings_index and roads_index have a field for occupied land area. They share a unit (eg. square meters) and they share the property of additivity. These two fields may even be linked to a common metadata descriptor (DRY principle in data modeling). Therefore, a report, visualization or data transform may safely add land areas of buildings and roads, to get summarized land occupancy.

Even just the knowledge of a shared or convertible unit is useful for dataviz, because the fields can then be projected to a common vertical scale.
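The land area example can be sketched with a shared descriptor that both fields reference. The `SemanticType` and `FieldRef` shapes, the `occupied_area` field name and the sample values are assumptions made for illustration; only the index names come from the example above:

```typescript
// Hypothetical semantic (nominal) type descriptor shared by several fields.
interface SemanticType {
  name: string;
  unit: string;
  additive: boolean;
}

// Hypothetical reference from an index field to its semantic type.
interface FieldRef {
  index: string;
  field: string;
  semanticType: SemanticType;
}

// One descriptor, referenced from both indices (DRY).
const landArea: SemanticType = { name: "land_area", unit: "m2", additive: true };
const coverage: SemanticType = { name: "coverage_ratio", unit: "1", additive: false };

const buildingArea: FieldRef = {
  index: "buildings_index",
  field: "occupied_area",
  semanticType: landArea,
};
const roadArea: FieldRef = {
  index: "roads_index",
  field: "occupied_area",
  semanticType: landArea,
};
const roadCoverage: FieldRef = {
  index: "roads_index",
  field: "coverage",
  semanticType: coverage,
};

// Two fields may be summed when they share an additive type and a unit.
function canSum(a: FieldRef, b: FieldRef): boolean {
  return (
    a.semanticType.additive &&
    b.semanticType.additive &&
    a.semanticType.name === b.semanticType.name &&
    a.semanticType.unit === b.semanticType.unit
  );
}

function sumIfCompatible(
  a: FieldRef,
  aValues: number[],
  b: FieldRef,
  bValues: number[]
): number {
  if (!canSum(a, b)) {
    throw new Error("Fields " + a.field + " and " + b.field + " are not summable");
  }
  const total = (xs: number[]): number => xs.reduce((s, x) => s + x, 0);
  return total(aValues) + total(bValues);
}

const occupiedLand = sumIfCompatible(buildingArea, [100, 250], roadArea, [40]);
```

The same compatibility check could gate whether two series are drawn against a common vertical scale.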

Shared visual attributes

Due to compatible nominal types,

It's desirable that visual recommenders and defaults exploit common value=>aesthetic mapping when possible. Besides compatible nominal types, the default value=>aesthetic mapping can be associated with specific Data View fields, or even, across multiple Data Views.

Therefore, default mappings are first class entities which can be referenced by fields in Data Views (this still allows the implicit creation of mappings, if not shared among Data Views, for the user's convenience; can be made explicit and extracted when needed)
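A sketch of mappings as first class entities, assuming a hypothetical registry keyed by mapping id. The `DataViewSketch` shape is invented for this example and is not Kibana's Data View API; the field names and colors are illustrative:

```typescript
// Hypothetical value => color mapping that exists independently of any field.
interface ColorMapping {
  id: string;
  colors: Record<string, string>;
  fallback: string;
}

// Hypothetical field entry that may reference a shared mapping by id.
interface DataViewField {
  name: string;
  mappingId?: string;
}

interface DataViewSketch {
  title: string;
  fields: DataViewField[];
}

const mappings: Record<string, ColorMapping> = {
  "http-status-class": {
    id: "http-status-class",
    colors: { "2xx": "#2ca02c", "4xx": "#ff7f0e", "5xx": "#d62728" },
    fallback: "#999999",
  },
};

// Two Data Views whose fields reference the same mapping.
const webLogs: DataViewSketch = {
  title: "web-logs",
  fields: [
    { name: "status_class", mappingId: "http-status-class" },
    { name: "host" },
  ],
};
const proxyLogs: DataViewSketch = {
  title: "proxy-logs",
  fields: [{ name: "response_class", mappingId: "http-status-class" }],
};

// Resolve the color for a value; undefined when the field has no mapping.
function colorFor(
  view: DataViewSketch,
  fieldName: string,
  value: string,
  registry: Record<string, ColorMapping>
): string | undefined {
  const field = view.fields.filter((f) => f.name === fieldName)[0];
  if (!field || !field.mappingId) {
    return undefined;
  }
  const mapping = registry[field.mappingId];
  if (!mapping) {
    return undefined;
  }
  const color = mapping.colors[value];
  return color !== undefined ? color : mapping.fallback;
}

const webColor = colorFor(webLogs, "status_class", "5xx", mappings);
const proxyColor = colorFor(proxyLogs, "response_class", "5xx", mappings);
```

Because both fields resolve through the same entity, a "5xx" value gets the same color in charts built from either Data View.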

image

See also Beyond palettes

Multi-index Data Views

Sometimes data that relate to one another are not in the same index or index* group. Eg.

A future Data View may reference multiple index (or index*) entities, with metadata in the Data View describing the relationship among indices and their fields (see cross-index fields)

Derived information in Data Views

Eventually, a Data View should be able to represent an aggregation, filtering or other data transformation of its input (indices, or another, more granular Data View).

Even in this case, field-level metadata is useful, per field and across fields, because the ultimate use in visual analytics is the same, and it requires various kinds of metadata.

So, Data Views may eventually become composable. Example: different parts of the organization may need

Even if there's a single dashboard, or a set of dashboards that share a bunch of fields, it may be worth creating a common Data View for that, atop a possibly preexisting Data View, so that theming and mappings can be shared:

image (Vavaliya et al: Online Performance Assessment System for Urban Water Supply and Sanitation Services in India)

A Data View that represents data transformation actually generates metadata. For example, a grouping aggregation will yield unique rows in terms of the values in fields that are part of the grouping dimensions.
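A small sketch of how a grouping aggregation could emit that metadata alongside its rows. The `AggregatedView` shape, the `uniqueKey` property and the sample fields are assumptions for illustration only:

```typescript
type Row = Record<string, string | number>;

// Hypothetical result: aggregated rows plus the metadata the transform generates.
interface AggregatedView {
  rows: Row[];
  uniqueKey: string[]; // output rows are unique in terms of these fields
}

// Group by the given dimensions and sum one metric.
function groupSum(rows: Row[], dimensions: string[], metric: string): AggregatedView {
  const groups: Record<string, Row> = {};
  const order: string[] = [];
  for (const row of rows) {
    const key = JSON.stringify(dimensions.map((d) => row[d]));
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
      const seed: Row = {};
      for (const d of dimensions) {
        seed[d] = row[d];
      }
      seed[metric] = 0;
      groups[key] = seed;
      order.push(key);
    }
    groups[key][metric] = Number(groups[key][metric]) + Number(row[metric]);
  }
  // The grouping dimensions form a unique key of the output by construction.
  return { rows: order.map((k) => groups[k]), uniqueKey: dimensions };
}

const aggregated = groupSum(
  [
    { host: "a", status: "200", bytes: 10 },
    { host: "a", status: "200", bytes: 5 },
    { host: "b", status: "500", bytes: 7 },
  ],
  ["host", "status"],
  "bytes"
);
```

A downstream Data View or chart could read `uniqueKey` to know that no further deduplication is needed along those dimensions.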

elasticmachine commented 3 years ago

Pinging @elastic/datavis (Team:DataVis)

monfera commented 3 years ago

image

monfera commented 3 years ago

Field metadata drives some of the recommendations: https://data.humdata.org/dataviz-guide/dataviz-elements/#/data-visualization/bar-charts ht @maartenzam

monfera commented 3 years ago

Related: https://github.com/elastic/kibana/issues/73152

elasticmachine commented 1 year ago

Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)

markov00 commented 4 weeks ago

In order to provide better transparency of priorities, issues that will not be prioritized within the next 24 months are being closed.

Tracking request in Lens general improvements ice box https://github.com/elastic/kibana/issues/184648