elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Beyond palettes: shared visual attributes #61977

Open monfera opened 4 years ago

monfera commented 4 years ago

Shared mapping to visual attributes

Visual attributes and pattern matching: the core of data visualization

Color, size, length, position, sharpness, intensity, orientation etc. are visual attributes (image from Stephen Few's Tapping the Power of Visual Perception):

image

These are preattentive attributes as our visual faculties can near-instantaneously match patterns. There are also derived visual attributes such as speed of motion (eg. to indicate advance of time) or animated jitter (to convey uncertainty).

Some attributes are subtle, yet can be key to successful visualization, eg.

For example, let's identify occurrences of 3: the image on the right makes it easy by assigning a distinct intensity to the item (images from Katherine Hepworth's site):

image

Data visualization is about helping people find and share patterns in data by mapping the data and its derivatives primarily to visual attributes, optimizing for a target balance of aspects (quick recognition; recall; precision; enjoyment; impact; effort of creation; shareability etc.) while minimizing constraint violations (eg. lack of readability or accessibility).

Role

A visual attribute eg. color may serve various roles, eg.

Unit of data visualization

An individual chart is often but a part of a data visualization effort. A key level for data visualization is the cohesive product or experience. Examples:

It is therefore impossible to solve for the dataviz problem (goals and constraints) at the level of individual charts or any other parts; the whole context of the above forms, readers and circumstances needs to be considered. Further down we'll use the dashboard as a proxy for all of these forms.

For example, this dashboard uses 5 colors consistently on all its constituent charts, amortizing the reader's cost of temporarily memorizing the color attribute mapping (Vavaliya et al: Online Performance Assessment System for Urban Water Supply and Sanitation Services in India):

image

The consequence is that the assignment of attribute mappings for shared dimensions, measures, metadata etc. needs to be handled at least at the dashboard level (reminder: dashboard is just a shorthand for all the things listed above).

While color is a front and center example, the other visual attributes are to be shared with the same zeal:

Style guide theme vs. attribute mapping

Themes and style guides are commonly made and used in visual design, UX design and UI implementation. They bring about visual consistency and corporate likeness for related visual elements and affordances, with the emphasis on the structural, scenegraph aspect.

In contrast, visual attribute mapping deals with cohesive projection of semantic constituents, ie. data content such as dimensions, measures and metadata.

Legends

Sharing attribute mappings on a dashboard has benefits beyond preattentively assisting the process of relating different projections of like data, and helping the reader keep some mappings in the (usually short-term) memory. These are:

This is also a great example for sharing visual attributes across diverse tools and projections of visualization, eg. geospatial or temporal.

Tooltips

Sharing axes leads to the potential for axis oriented tooltips to show values in multiple charts together (source: 538): tooltip2

It is therefore useful to correlate the user intent of pointing at something with the valid, shared projections on the dashboard, sometimes even if the screen projections aren't sharing axis scale and offset, or they're distant. This assumes the sharing of spatial attributes (and their inverse mapping to data) of the pointing intent.
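A minimal sketch of that inverse mapping, assuming plain linear scales. The `LinearScale` shape and the `project`, `invert` and `linkedPositions` names are hypothetical, made up for illustration, and not taken from any Kibana or elastic-charts API. The pointer position on one chart is inverted to a data value, which is then projected onto every other chart that shares the dimension, even if their scale and offset differ:

```typescript
// Hypothetical linear scale: maps a data domain onto a pixel range.
interface LinearScale {
  domain: [number, number]; // data values at the two ends of the axis
  range: [number, number]; // pixel positions at the two ends of the axis
}

// Forward mapping: data value -> pixel position.
function project(scale: LinearScale, value: number): number {
  const [d0, d1] = scale.domain;
  const [r0, r1] = scale.range;
  return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Inverse mapping: pixel position -> data value.
function invert(scale: LinearScale, pixel: number): number {
  const [d0, d1] = scale.domain;
  const [r0, r1] = scale.range;
  return d0 + ((pixel - r0) / (r1 - r0)) * (d1 - d0);
}

// Translate a pointer position on the source chart into the matching
// pixel positions on the other charts that show the same dimension.
function linkedPositions(
  source: LinearScale,
  pointerPx: number,
  targets: LinearScale[]
): number[] {
  const value = invert(source, pointerPx);
  return targets.map((target) => project(target, value));
}

// Two charts showing the same dimension with different scale and offset.
const chartA: LinearScale = { domain: [0, 100], range: [0, 400] };
const chartB: LinearScale = { domain: [50, 150], range: [20, 220] };
const linked = linkedPositions(chartA, 300, [chartB]);
```

Real scales may be temporal, logarithmic or geospatial; the point is only that each shared projection needs both a forward and an inverse mapping.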

Tooltips may convey a single number or a few numbers (eg. series name, data X, Y), in which case they're like a minuscule table (a one-row or one-column table), but they can also be elaborate. It therefore helps reuse if we think of all tooltips as visualizations that are linked via certain data fields. Example for geo+temporal combination (NYT by Adam Pearce and team, ht tweet by Maarten Lambrechts): image

Annotations

Not just primary data ink but annotations eg. reference line overlays, outlier markings (eg. via salient color) can share visual attributes. Eg. reference lines can cut across multiple charts if some of their spatial projections are identical.

Accessibility

A key constraint in data visualization is the diverse ability of people to distinguish colors in various wavelengths. Many color palettes take into account discernibility by those with monochromatic vision. Not all data visualization tasks are of the same consequence; in healthcare or industrial monitoring, all ambiguities must be resolved, while a café may well show its fun, colorful coffee popularity dashboard with less regard for readability and more for evocative colors.

Inherent or acquired meaning

Often, it helps the viewer link shape or color with underlying data if there's a physical or custom-based correspondence. Visualizing the turnover of avocado, strawberry and banana on a dark background may use green, red and yellow, respectively. Organizations may evolve their own color coding. Police, or US Democrats, are blue. Such relationships between entities and colors are precarious and do not scale as the number of entities goes up, yet it's important to provide the ability of stable mapping from categories to color for when it matters. While it's not possible to mentally link more than about a dozen distinct colors between data ink and legend, our focus is also limited to a low number of key categories at a time (while the rest can be subdued gray).

As the set of values the user may want to visualize may vary over time, even within the same dimension (eg. product, which can be numerous), a stable category to color assignment may not work. In this case, there's essentially random picking from a categorical color palette, but

Social context

We take for granted certain color assignments. While the red-yellow-green traffic light colors seem fairly universal, here's the same March 16 drop shown in different parts of the world: image Certain colors also carry heavy emotional meaning and may be preferred or shunned.

Saliency

Attention grabbing and keeping focus is one of the roles of visual attributes (mostly color, but also line width, Z-order, blur/fade, or plainly making an item invisible or moving it to the bottom of a small multiples cluster).

Consider the judicious use of color for highlight here (by Lars Schubert / Graphomate pin): image

In contrast, going overboard with color will result in confusion even if the colors otherwise have clear and shared association: image

The user has no orientation as to what to look at first, when initially facing this dashboard.

Configuration

Providing color wheels for users is very useful, eg. to let the user maintain color assignment between categories and colors.

However, when building a visualization or dashboard, a color picker is a last resort, an escape hatch, potentially indicating that higher layers of color assignment abstractions had not been put in place (eg. TSVB).

Issue and paper links

Takeaways

Grayscale example (original, in color, by Nate Silver et al at 538) image The user doesn't simply switch to a grayscale palette for a chart; what happens is that

While all these sound a bit vague, the user actually performs tangible steps:

elasticmachine commented 4 years ago

Pinging @elastic/kibana-design (Team:Design)

nreese commented 4 years ago

related to https://github.com/elastic/kibana/issues/43697. Maps has the same need. It would be great if visualization attributes could be defined at the index-pattern level so out of the box there is consistency when visualizing on a data dimension.

cchaos commented 4 years ago

I agree with all of this. So how do we get started? 😺

nreese commented 4 years ago

We have the UIs and data structures in Maps to define custom color palettes for categorical fields and numeric fields. Maybe we could come up with a plan to move this to index patterns so users can define color styling for field values in a single place.

cchaos commented 4 years ago

Maybe we could come up with a plan to move this to index patterns

One thing to note on that suggestion is that users can still get very confused about what index patterns are. We would need to ensure a good flow from application to index pattern and back to application, to make sure they know it's a global setting. Having them be index-based is a good first step, but I do think having dashboard/canvas (or whatever presentation mode) wide settings is also necessary and would probably utilize a similar UI.

monfera commented 4 years ago

As many of these configurations as possible should be first-class entities that live on their own. It'd be possible to make a reference to a mapping (Cartesian scale, semantic palette, marker shape mapping and whatever we end up with) from within an index pattern, maps, dashboard, canvas or even a specific map, chart or chart feature if needed. The reason is mostly low coupling, and there is also a somewhat dormant work thread about killing index patterns. In fact, discussing that, one of the takeaways was that there needs to be a place for such mappings, so work on attribute mapping might help the rethinking/deconstruction of index patterns. For now, it's still a very useful place from which the user could point to default mappings, possibly overruled at the dashboard level.

Many of the visual attribute mappings hinge on the cardinality or range of actual, runtime data; for example, how many distinct categories need to be visualized depends on data (eg. from the index pattern) but also on the dashboard, which shows a subset of the index data, what with time and other filters (though there's the aspect of color assignment stability - there are tradeoffs between forever constant scale/assignment and scale/assignment optimized only for the currently viewed data).

The deconstruction of index patterns referenced views (from point 3 here), something like views in SQL in that they can build on one another and can involve data restriction (subsetting) as well, which is key for stable color assignments. For example, a certain aspect of index patterns can be seen as such; as well as the dashboard's current selection, but there could be intermediate "views", eg. further restricting columns in current index patterns, and adding calculated fields (and color etc. mappings) but multiple dashboards could feed from one such view. This would make the maintenance of multiple, related dashboards - maybe grouped thematically - easier, as there can be an interim layer between index patterns and dashboards.

Also, current index patterns already have some related mapping functionality, eg. field formatters, and the assignment of human readable names to codes in the index (eg. ISO country code in the index mapped to country name in English). The ISO->English mapping is not inherently index pattern driven; several index patterns may benefit from such a mapping. So, in the future, the code-to-name mapping would be a first-class object, and the code-to-color (or other visual aesthetics) mapping would be a first-class object too.
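To make the "first-class object" idea more concrete, here is a rough sketch under stated assumptions: mappings live in a standalone registry, an index pattern only points at a default mapping by id, and a dashboard may overrule that reference. All names (`CategoricalMapping`, `registry`, `resolve`, `lookup`, the mapping ids and the colors) are hypothetical and do not correspond to existing Kibana APIs.

```typescript
type MappingId = string;

// A standalone mapping from codes to names, colors or other attributes.
interface CategoricalMapping {
  id: MappingId;
  entries: Map<string, string>;
  fallback: string; // used for codes that have no explicit entry
}

// Hypothetical registry: mappings live on their own, referenced by id.
const registry = new Map<MappingId, CategoricalMapping>();

function register(mapping: CategoricalMapping): void {
  registry.set(mapping.id, mapping);
}

// Dashboard-level references overrule index-pattern-level defaults.
function resolve(
  field: string,
  indexPatternDefaults: Map<string, MappingId>,
  dashboardOverrides: Map<string, MappingId>
): CategoricalMapping | undefined {
  const id = dashboardOverrides.get(field) ?? indexPatternDefaults.get(field);
  return id === undefined ? undefined : registry.get(id);
}

function lookup(
  mapping: CategoricalMapping | undefined,
  code: string,
  missing: string
): string {
  if (!mapping) return missing;
  return mapping.entries.get(code) ?? mapping.fallback;
}

// Illustrative data only: the ids and colors are made up.
register({
  id: 'iso-to-english',
  entries: new Map([['US', 'United States'], ['DE', 'Germany']]),
  fallback: 'Unknown',
});
register({
  id: 'country-colors',
  entries: new Map([['US', '#1f77b4'], ['DE', '#ff7f0e']]),
  fallback: '#999999',
});
register({
  id: 'country-colors-us-only',
  entries: new Map([['US', '#1f77b4']]),
  fallback: '#cccccc',
});

const indexPatternColorRefs = new Map([['geo.country_code', 'country-colors']]);
const dashboardColorRefs = new Map([['geo.country_code', 'country-colors-us-only']]);

const defaultColor = lookup(
  resolve('geo.country_code', indexPatternColorRefs, new Map()),
  'DE',
  '#000000'
);
const overriddenColor = lookup(
  resolve('geo.country_code', indexPatternColorRefs, dashboardColorRefs),
  'DE',
  '#000000'
);
```

Because the mapping is referenced rather than embedded, several index patterns (or the interim "views" mentioned above) could point at the same `iso-to-english` or `country-colors` object.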

nreese commented 4 years ago

Many of the visual attribute mappings hinge on the cardinality or range of actual, runtime data; for example, how many distinct categories need to be visualized depends on data (eg. from the index pattern) but also on the dashboard, which shows a subset of the index data, what with time and other filters (though there's the aspect of color assignment stability - there are tradeoffs between forever constant scale/assignment and scale/assignment optimized only for the currently viewed data).

For the maps application we optimized on consistency. The application only re-fetches numerical range and top terms when the time range changes. Queries, filters, and current viewable area changes do not trigger any type of new metadata fetch.
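Expressed as a sketch, the policy could look like the following. This is not the actual Maps code; the `ViewState` shape and the `needsStyleMetaRefetch` name are assumptions made for illustration.

```typescript
interface TimeRange {
  from: string;
  to: string;
}

// Hypothetical snapshot of what the user is currently looking at.
interface ViewState {
  timeRange: TimeRange;
  query: string;
  filters: string[];
  extent: [number, number, number, number]; // viewable area
}

// Only a time range change invalidates the numerical range and top terms;
// query, filter and viewable area changes keep the existing style metadata.
function needsStyleMetaRefetch(prev: ViewState, next: ViewState): boolean {
  return (
    prev.timeRange.from !== next.timeRange.from ||
    prev.timeRange.to !== next.timeRange.to
  );
}

const base: ViewState = {
  timeRange: { from: 'now-7d', to: 'now' },
  query: '',
  filters: [],
  extent: [-180, -90, 180, 90],
};
const panned: ViewState = { ...base, extent: [-10, -10, 10, 10] };
const filtered: ViewState = { ...base, query: 'status:200' };
const newTime: ViewState = { ...base, timeRange: { from: 'now-30d', to: 'now' } };
```

Here `needsStyleMetaRefetch(base, panned)` and `needsStyleMetaRefetch(base, filtered)` are false, while `needsStyleMetaRefetch(base, newTime)` is true, which keeps color and size assignments steady while the user explores within one time range.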

monfera commented 4 years ago

Here's just a development efficiency concern - pondering possible implementation places and dependency relations:

wylieconlon commented 4 years ago

We already have a service in Kibana which creates a layer of consistency in colors and themes used by different visualizations, such as in a dashboard. The scope of the service is limited, as is the current palette that we have, but its existence gives us a place to focus on. The way the service works is really simple: all unique labels get a unique color, with colors derived from a "seed" palette using hue shifting.
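As a rough illustration of that strategy (not the actual Kibana service; the `LabelColorRegistry` class, the seed colors and the `hueStep` parameter are all assumptions), every new label takes the next seed color, and once the seeds run out the hue is shifted for each further round:

```typescript
interface Hsl {
  h: number; // hue in degrees, 0..359
  s: number; // saturation in percent
  l: number; // lightness in percent
}

// Hypothetical seed palette; the real seed colors are not reproduced here.
const seedPalette: Hsl[] = [
  { h: 170, s: 60, l: 45 },
  { h: 210, s: 60, l: 50 },
  { h: 340, s: 60, l: 55 },
];

class LabelColorRegistry {
  private readonly assigned = new Map<string, Hsl>();
  private readonly seeds: Hsl[];
  private readonly hueStep: number;

  constructor(seeds: Hsl[], hueStep: number = 25) {
    this.seeds = seeds;
    this.hueStep = hueStep;
  }

  // The same label always yields the same color, on every chart that asks.
  get(label: string): Hsl {
    const existing = this.assigned.get(label);
    if (existing) return existing;
    const n = this.assigned.size;
    const seed = this.seeds[n % this.seeds.length];
    const round = Math.floor(n / this.seeds.length);
    const color: Hsl = { ...seed, h: (seed.h + round * this.hueStep) % 360 };
    this.assigned.set(label, color);
    return color;
  }
}

function toCss(color: Hsl): string {
  return `hsl(${color.h}, ${color.s}%, ${color.l}%)`;
}

const colors = new LabelColorRegistry(seedPalette);
const first = toCss(colors.get('a'));
colors.get('b');
colors.get('c');
const fourth = toCss(colors.get('d')); // first seed, hue shifted by one step
```

Note that nothing here bounds the number of colors: with many labels, the shifted hues end up close to each other (or even collide), which is the readability problem the sample dashboard below demonstrates.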

As an example to focus on, I wanted to come up with a sample dashboard that uses a high number of unique labels:

Screenshot 2020-04-01 18 41 06

If we did something simple like change the palette to the EUI colorblind palette, the dashboard is still hard to understand:

Screenshot 2020-04-01 18 38 03

What happens if we stop using color for categories at all? Would the dashboard be less useful?

Screenshot 2020-04-01 18 57 28

After going through this exercise, I found myself thinking more clearly about ways we can make progress on this problem in the short-to-medium term. Specifically:

nreese commented 4 years ago

We already have a service in Kibana which creates a layer of consistency in colors and themes used by different visualizations, such as in a dashboard. The scope of the service is limited, as is the current palette that we have, but its existence gives us a place to focus on. The way the service works is really simple: all unique labels get a unique color, with colors derived from a "seed" palette using hue shifting.

The existing service is very limited in that it does not allow users to specify the color for categories. And the service does not grab the top terms. It treats each new category as a new color. This can generate too many categories when there is high cardinality. It's better if the server grabs the top X categories and places the remaining in an Other category. For example, the maps app grabs the top 9 terms for a field. Anything outside of this is given a single color so the number of categories is controlled.
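A small sketch of the top-terms approach, assuming the terms are counted client side for simplicity (in practice the top terms would more likely come from an aggregation). The function names, palette values and the default of 9 are illustrative, not the Maps implementation:

```typescript
// Return the n most frequent terms; ties are broken alphabetically
// so that the result is deterministic.
function topTerms(values: string[], n: number): string[] {
  const counts = new Map<string, number>();
  for (const value of values) {
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n)
    .map((entry) => entry[0]);
}

// Build a lookup that colors the top terms individually and gives every
// other term the single "Other" color, so the color count stays bounded.
function buildColorLookup(
  values: string[],
  palette: string[],
  otherColor: string,
  topN: number = 9
): (term: string) => string {
  const terms = topTerms(values, Math.min(topN, palette.length));
  const colorByTerm = new Map<string, string>();
  terms.forEach((term, i) => colorByTerm.set(term, palette[i]));
  return (term: string) => colorByTerm.get(term) ?? otherColor;
}

// Illustrative data: three categories, but only two get their own color.
const sampleValues = ['a', 'a', 'a', 'b', 'b', 'c'];
const samplePalette = ['#1f77b4', '#ff7f0e'];
const colorFor = buildColorLookup(sampleValues, samplePalette, '#999999', 2);
```

With this data, `colorFor('a')` and `colorFor('b')` return the two palette colors, while `colorFor('c')` and any unseen term return the gray "Other" color.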

wylieconlon commented 4 years ago

@nreese exactly, I think you're identifying that we need to change the strategy for assigning colors. It is definitely a problem that we are using too many colors, but I don't think this means that we should not use the color service. We should improve it.

monfera commented 4 years ago

elastic-charts links: the color/spectrum ticket by @cchaos that predates this issue has a super useful mock gif that shows different color assignment strategies, and Caroline's issue is also linked by Handling color by @markov00, also predating Beyond palettes

monfera commented 3 years ago

Adding a couple more references, focusing on color:

Danielle Szafir: Modeling Color Difference for Visualization Design image image

An Engineering Model for Color Difference as a Function of Size image

Measuring the Separability of Shape, Size, and Color in Scatterplots

Mapping Color to Meaning in Colormap Data Visualizations image

Selecting Semantically-Resonant Colors for Data Visualization (UW Interactive Data Lab) image

monfera commented 2 years ago

New article by @emeeks: Data visualization has a taxonomy problem.

Charts are a bad unit of measurement of data visualization. If we think of charts as species of data visualization, we use the wrong metaphor. A stacked bar chart isn’t a species that evolved from a bar chart. Instead, a stacked bar chart is a mix of numerical and hierarchical visualization of the components. Let’s change the metaphor from species to food dishes. No one thinks a carrot cake evolved from a chocolate cake, they are simply two different offerings in the category of desserts.

I mostly agree, though I think that often, chart types can be seen as relating to, or culturally, implementationally or projection-wise deriving from one another, often through multiple, alternative paths, and often through other "nodes" that can in and of themselves be thought of as legit chart types. For example, stacked bar charts, esp. the relative / % based variety, also share strong bonds with partition charts such as mosaic plots, treemaps, pie charts.

ryankeairns commented 2 years ago

cc:/ @gvnmagni you might want to Subscribe to this long running thread :)

gvnmagni commented 2 years ago

This is very interesting! In addition to the product improvements that we can make taking all of this into consideration, what I can see here is a great starting point for proper guidelines. There is a ton of material here that could be useful for our designers and for our users in order to understand what to do when dealing with charts, dashboards and so on.

Thank you @ryankeairns for pointing this out to me!

monfera commented 2 years ago

This is a convincing paper that argues against the tyranny of going with the "most accurately readable" projections: Why Shouldn’t All Charts Be Scatter Plots? Beyond Precision-Driven Visualizations

image

It partially absolves pie charts, elevates heatmaps and possibly other spatial-like projections that prioritize overall pattern at the expense of quantitative readability or comparability of individual numbers. A slight concern is the low count of examples (only two visual examples are in the paper). I also feel that the paper partially attacks a strawman: readability (eg. of position vs. color/intensity) in other literature is often described in isolation (and this paper's abstract quotes "comparing individual values"); on a real dashboard/report/small multiples etc. there are a lot of confounding factors, such as how we perceive things when there's a lot of them. I don't think anyone thinks that the positions/points (top left) are generally superior to intensities/heatmap (bottom right):

image

And I think very few people ever took the data-to-ink ratio literally, or as a singular, absolute goal/constraint. It was a due reaction in an age when garish charts, vacant embellishments, 3D bar charts etc. swept over the industry.

Also, quantitative readability may be a design goal, or, depending on circumstances, even one of the most important goals (though "most important" would degenerate the visualization into a table); in practice, as with all design tasks, there's a nonlinear blend of hard to quantify goals, and innumerable constraints apply too.

A real visualization use case may involve a heatmap, showing patterns well, but with poorer individual quantitative comparability, accompanied by tooltip, subordinate charts, drilldown or navigation to other charts for more quantitative analysis—if needed.

The paper is also valuable as a well organized set of references surrounding the topic of readability, data to ink ratio, chartjunk, memorability etc.