monfera commented 4 years ago

Shared mapping to visual attributes

Visual attributes and pattern matching: the core of data visualization

Color, size, length, position, sharpness, intensity, orientation etc. are visual attributes (image from Stephen Few's Tapping the Power of Visual Perception):

These are preattentive attributes as our visual faculties can near-instantaneously match patterns. There are also derived visual attributes such as speed of motion (eg. to indicate advance of time) or animated jitter (to convey uncertainty).

Some attributes are subtle, yet can be key to successful visualization, eg.

Z-order (painter's algo) - if there can be visual overlap, usually the focal one should be on the top
blur, fade, translucency - useful for deemphasizing some entities
grouping and enclosure - to convey cohesion

For example, let's identify occurrences of 3: the image on the right makes it easy by assigning a distinct intensity to the item (images from Katherine Hepworth's site):

Data visualization is about helping people find and share patterns in data by mapping the data and its derivatives primarily to visual attributes, optimizing for a target balance of aspects (quick recognition; recall; precision; enjoyment; impact; effort of creation; shareability etc.) while minimizing constraint violations (eg. lack of readability or accessibility).

Role

A visual attribute, eg. color, may serve various roles, eg.

show magnitude (eg. darker or more saturated is more) and sign (eg. negative is red) where the goal is to show perceptually uniform color distances in lockstep with the measure
discern entities (via their corresponding markers / lines), eg. showing distinct categories, where the goal is to maximize perceptual distance among all
bring our attention to key entities, eg. highlight my manufacturing plant in salient color while also showing the other plants in subdued colors eg. light grey
convey meaning via a standard or customary association between entity and color, eg. red indicating a failed process

Unit of data visualization

An individual chart is often but a part of a data visualization effort. A key level for data visualization is the cohesive product or experience. Examples:

dashboard, dashboard set
presentation, slide show
notebook (scientific etc.)
report, document (eg. financial)
journal article (printed and/or online)
scrollytelling
exploratory data analysis (EDA) interface
data art collage
cartoon, video feature
any or all of these, governed under a data style guide (visual attribute mapping), eg. a journal ensuring consistent country or party color coding over the years, to help repeat readers.

It is therefore impossible to solve for the dataviz problem (goals and constraints) at the level of individual charts or any other parts; the whole context of the above forms, readers and circumstances needs to be considered. Further down we'll use the dashboard as a proxy for all of these forms.

For example, this dashboard uses 5 colors consistently on all its constituent charts, amortizing the reader's cost of temporarily memorizing the color attribute mapping (Vavaliya et al: Online Performance Assessment System for Urban Water Supply and Sanitation Services in India):

The consequence is that the assignment of attribute mappings for shared dimensions, measures, metadata etc. needs to be handled at least at the dashboard level (reminder: dashboard is just a shorthand for all the things listed above).

While color is a front and center example, the other visual attributes are to be shared with the same zeal:

shared lengths are also common; 99% of small multiple charts share screenspace range (dimensions), and all that usefully can should also share the mapping, via shared axis scale and offset - 4 out of 5 charts above are small multiples, and within each, the scales are shared; while the concept of small multiples might make us say, "hey, a small multiple is but a chart type", there are arbitrary situations where it makes a ton of sense to place disparate viz side by side while sharing one or more axes; common examples are the marginal scatterplot and the marginal heatmap (source: Plotly Express):
shared marker shapes
shared area sizes, if possible, among areal charts (pie, treemap, sunburst) of common measures
shared color intensity, font type and saliency etc. for related things
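
As a rough illustration of sharing axis scale and offset across small multiples, here is a minimal TypeScript sketch. All names (`sharedDomain`, `linearScale`, `LinearScale`) are hypothetical and not taken from elastic-charts or Kibana; the point is only that every panel references one scale object rather than computing its own.

```typescript
// Hypothetical names throughout; not an elastic-charts or Kibana API.
type Extent = [number, number];

interface LinearScale {
  domain: Extent; // data units
  range: Extent; // screen units, eg. pixels
  toScreen: (value: number) => number;
}

// Union of the data extents of every panel, so one domain fits them all.
function sharedDomain(panels: number[][]): Extent {
  let min = Infinity;
  let max = -Infinity;
  panels.forEach((values) =>
    values.forEach((v) => {
      if (v < min) min = v;
      if (v > max) max = v;
    })
  );
  return [min, max];
}

function linearScale(domain: Extent, range: Extent): LinearScale {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  const span = d1 - d0;
  return {
    domain,
    range,
    // A degenerate (zero-width) domain maps everything to the start of the range.
    toScreen: (value) => (span === 0 ? r0 : r0 + ((value - d0) / span) * (r1 - r0)),
  };
}

const panels = [[2, 8, 5], [10, 40], [0, 20]];
const yScale = linearScale(sharedDomain(panels), [0, 100]);
// Every panel uses the same scale, so equal values get equal lengths everywhere.
const barLengths = panels.map((values) => values.map((v) => yScale.toScreen(v)));
```

With per-panel scales, the value 20 would fill the third panel entirely; with the shared scale it reaches only the halfway mark, which is what makes lengths comparable across panels.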

Style guide theme vs. attribute mapping

Themes and style guides are commonly made and used in visual design, UX design and UI implementation. They bring about visual consistency and corporate likeness for related visual elements and affordances, with the emphasis on the structural, scenegraph aspect.

In contrast, visual attribute mapping deals with cohesive projection of semantic constituents, ie. data content such as dimensions, measures and metadata.

Legends

Sharing attribute mappings on a dashboard has benefits beyond preattentively assisting the process of relating different projections of like data, and helping the reader keep some mappings in the (usually short-term) memory. These are:

consistent mapping needs a smaller area for legends, because they're not per chart, freeing up space to increase other aspects of readability (eg. larger font size, more granular views or more space for annotations)
very often, the legend can do double-duty as a primary visualization, eg. a colorful horizontal bar chart performing the role of the color legend too (source: NatGeo, through Andy Kirk)

This is also a great example for sharing visual attributes across diverse tools and projections of visualization, eg. geospatial or temporal.

Tooltips

Sharing axes leads to the potential for axis oriented tooltips to show values in multiple charts together (source: 538):

It is therefore useful to correlate the user intent of pointing at something with the valid, shared projections on the dashboard, sometimes even if the screen projections aren't sharing axis scale and offset, or they're distant. This assumes the sharing of spatial attributes (and their inverse mapping to data) of the pointing intent.
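
The inverse mapping mentioned above can be sketched as follows. This is a hypothetical illustration with made-up names (`Scale`, `project`, `invert`, `linkedCursor`), assuming plain linear scales: the hovered chart turns a pixel position back into a data value, and every other chart showing the same dimension projects that value with its own scale, even if its range or zoom level differs.

```typescript
// Hypothetical sketch; names are illustrative, not a Kibana or elastic-charts API.
interface Scale {
  domain: [number, number]; // data units, eg. epoch milliseconds
  range: [number, number]; // screen units, eg. pixels
}

// Data -> screen.
function project(scale: Scale, value: number): number {
  const [d0, d1] = scale.domain;
  const [r0, r1] = scale.range;
  return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Screen -> data: the inverse mapping of the pointing intent.
function invert(scale: Scale, pixel: number): number {
  const [d0, d1] = scale.domain;
  const [r0, r1] = scale.range;
  return d0 + ((pixel - r0) / (r1 - r0)) * (d1 - d0);
}

// The hovered chart resolves the pointer to a data value; the other charts
// place their cursor or tooltip anchor at that same data value.
function linkedCursor(hovered: Scale, pixel: number, others: Scale[]): number[] {
  const value = invert(hovered, pixel);
  return others.map((scale) => project(scale, value));
}

const wide: Scale = { domain: [0, 100], range: [0, 800] };
const narrow: Scale = { domain: [0, 100], range: [50, 250] };
const zoomed: Scale = { domain: [40, 60], range: [0, 400] };
const cursors = linkedCursor(wide, 400, [narrow, zoomed]); // [150, 200]
```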

Tooltips may convey a single number or a few numbers (eg. series name, data X, Y), in which case they're like a minuscule table (or a one row / one column table), but they can also be elaborate; it therefore helps reuse if we think of all tooltips as visualizations that are linked via certain data fields. Example for geo+temporal combination (NYT by Adam Pearce and team, ht tweet by Maarten Lambrechts):

Annotations

Not just primary data ink but annotations eg. reference line overlays, outlier markings (eg. via salient color) can share visual attributes. Eg. reference lines can cut across multiple charts if some of their spatial projections are identical.

Accessibility

A key constraint in data visualization is the diverse ability of people to distinguish colors in various wavelengths. Many color palettes take into account discernibility by those with monochromatic vision. Not all data visualization tasks are of the same consequence; in healthcare or industrial monitoring, all ambiguities must be resolved, while a café may well show its fun, colorful coffee popularity dashboard with less regard for readability and more for evocative colors.

Inherent or acquired meaning

Often, it helps the viewer link shape or color with underlying data if there's a physical or custom-based correspondence. Visualizing the turnover of avocado, strawberry and banana on a dark background may use green, red and yellow, respectively. Organizations may evolve their own color coding. Police, or US Democrats, are blue. Such relationships between entities and colors are precarious and do not scale as the number of entities goes up, yet it's important to provide the ability of stable mapping from categories to color for when it matters. While it's not possible to mentally link more than about a dozen distinct colors between data ink and legend, our focus is also limited to a low number of key categories at a time (while the rest can be subdued gray).

As the set of values the user may want to visualize may vary over time, even within the same dimension (eg. product, which can be numerous), a stable category to color assignment may not work. In this case, there's essentially random picking from a categorical color palette, but

the reader will be annoyed if, upon revisiting the dashboard, the color assignments change randomly
and will be annoyed too if, in the name of semi-permanent color assignments, the report will reuse the same color for multiple things
the number of (potentially, or currently) visualized categories is useful to know, ie. don't pick just 3 colors from a qualitative palette of 20 colors because it means that the color discernibility will suffer while some of the color/intensity space goes unused
if the color assignment can be random but needs to stay constant over time, even with differing value subsets to visualize, there are color generation methods for hundreds of distinct colors, maximized for perceptual distance
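
One way to reconcile these points is an append-only assignment that is stored separately from whatever subset is currently on screen. The sketch below is hypothetical (the `ColorRegistry` shape, the `colorFor` helper and the example hex values are all assumptions, and persistence is left out); it never changes a color once assigned and never hands out the same palette color twice.

```typescript
// Hypothetical sketch: an append-only registry, so a category keeps its color
// no matter which subset of values a later view happens to show.
interface ColorRegistry {
  palette: string[]; // assumed categorical palette
  fallback: string; // shared by categories that arrive after the palette runs out
  assigned: { [category: string]: string }; // would be persisted in a real system
}

function colorFor(registry: ColorRegistry, category: string): string {
  const existing = registry.assigned[category];
  if (existing !== undefined) return existing;
  // The number of assignments so far is the index of the next unused palette color.
  const next = registry.palette[Object.keys(registry.assigned).length];
  if (next === undefined) return registry.fallback;
  registry.assigned[category] = next;
  return next;
}

const registry: ColorRegistry = {
  palette: ["#1f77b4", "#ff7f0e", "#2ca02c"],
  fallback: "#bbbbbb",
  // Prototype-free object, so a category called "toString" cannot collide.
  assigned: Object.create(null),
};

// A first view shows two products; a later view shows a different subset.
["laptop", "phone"].forEach((category) => colorFor(registry, category));
const later = ["tablet", "laptop"].map((category) => colorFor(registry, category));
// later is ["#2ca02c", "#1f77b4"]: "laptop" kept its color, "tablet" got an unused one
```

The tradeoff is that late-arriving categories share the fallback color once the palette is exhausted, so the palette size should be chosen with the expected number of categories in mind.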

Social context

We take certain color assignments for granted. While the red-yellow-green traffic light colors seem fairly universal, here's the same March 16 drop shown in different parts of the world:

Certain colors also carry heavy emotional meaning and may be preferred or shunned.

Saliency

Attention grabbing and keeping focus is one of the roles of visual attributes (mostly color, but also line width, Z-order, blur/fade, or simply making something invisible or moving it to the bottom of a small multiples cluster).

Consider the judicious use of color for highlight here (by Lars Schubert / Graphomate pin):

In contrast, going overboard with color will result in confusion even if the colors otherwise have clear and shared association:

The user has no orientation as to what to look at first, when initially facing this dashboard.

Configuration

Providing color wheels for users is very useful, eg. to let the user maintain color assignment between categories and colors.

However, when building a visualization or dashboard, a color picker is a last resort, an escape hatch, potentially indicating that higher layers of color assignment abstractions had not been put in place (eg. TSVB).

it is hard for the average reader to pick pleasing, coherent, accessible colors (though a good picker may help a ton, by offering premade palettes and color runs)
the color assignment work will be lost, or it requires error prone manual labor to duplicate and maintain
the user will end up with non-cohesive colors
in particular, letting the user arbitrarily pick a background color pulls the rug from underneath higher color mapping methods, because it's nigh impossible to offer or pick color scales for data ink, axes, text etc. that work on arbitrary background colors
the maker's unlimited color picking freedom will usually not be appreciated by the audience - months of research and development go into single color palettes

Issue and paper links

Caroline's issue in elastic-charts on the various ways of color assignment
Lisa Charlotte Rost's question, with references, on using lightness
on a class of semantic colors (Selecting semantically resonant colors for data visualization), though the paper goes much deeper than going for the low hanging fruit
Affective color in visualization
On using color for emphasis (Color in Data Visualization: Less How, More Why)
On magenta as a good highlight color (see discussion below https://www.perceptualedge.com/blog/?p=1466)
How color determines what we see ... to be continued

Takeaways

it's not the single chart that's the most useful future unit of a dashboard - it's the common, shared mappings from data to visual attributes that lead to cohesive visualizations
attribute mappings should therefore be first class citizens, referenceable by diverse charts, maps etc. and which repository defines them is an implementation question
it's not just color: it's all kinds of scales: the ones underlying Cartesian axes, for example - the architecture should be generic to allow diverse aesthetic channels, yet the color is the low hanging fruit, with Cartesian axes the second
it's related to themes / style guides as all these deal with form and color, yet it's a distinct concern
the mappings should be maintainable by the user in Kibana, and mappings should be assignable (and swappable) with dashboard
work needs to be invested in identifying and making available color palettes with various good properties
the different types of abstractions (eg. assignment of a color to a salient data element) need work too, along mapping categories such as
- role based assignment
- saliency color map: zero, one, or a few entities get salient colors, while the rest of the entities get subdued color; example (by John Burn-Murdoch, FT):
- alert, exception, failure based color mapping: similar to the above, where it's not the entity that's color mapped, but the exception event, often a certain level of a measure in a time series
- quantitative: assignment based on measure value; also shared across the dashboard or beyond
- navigational highlight: the user may choose to focus on some entiti(es) in an ad-hoc manner, eg. in a presentation of findings, so it's useful to have shareable visual attribute mappings for interactions such as hover, box or lasso select
- manual, persisted assignment: the user associates colors with certain entities, eg. product category, event type, server cluster - then visualizations of those entities default to these mappings (can be overruled)
- choice and recommendation of color palettes based on intent: do you want to make unique color coding for all data, or emphasize the most important things?
- the determination of the actually used color palette needs to depend on theme, eg. a dark background calls for different context and focus colors than a white background
the viewer should be able to choose alternative, semantics-preserving assignments, eg. a colorblind viewer can switch the entire dashboard to monochromatic colors (it's not the same as just originally using safe colors); a multinational company can switch a dashboard or report format for local sensibilities; the same dashboard can be used to facilitate analytical, value reading accuracy (discern many entities) but also to present (direct the viewer's attention to highlights)

Grayscale example (original, in color, by Nate Silver et al at 538)

The user doesn't simply switch to a grayscale palette for a chart; what happens is that

the user indicates the preference for grayscale (eg. due to colorblindness, or for grayscale printing)
the visual attribute mappings change in line with the semantics: for example, a "focus vs context" option was chosen - appropriate for a visualization like that, given the context in the article - so now, instead of mapping the focus to red, as in the original 538 piece, it gets mapped to a heavy, dark grey
it may be the case that not just the color but the line thickness changes for the salient line, to emphasize its importance

While all these sound a bit vague, the user actually performs tangible steps:

when creating a dashboard, the user can pick a "semantic palette" or "autopalette" instead of a fixed color palette, eg. "focus+context" or "show all values differently" or "good-unremarkable-bad"; it's then up to the system to pick the actual colors
this assignment is consistent in the dashboard, so if there are line charts and treemaps depicting the same entities, then they'd map to the same colors
the color of the data ink could automatically change for any given reason, eg.
- the user indicating colorblindness, or printing a dashboard on a mono printer
- localization in another country where eg. good vs bad, or gain vs loss has a different mapping
- switch between analytic mode (small, yet readable fonts, fine lines and ticks, dense grids for value readability) vs. presentation highlight (few ticks, no grids, corporate fonts, theme driven aesthetics)
- presenting in dark mode
- increasing device night mode readability (eg. devices filter out blue; gotta compensate)
- selecting an alternative theme (eg. from "financial" to "sci-fi")
- redesign of qualitative color palette by the organization
- dashboard resizing eg. leading to the loss of a color legend due to lack of space (at which point, distinct colors don't convey as much)
besides the semantic palettes, the user could still descend to more direct levels if needed, but then it couldn't be responsive to the above factors
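
To make the "semantic palette" idea a bit more tangible, here is a minimal sketch of how a "focus+context" palette might be resolved at render time. Only the palette name comes from the list above; the types (`Role`, `ViewContext`, `MarkStyle`), the function names, the hex colors and the stroke widths are all assumptions for illustration. Charts store roles, not colors, so the same entity resolves identically in a line chart and a treemap, and the result can follow the viewing context.

```typescript
// Hypothetical sketch; only the "focus+context" palette name comes from the discussion.
type Role = "focus" | "context";

interface ViewContext {
  grayscale: boolean; // eg. colorblind preference or mono printing
  darkBackground: boolean; // eg. dark mode
}

interface MarkStyle {
  color: string;
  strokeWidth: number;
}

// The system, not the dashboard author, picks the actual colors.
function resolveFocusContext(role: Role, ctx: ViewContext): MarkStyle {
  if (role === "context") {
    return { color: ctx.darkBackground ? "#555555" : "#cccccc", strokeWidth: 1 };
  }
  if (ctx.grayscale) {
    // Without hue, saliency comes from contrast with the background and a heavier line.
    return { color: ctx.darkBackground ? "#ffffff" : "#222222", strokeWidth: 3 };
  }
  return { color: "#e41a1c", strokeWidth: 2 };
}

// Any chart type on the dashboard can call this with the same inputs and
// get the same style per entity.
function styleEntities(
  focused: string[],
  entities: string[],
  ctx: ViewContext
): { [entity: string]: MarkStyle } {
  const styles: { [entity: string]: MarkStyle } = Object.create(null);
  entities.forEach((entity) => {
    const role: Role = focused.indexOf(entity) >= 0 ? "focus" : "context";
    styles[entity] = resolveFocusContext(role, ctx);
  });
  return styles;
}

const plants = ["plant A", "plant B", "plant C"];
const onScreen = styleEntities(["plant B"], plants, { grayscale: false, darkBackground: false });
const printed = styleEntities(["plant B"], plants, { grayscale: true, darkBackground: false });
```

Here `onScreen` gives "plant B" a red, medium-weight style, while `printed` gives it a dark grey, heavier one; the context entities stay light grey in both. A user who descends to fixed colors would bypass `resolveFocusContext` and lose this responsiveness.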

elasticmachine commented 4 years ago

Pinging @elastic/kibana-design (Team:Design)

nreese commented 4 years ago

related to https://github.com/elastic/kibana/issues/43697. Maps has the same need. It would be great if visualization attributes could be defined at the index-pattern level so out of the box there is consistency when visualizing on a data dimension.

cchaos commented 4 years ago

I agree with all of this. So how do we get started? 😺

nreese commented 4 years ago

We have the UIs and data structures in Maps to define custom color palettes for categorical fields and numeric fields. Maybe we could come up with a plan to move this to index patterns so users can define color styling for field values in a single place.

cchaos commented 4 years ago

> Maybe we could come up with a plan to move this to index patterns

One thing to note on that suggestion is that users can still get very confused about what index patterns are. We would need to ensure a good flow from application to index pattern back to application to make sure they know it's a global setting. Having them be index-based is a good first step, but I do think having dashboard/canvas (or whatever presentation mode) wide settings is also necessary and would probably utilize a similar UI.

monfera commented 4 years ago

As many of these configurations as possible should be first-class entities that live on their own. It'd be possible to make a reference to a mapping (Cartesian scale, semantic palette, marker shape mapping and whatever we end up with) from within an index pattern, maps, dashboard, canvas or even a specific map, chart or chart feature if needed. The reason is mostly low coupling, and there is also a somewhat dormant work thread about killing index patterns. In fact, discussing that, one of the takeaways was that there needs to be a place for such mappings, so work on attribute mapping might help the rethinking/deconstruction of index patterns. For now, it's still a very useful place from which the user could point to default mappings, possibly overruled at the dashboard level.

Many of the visual attribute mappings hinge on the cardinality or range of actual, runtime data; for example, how many distinct categories need to be visualized depends on data (eg. from the index pattern) but also on the dashboard, which shows a subset of the index data, what with time and other filters (though there's the aspect of color assignment stability - there are tradeoffs between forever constant scale/assignment and scale/assignment optimized only for the currently viewed data).

The deconstruction of index patterns referenced views (from point 3 here), something like views in SQL in that they can build on one another and can involve data restriction (subsetting) as well, which is key for stable color assignments. For example, a certain aspect of index patterns can be seen as such; as well as the dashboard's current selection, but there could be intermediate "views", eg. further restricting columns in current index patterns, and adding calculated fields (and color etc. mappings) but multiple dashboards could feed from one such view. This would make the maintenance of multiple, related dashboards - maybe grouped thematically - easier, as there can be an interim layer between index patterns and dashboards.

Also, current index patterns already have some related mapping functionality, eg. field formatters, and the assignment of human readable names to codes in the index (eg. ISO country code in the index mapped to country name in English). The ISO->English mapping is not inherently index pattern driven; several index patterns may benefit from such a mapping. So, in the future, the code-to-name mapping would be a first-class object, and the code-to-color (or other visual aesthetics) mapping would be a first-class object too.
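
A rough sketch of what "first-class" could mean here: mappings live in their own registry and consumers only hold a reference to them. Everything below is hypothetical (the `ValueMapping` shape, the `register` and `lookup` helpers, the mapping ids, the field name and the hex colors); it's not how index patterns or saved objects work today.

```typescript
// Hypothetical sketch: mappings are standalone objects, referenced by id,
// instead of being embedded in an index pattern or a dashboard.
interface ValueMapping {
  id: string;
  kind: "name" | "color";
  entries: { [code: string]: string };
  missing: string; // returned for codes without an entry
}

const mappings: { [id: string]: ValueMapping } = Object.create(null);

function register(mapping: ValueMapping): void {
  mappings[mapping.id] = mapping;
}

// Consumers (index pattern, dashboard, map layer etc.) hold only the mapping id.
function lookup(mappingId: string, code: string): string {
  const mapping = mappings[mappingId];
  if (mapping === undefined) throw new Error("unknown mapping: " + mappingId);
  const hit = Object.prototype.hasOwnProperty.call(mapping.entries, code)
    ? mapping.entries[code]
    : undefined;
  return hit !== undefined ? hit : mapping.missing;
}

register({
  id: "iso-country-name-en",
  kind: "name",
  entries: { HU: "Hungary", FR: "France" },
  missing: "Unknown",
});
register({
  id: "country-color",
  kind: "color",
  entries: { HU: "#2ca02c", FR: "#1f77b4" },
  missing: "#bbbbbb",
});

// Two unrelated consumers referencing the same first-class objects.
const indexPatternField = { field: "geo.src", nameMapping: "iso-country-name-en" };
const dashboardPanel = { dimension: "geo.src", colorMapping: "country-color" };

const countryLabel = lookup(indexPatternField.nameMapping, "HU"); // "Hungary"
const countryColor = lookup(dashboardPanel.colorMapping, "HU"); // "#2ca02c"
```

Because neither consumer owns the mapping, a second index pattern or another dashboard can reference `country-color` without duplicating it, which is the low coupling argued for above.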

nreese commented 4 years ago

> Many of the visual attribute mappings hinge on the cardinality or range of actual, runtime data; for example, how many distinct categories need to be visualized depends on data (eg. from the index pattern) but also on the dashboard, which shows a subset of the index data, what with time and other filters (though there's the aspect of color assignment stability - there are tradeoffs between forever constant scale/assignment and scale/assignment optimized only for the currently viewed data).

For the maps application we optimized on consistency. The application only re-fetches numerical range and top terms when the time range changes. Queries, filters, and current viewable area changes do not trigger any type of new metadata fetch.

monfera commented 4 years ago

Here's just a development efficiency concern - pondering possible implementation places and dependency relations:

the UI / maintenance part feels like eui as well as kibana repo area (though could initially be developed as a separate page/repo until its features stabilize, as it's slower to develop inside a large repo)
the actual attribute mappings are basically scales, some of them free (ie. categorical values are not yet assigned to specific colors, or measure ranges not yet bound with color gradients, or screenspace axis lengths) while some of them may be partially applied - therefore it feels right, and efficient, to keep/put these in a more focused repo such as elastic-charts, possibly with a broadened set of collaborators, eg. some Kibana and Geo maintainers
the coupling of the maintenance UI, the attribute mappings and the place of application needs to happen in Kibana
it'd also be possible to further evolve visual attribute mappings (scales etc.) as non-rendering components in a scales repo or similar, on which elastic-charts, maps and Kibana visualizations could depend - eui needs to be an integral part of the landscape, as not just color palettes, but eg. font attributes and other aesthetic channels may need to be depended on or shared; as an example, variable (weight) fonts are sometimes used to convey importance or magnitude
there should not be separation of attribute mappings or scales based on the place of dependency, eg. maps and charts are equal with much sharing

wylieconlon commented 4 years ago

We already have a service in Kibana which creates a layer of consistency in colors and themes used by different visualizations, such as in a dashboard. The scope of the service is limited, as is the current palette that we have, but its existence gives us a place to focus on. The way the service works is really simple: all unique labels get a unique color, with colors derived from a "seed" palette using hue shifting.

As an example to focus on, I wanted to come up with a sample dashboard that uses a high number of unique labels:

If we did something simple like change the palette to the EUI colorblind palette, the dashboard is still hard to understand:

What happens if we stop using color for categories at all? Would the dashboard be less useful?

After going through this exercise, I found myself thinking more clearly about ways we can make progress on this problem in the short-to-medium term. Specifically:

We should be looking at specific examples using Kibana
Defining the "unit of visualization" is important, but one unit I haven't heard mentioned yet is Kibana as the unit. The existing color service is a singleton which can be used by any app, including Maps.
Because we already have an existing color service, let's start defining improvements to this service. This is the way we can make short-term progress.

nreese commented 4 years ago

> We already have a service in Kibana which creates a layer of consistency in colors and themes used by different visualizations, such as in a dashboard. The scope of the service is limited, as is the current palette that we have, but its existence gives us a place to focus on. The way the service works is really simple: all unique labels get a unique color, with colors derived from a "seed" palette using hue shifting.

The existing service is very limited in that it does not allow users to specify the color for categories. And the service does not grab the top terms. It treats each new category as a new color. This can generate too many categories when there is high cardinality. It's better if the server grabs the top X categories and places the remaining in an Other category. For example, the maps app grabs the top 9 terms for a field. Anything outside of this is given a single color so the number of categories is controlled.
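
The "top X plus Other" strategy could look roughly like the sketch below. It's a hypothetical illustration, not the actual Kibana color service or the Maps implementation: `TermCount`, `assignTopTermColors` and the hex values are made up, and the number of top terms is simply taken to be the palette length.

```typescript
// Hypothetical sketch; not the Kibana color service or the Maps implementation.
interface TermCount {
  term: string;
  count: number;
}

// Returns a lookup that gives the most frequent terms their own palette color
// and every other term (including ones never seen) the single "Other" color.
function assignTopTermColors(
  terms: TermCount[],
  palette: string[],
  otherColor: string
): (term: string) => string {
  // Sort a copy by descending count; ties break alphabetically for stable output.
  const ranked = terms
    .slice()
    .sort((a, b) => b.count - a.count || (a.term < b.term ? -1 : a.term > b.term ? 1 : 0));
  const colors: { [term: string]: string } = Object.create(null);
  palette.forEach((color, i) => {
    const entry = ranked[i];
    if (entry !== undefined) colors[entry.term] = color;
  });
  return (term) => {
    const color = colors[term];
    return color !== undefined ? color : otherColor;
  };
}

const colorOf = assignTopTermColors(
  [
    { term: "US", count: 50 },
    { term: "CN", count: 80 },
    { term: "IN", count: 30 },
    { term: "BR", count: 5 },
  ],
  ["#1f77b4", "#ff7f0e"],
  "#bbbbbb"
);
// colorOf("CN") is "#1f77b4" and colorOf("US") is "#ff7f0e";
// "IN", "BR" and any unseen term share "#bbbbbb".
```

The number of distinct colors is then bounded by the palette length plus one, regardless of the field's cardinality.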

wylieconlon commented 4 years ago

@nreese exactly, I think you're identifying that we need to change the strategy for assigning colors. It is definitely a problem that we are using too many colors, but I don't think this means that we should not use the color service. We should improve it.

monfera commented 4 years ago

elastic-charts links: the color/spectrum ticket by @cchaos that predates this issue has a super useful mock gif that shows different color assignment strategies, and Caroline's issue is also linked from Handling color by @markov00, which also predates Beyond palettes

monfera commented 3 years ago

Adding a couple more references, focusing on color:

Danielle Szafir: Modeling Color Difference for Visualization Design

An Engineering Model for Color Difference as a Function of Size

Measuring the Separability of Shape, Size, and Color in Scatterplots

Mapping Color to Meaning in Colormap Data Visualizations

Selecting Semantically-Resonant Colors for Data Visualization (UW Interactive Data Lab)

monfera commented 2 years ago

New article by @emeeks: Data visualization has a taxonomy problem.

> Charts are a bad unit of measurement of data visualization. If we think of charts as species of data visualization, we use the wrong metaphor. A stacked bar chart isn’t a species that evolved from a bar chart. Instead, a stacked bar chart is a mix of numerical and hierarchical visualization of the components. Let’s change the metaphor from species to food dishes. No one thinks a carrot cake evolved from a chocolate cake, they are simply two different offerings in the category of desserts.

I mostly agree, though I think that often, chart types can be seen as relating to, or culturally, implementationally or projection-wise deriving from one another, often through multiple, alternative paths, and often through other "nodes" that can in and of themselves be thought of as legit chart types. For example, stacked bar charts, esp. the relative / % based variety, also share strong bonds with partition charts such as mosaic plots, treemaps, pie charts.

ryankeairns commented 2 years ago

cc:/ @gvnmagni you might want to Subscribe to this long running thread :)

gvnmagni commented 2 years ago

This is very interesting! In addition to the product improvements that we can make taking all of this into consideration, what I can see here is a great starting point for proper guidelines. There is a ton of material here that could be useful for our designers and for our users in order to understand what to do when dealing with charts, dashboards and so on.

Thank you @ryankeairns for pointing this out to me!

monfera commented 2 years ago

This is a convincing paper that argues against the tyranny of going with the "most accurately readable" projections: Why Shouldn’t All Charts Be Scatter Plots? Beyond Precision-Driven Visualizations

It partially absolves pie charts, elevates heatmaps and possibly other spatial-like projections that prioritize overall pattern at the expense of quantitative readability or comparability of individual numbers. A slight concern is the low count of examples (only two visual examples are in the paper). I also feel that the paper partially attacks a strawman: readability (eg. of position vs. color/intensity) in other literature is often described in isolation (and this paper's abstract quotes "comparing individual values"); on a real dashboard/report/small multiples etc. there are a lot of confounding factors, such as how we perceive things when there's a lot of them. I don't think anyone thinks that the positions/points (top left) are generally superior to intensities/heatmap (bottom right):

And I think very few people ever took the data-to-ink ratio literally, or as a singular, absolute goal/constraint. It was a due reaction in an age when garish charts, vacant embellishments, 3D bar charts etc. swept over the industry.

Also, quantitative readability may be a design goal, or, depending on circumstances, even one of the most important goals (though "most important" would degenerate the visualization into a table); in practice, as with all design tasks, there's a nonlinear blend of hard to quantify goals, and innumerable constraints apply too.

A real visualization use case may involve a heatmap, showing patterns well, but with poorer individual quantitative comparability, accompanied by tooltip, subordinate charts, drilldown or navigation to other charts for more quantitative analysis—if needed.

The paper is also valuable as a well organized set of references surrounding the topic of readability, data to ink ratio, chartjunk, memorability etc.

elastic / kibana

Beyond palettes: shared visual attributes #61977
