Multi-layer sunburst doesn't allow inner layers to have their own sizes

wylieconlon commented 4 years ago

We don't always want or need the partition charts to calculate rolled-up sums when showing pie charts, because we can request hierarchical information from Elasticsearch. Here are two examples of data that Elasticsearch can return for us:

Level 1	Level 1 size	Level 2	Level 2 size
A	10	Yes	4
A	10	No	2
B	50	Yes	40
B	50	No	5

Level 1	Level 1 average	Level 2	Level 2 average
A	3	Yes	4
A	3	No	2
B	15	Yes	20
B	15	No	10

The first table example renders what I would consider to be incorrect data using the pie chart defaults. Notice that the value at level 1 is calculated as "42" instead of "50":

Instead, I want to be able to control the sizing so that we see gaps in the chart for the missing values:

monfera commented 4 years ago

An API element, showAccessor, in in place to handle gaps, see below.

A quick, incomplete list of (surmountable) challenges with the proposed direct specification of non-leaf values means that such separation would need planning and implementation:

It would allow a broader class of inconsistent input, eg. the sum of leaf nodes can be higher than the root value
We'd still need to also provide the option for internal aggregation if other use(r)s need it. Example: a user should be able to specify a simple pie chart from the values [7, 5, 4, 2, 1] without also specifying that the pie total is indeed 19; easy to sum for a single-layer pie but would need hierarchical aggregation for sunbursts of arbitrary layer count. The issue description also says "We don't always want or need..."
The non-aggregating version of the API would need to admit group nodes directly, eg. as separate rows. Having both options means that we'd need to keep the aggregation while providing a bypass/override. So it might lead to a messy API and implementation, or at a minimum, some planning would be needed needed for this

If there are reasons to move toward this direction, let's discuss. We're working toward API layering in general.

The API element, showAccessor, enables this today:

The missing value (shortfall) is made explicit via an auxiliary row whose purpose is to force totals of the parent: subtract leaf values from the stated group total value
The showAccessor method returns false for these entries

A storybook example is here:

It's always enough to just look at the current node, and its stated leafs together to compute the shortfall, ie. there's no need for the user library to hierarchically crawl the response set (a nice property).

To generate the bottom image in the issue description: Level 1	Level 2	size	show
A	Yes	4	true
A	No	2	true
A	null	4, from 10 - 4 - 2	false
B	Yes	40	true
B	null	10, from 50 - 40	false

Performance doesn't seem to be a concern as sunbursts aren't super scalable viz for perceptual reasons, so usually there's no more than a dozen or two, or in some extreme case, OOM 100 ... OOM 1000 sectors.

If there's a convenience concern with it, let's discuss it together with @markov00, also in light of when it'd be needed by.

wylieconlon commented 3 years ago

@monfera We have discussed this more and I have some better examples. It's no longer a requirement for Lens, but I think it is a requirement to match the existing behavior of Visualize.

The non-aggregating version of the API would need to admit group nodes directly

This direct control of nodes is a missing feature of the current API, and it's also the existing behavior of Visualize so I expect you'll need to add it soon. One of the reasons it's set up this way in Visualize is that it lets us maintain consistent numbers in the transition from single-level to multi-level charts. To highlight that point more: Visualize pie charts always display the inner layers at the same size, even as we add layers. Each layer is relative to its parent, not to its children.

From an API perspective I agree that we would need to rethink the existing APIs to support this. You suggested adding extra rows, and another option is to use a tree structure. Vega has a similar concept of converting rows into a tree structure.

cc @nickofthyme

monfera commented 3 years ago

@wylieconlon thanks for the scope update.

I'm not speaking for Visualize or its usage now, just for the general way part to whole charts work, and how pie/sunburst users usually expect it. The math and the user-perceived angular proportions need to work out within a layer and across layers, whichever style the API uses. If the current API can't handle it, then maybe it's not really a pie/sunburst chart, yielding incorrect, easy to misread visualizations. Not yet clear is why the size of a node should change as one shows its children (visualizing an additional layer) unless eg. we fetch more data and data is shown more accurately or more completely (ie. if data is uncertain, which would need some approach eg. a fuzzy, blurred grey slice...).

Each layer is relative to its parent, not to its children.

looks like a step away from the constraint that each parent is a sum of its children (plus maybe of one or more hidden "gap" slices).

Since you say you have better examples, it would be useful to copy one as in the original description.

It's a good time to get to the bottom of the practice and try to resolve, otherwise momentum may lead us to reproduce a behavior that should perhaps be fixed, or solved differently. Partitioning charts are not slices/sectors with arbitrary numbers in them, ie. they're not aesthetic number decorators, I hope. cc @markov00 @nickofthyme

markov00 commented 3 years ago

This direct control of nodes is a missing feature of the current API, and it's also the existing behavior of Visualize so I expect you'll need to add it soon. One of the reasons it's set up this way in Visualize is that it lets us maintain consistent numbers in the transition from single-level to multi-level charts. To highlight that point more: Visualize pie charts always display the inner layers at the same size, even as we add layers. Each layer is relative to its parent, not to its children.

@wylieconlon I don't fully understand what is the problem here: Visualize seems to display the data in the exact way the current API handles it. If you specify Show missing values, the data returned by ES reflect what is the request in this ticket and what Robert describes: it adds one row for each missing value, having the exact table input as described here: https://github.com/elastic/elastic-charts/issues/876#issuecomment-713375158

Without that show missing values option, the behavior is the one described initially, where the value of the root level C is hidden

wylieconlon commented 3 years ago

@monfera @markov00 I am trying to show you that the existing defaults in Visualize are not what you think they are. Visualize does not care if the outer rings add up to the same size as the inner ring- the outer rings do not need to sum up to the size of the inner ring.

Here is an example I created in two steps. I'm using the Terms aggregation- this aggregation return incomplete data because it only includes the top results, not all results.

Top 5 countries (showing only one for simplicity)

geo.src: Descending	Count	Percent
CN	28	33.73%

Top 5 countries (showing only one for simplicity), then top top 3 operating systems (there are more in the data)

geo.src: Descending	Count	Percent	machine.os.keyword: Descending	Count	Percent
CN	28	33.73%	ios	7	35%
CN	28	33.73%	osx	7	35%
CN	28	33.73%	win 8	6	30%

Notice that the outer ring is smaller than the inner ring: the numbers don't add up, but the slices are displayed as if they are representing 100% of the data.

monfera commented 3 years ago

@wylieconlon thanks for the example; as I mentioned, I made no claims or assumptions about vislib/Visualize, merely described what part-to-whole charts (as a dataviz idiom, eg. pie, sunburst, treemap) are. The above examples show something different. Many folks more informed than us wrote on this. eg. Robert Kosara:

If the parts do not sum up to a meaningful whole, they cannot be represented in a pie chart, period. It makes no sense to show five different occupations in a pie chart, because there are obviously many missing. The total of such a subsample is not meaningful, and neither is the comparison of each individual value to the artificial whole.

Partitioning charts are called for when values add up to a whole.

monfera commented 3 years ago

Subsequent to this discussion, the standard interpretation of pie/sunburst chart that elastic-charts had also been using (described above) was adopted for Lens, and recently, Visualize too, so this ticket is closed

elastic / elastic-charts

Multi-layer sunburst doesn't allow inner layers to have their own sizes #876