d3.stack for tidy data?

mbostock commented 4 years ago

d3.stack is designed to work with non-tidy data where each row corresponds to a “group” (the set of observations for all layers, e.g., year) with properties for each “layer” a.k.a. series (e.g., format) recording the observed value (e.g., revenue).

Year	8 - Track	Cassette	Cassette Single
1973	2699600000	419600000	0
1974	2730600000	433600000	0

In the tidy format, in contrast, rows correspond to observations and columns correspond to variables. (This is less efficient as the layer names are repeated, but oh well.)

Year	Format	Revenue
1973	8 - Track	2699600000
1973	Cassette	419600000
1973	Cassette Single	0
1974	8 - Track	2730600000
1974	Cassette	433600000
1974	Cassette Single	0
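
To make the relationship between the two layouts concrete, here is a minimal sketch in plain JavaScript that reshapes the wide rows above into tidy rows. The helper name wideToTidy is hypothetical (it is not part of d3), and the output column names Format and Revenue are simply taken from the table above.

```javascript
// Hypothetical helper (not part of d3): reshape wide rows into tidy rows.
// `groupColumn` names the column that identifies the group (here "Year");
// every other column is treated as a layer.
function wideToTidy(rows, groupColumn) {
  const tidy = [];
  for (const row of rows) {
    for (const [column, value] of Object.entries(row)) {
      if (column === groupColumn) continue; // skip the group column itself
      tidy.push({[groupColumn]: row[groupColumn], Format: column, Revenue: value});
    }
  }
  return tidy;
}

const wide = [
  {Year: 1973, "8 - Track": 2699600000, Cassette: 419600000, "Cassette Single": 0},
  {Year: 1974, "8 - Track": 2730600000, Cassette: 433600000, "Cassette Single": 0}
];

// One tidy row per (year, format) pair: 2 years × 3 formats = 6 rows.
const tidy = wideToTidy(wide, "Year");
```

Each layer name now appears once per group, which is the repetition mentioned above.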

It’s possible to use tidy data with d3.stack, but it’s a little convoluted.

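// Assumes each tidy observation has the shape {year, name, value}.
series = d3.stack()
    // Layer keys: the distinct names, in order of first appearance.
    .keys(d3.group(data, d => d.name).keys())
    // Each group is a Map from name to observation; read that observation's value.
    .value((group, key) => group.get(key).value)
    .order(d3.stackOrderReverse)
  // One Map per year (name → observation), assuming one observation per name and year.
  (d3.rollup(data, ([d]) => d, d => d.year, d => d.name).values())
    // Point each stacked value's data at its tidy observation instead of the Map.
    .map(s => (s.forEach(d => d.data = d.data.get(s.key)), s))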

It’d be nice if it were more convenient to give d3.stack tidy data, say like so:

series = d3.stack()
    .key(d => [d.name, d.year])
    .value(d => d.value)
    .order(d3.stackOrderReverse)
  (data)

Here the key accessor would return a two-part key: the layer key and the group key. And the value accessor wouldn’t need to know the current keys. (Because the data is tidy, the value accessor is the same for all observations.)
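
As a rough illustration of what a two-part key could mean in practice, here is a minimal sketch in plain JavaScript. The function stackTidy and its options object are hypothetical and are not an existing d3 API. The sketch assumes the data is dense and that every group lists its layers in the same order, and it does not implement ordering or offsets.

```javascript
// Hypothetical sketch of the proposed behavior; NOT an existing d3 API.
// key(d) returns [layerKey, groupKey]; value(d) returns the observed value.
// Assumes dense data with layers listed in the same order within every group.
function stackTidy(data, {key, value}) {
  const layers = new Map(); // layerKey -> array of [y0, y1] points
  const totals = new Map(); // groupKey -> running total stacked so far
  for (const d of data) {
    const [layerKey, groupKey] = key(d);
    const y0 = totals.get(groupKey) ?? 0;
    const y1 = y0 + value(d);
    totals.set(groupKey, y1);
    const point = [y0, y1];
    point.data = d; // like d3.stack, keep a reference to the source datum
    if (!layers.has(layerKey)) layers.set(layerKey, Object.assign([], {key: layerKey}));
    layers.get(layerKey).push(point);
  }
  return Array.from(layers.values());
}

const data = [
  {name: "8 - Track", year: 1973, value: 2699600000},
  {name: "Cassette", year: 1973, value: 419600000},
  {name: "8 - Track", year: 1974, value: 2730600000},
  {name: "Cassette", year: 1974, value: 433600000}
];

// One series per layer; each point is [y0, y1] with point.data set to the observation.
const series = stackTidy(data, {key: d => [d.name, d.year], value: d => d.value});
```

Because each point's data is the tidy observation itself, no post-processing step is needed to recover the original row.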

An implication of the proposed design is that the data can be sparse: some layers may be missing observations for some groups (and equivalently vice versa). That’s not possible with the current design because the layer keys (stack.keys) and group keys (data) are specified as separate arrays, but it should be easy enough for d3.stack to compute the union of layer keys and the union of group keys to fill in the missing data. d3.stack probably will also need some facility for ordering the group keys, as the order may not be consistent across layers.
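
The union-and-fill step might look something like the following sketch in plain JavaScript. The helper fillMissing and its fill and compareGroups options are hypothetical names, not d3 API, and filling gaps with a zero-valued observation is an assumption made for the example rather than a settled design decision.

```javascript
// Hypothetical helper (not part of d3): make sparse tidy data dense by adding
// a placeholder observation for every missing (layer, group) pair.
function fillMissing(data, {key, fill, compareGroups}) {
  const layerKeys = new Set(); // union of layer keys, in order of first appearance
  const byGroup = new Map();   // groupKey -> Map(layerKey -> observation)
  for (const d of data) {
    const [layerKey, groupKey] = key(d);
    layerKeys.add(layerKey);
    if (!byGroup.has(groupKey)) byGroup.set(groupKey, new Map());
    byGroup.get(groupKey).set(layerKey, d);
  }
  // Group order may differ from layer to layer, so sort the union explicitly.
  const groupKeys = Array.from(byGroup.keys()).sort(compareGroups);
  const dense = [];
  for (const groupKey of groupKeys) {
    for (const layerKey of layerKeys) {
      dense.push(byGroup.get(groupKey).get(layerKey) ?? fill(layerKey, groupKey));
    }
  }
  return dense;
}

// Cassette has no observation for 1973, and the years arrive out of order.
const sparse = [
  {name: "Cassette", year: 1974, value: 433600000},
  {name: "8 - Track", year: 1973, value: 2699600000},
  {name: "8 - Track", year: 1974, value: 2730600000}
];

const dense = fillMissing(sparse, {
  key: d => [d.name, d.year],
  fill: (name, year) => ({name, year, value: 0}), // assumption: missing means zero
  compareGroups: (a, b) => a - b                  // ascending by year
});
```

The result has one observation per layer and group, ordered by group and then by layer, so every layer sees the groups in the same order.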

I imagine it’ll be difficult to make this backwards-compatible, but maybe it’s possible, or maybe it could be under a new name such as d3.stackTidy.

Fil commented 4 years ago

Absolutely! We can use https://observablehq.com/@fil/ncov2019-data#databyday for a current example :-/ (I don't think my data wrangling in that notebook is the most straightforward.)

mbostock commented 4 years ago

Here’s an earlier example that breaks the data transformation into separate cells:

https://observablehq.com/@d3/stacked-area-chart-via-d3-group

Fil commented 2 years ago

I feel that Plot's stack transform is the correct answer now, but we need to design the API if we want to port it back to D3.

d3 / d3-shape

d3.stack for tidy data? #158