d3 / d3-shape

Graphical primitives for visualization, such as lines and areas.
https://d3js.org/d3-shape
ISC License

d3.stack for tidy data? #158

Open mbostock opened 4 years ago

mbostock commented 4 years ago

d3.stack is designed to work with non-tidy data where each row corresponds to a “group” (the set of observations for all layers, e.g., year) with properties for each “layer” a.k.a. series (e.g., format) recording the observed value (e.g., revenue).

Year  8 - Track   Cassette   Cassette Single
1973  2699600000  419600000  0
1974  2730600000  433600000  0

In the tidy format, in contrast, rows correspond to observations and columns correspond to variables. (This is less efficient as the layer names are repeated, but oh well.)

Year  Format           Revenue
1973  8 - Track        2699600000
1973  Cassette         419600000
1973  Cassette Single  0
1974  8 - Track        2730600000
1974  Cassette         433600000
1974  Cassette Single  0
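
To make the relationship between the two layouts concrete, here is a minimal sketch in plain JavaScript that pivots the tidy rows above back into the wide shape d3.stack expects. The helper name pivotWide and its parameters are hypothetical and are not part of D3.

```javascript
// Hypothetical helper (not a D3 API): build one wide row per group,
// with one property per layer holding the observed value.
function pivotWide(rows, groupKey, layerKey, valueKey) {
  const byGroup = new Map(); // group value -> wide row
  for (const row of rows) {
    const g = row[groupKey];
    if (!byGroup.has(g)) byGroup.set(g, {[groupKey]: g});
    byGroup.get(g)[row[layerKey]] = row[valueKey];
  }
  return Array.from(byGroup.values());
}

// The tidy rows from the table above.
const tidy = [
  {Year: 1973, Format: "8 - Track", Revenue: 2699600000},
  {Year: 1973, Format: "Cassette", Revenue: 419600000},
  {Year: 1973, Format: "Cassette Single", Revenue: 0},
  {Year: 1974, Format: "8 - Track", Revenue: 2730600000},
  {Year: 1974, Format: "Cassette", Revenue: 433600000},
  {Year: 1974, Format: "Cassette Single", Revenue: 0}
];

// One row per year, one property per format.
const wide = pivotWide(tidy, "Year", "Format", "Revenue");
```

The resulting rows could then be passed to d3.stack with the format names as keys. Note that this sketch leaves a property undefined whenever a group has no observation for a layer.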

It’s possible to use tidy data with d3.stack, but it’s a little convoluted.

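series = d3.stack()
    // Layer keys: the distinct names in the tidy data.
    .keys(d3.group(data, d => d.name).keys())
    // Each group is a Map from name to observation; look up this layer's value.
    .value((group, key) => group.get(key).value)
    .order(d3.stackOrderReverse)
  // One Map per year, keyed by name, holding the single matching observation.
  (d3.rollup(data, ([d]) => d, d => d.year, d => d.name).values())
    // Replace each point's data (the per-year Map) with this series' observation.
    .map(s => (s.forEach(d => d.data = d.data.get(s.key)), s))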

It’d be nice if it were more convenient to give d3.stack tidy data, say like so:

series = d3.stack()
    .key(d => [d.name, d.year])
    .value(d => d.value)
    .order(d3.stackOrderReverse)
  (data)

Here the key accessor would return a two-part key: the layer key and the group key. And the value accessor wouldn’t need to know the current keys. (Because the data is tidy, the value accessor is the same for all observations.)

An implication of the proposed design is that the data can be sparse: some layers may be missing observations for some groups (and equivalently vice versa). That’s not possible with the current design because the layer keys (stack.keys) and group keys (data) are specified as separate arrays, but it should be easy enough for d3.stack to compute the union of layer keys and the union of group keys to fill in the missing data. d3.stack probably will also need some facility for ordering the group keys, as the order may not be consistent across layers.
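
As a rough illustration of the union-and-fill idea, here is a minimal sketch in plain JavaScript. The function stackTidy and its options object are hypothetical and are not an existing d3-shape API. It assumes a missing observation counts as zero, that layers keep first-seen order, and that group keys are sorted ascending. It does not implement order or offset options. The sample rows reuse the table values, written with the name/year/value fields from the proposal, with the 1974 Cassette Single row dropped to make the data sparse.

```javascript
// Hypothetical sketch of the proposal (not part of d3-shape).
function stackTidy(data, {key, value}) {
  const layerKeys = [];         // union of layer keys, first-seen order
  const groupSet = new Set();   // union of group keys
  const cells = new Map();      // layer key -> Map(group key -> observation)
  for (const d of data) {
    const [layer, group] = key(d); // two-part key: [layer, group]
    if (!cells.has(layer)) {
      cells.set(layer, new Map());
      layerKeys.push(layer);
    }
    groupSet.add(group);
    cells.get(layer).set(group, d);
  }
  // Assumption: order group keys ascending so every layer agrees.
  const groupKeys = Array.from(groupSet).sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  const baseline = new Map(groupKeys.map(g => [g, 0])); // running top per group
  return layerKeys.map(layer => {
    const series = groupKeys.map(group => {
      const d = cells.get(layer).get(group);
      const y0 = baseline.get(group);
      const y1 = y0 + (d === undefined ? 0 : +value(d)); // missing counts as zero
      baseline.set(group, y1);
      const point = [y0, y1];
      point.data = d; // undefined where the observation was filled in
      return point;
    });
    series.key = layer;
    return series;
  });
}

const sparse = [
  {name: "8 - Track", year: 1973, value: 2699600000},
  {name: "Cassette", year: 1973, value: 419600000},
  {name: "Cassette Single", year: 1973, value: 0},
  {name: "8 - Track", year: 1974, value: 2730600000},
  {name: "Cassette", year: 1974, value: 433600000}
  // No Cassette Single row for 1974.
];

const series = stackTidy(sparse, {key: d => [d.name, d.year], value: d => d.value});
```

Each returned series mirrors the shape of d3.stack output: an array of [y0, y1] points with a data property, plus a key on the series itself. Whether filled-in points should carry undefined data or a synthesized observation is a design question this sketch leaves open.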

I imagine it’ll be difficult to make this backwards-compatible, but maybe it’s possible, or maybe it could be under a new name such as d3.stackTidy.

Fil commented 4 years ago

Absolutely! We can use https://observablehq.com/@fil/ncov2019-data#databyday for a current example :-/ (I don't think my data wrangling in that notebook is the most straightforward.)

mbostock commented 4 years ago

Here’s an earlier example that breaks the data transformation into separate cells:

https://observablehq.com/@d3/stacked-area-chart-via-d3-group

Fil commented 2 years ago

I feel that Plot's stack transform is the correct answer now, but we need to design the API if we want to port it back to D3.