d3 / d3-shape

Graphical primitives for visualization, such as lines and areas.
https://d3js.org/d3-shape
ISC License

fix(insideOut.js): fix the bug that stack orders do not show by onset… #126

Closed ZijingPeng closed 5 years ago

ZijingPeng commented 5 years ago

… time when it is a ThemeRiver

After reading the paper cited in the d3-shape API documentation, I think there may be a problem in d3-shape/src/order/insideOut.js. As I understand it, the authors define “inside-out” as an ordering that sorts the layers by onset time and then adds them alternately to the beginning and end of a layer list. This simple approach has a drawback: alternating alone can produce an asymmetric pattern in which the top of the streamgraph is much larger than the bottom. To avoid that, the sum of each layer is used as a weight to decide whether the layer goes on the top or the bottom. Your code sorts using only the layer sums and does not first sort the series by onset time, a step I think is significant and should not be left out. I have modified the source to add the sort by onset time.

The following is the relevant part of the paper Stacked Graphs – Geometry & Aesthetics by Lee Byron & Martin Wattenberg.

One might consider sorting the data set by “onset time”. If the “new” layers are always added along the top, the graph takes on a distracting downward diagonal stripe pattern in addition to an upward angle to the overall silhouette due to the layout algorithm’s effort to keep the sum of slopes low (fig 13).

To prevent this, layers are given a “inside-out” ordering, in which early-onset time series are placed at the middle, with later-onset series at the top and bottom. This has three benefits in addition to avoiding the diagonal-stripe effect. First, it places the biggest bursts in the layers—the first non-zero value—at the outside the graph, where they will disrupt the layout of other layers the least, drastically improving legibility, design issues (A-C). Second, we speculate that the top and bottom regions of the graph tend to be most prominent areas, since they occur near the high-contrast silhouette. The central “core” of the graph, the middle, may be read secondarily. Since the bursts are the most “interesting” part of the data in many cases, the inside-out layout places them in the potentially prominent position (fig 14). Third, it prevents a drift of the layout away from the x-axis, an artifact that can be seen dramatically in fig 13.

The particular inside-out ordering is defined as follows. Note that one easy method would be simply to sort the layers by onset time, and then add layers alternately to the beginning and end of a layer list.

Unfortunately, this simple method could potentially lead to a highly asymmetric graph if the layers that end up at the beginning of the list represent much larger values than the ones at the end. To prevent this asymmetry, we use the following algorithm in ordering the layers. First, we define the “weight” of a time series as the sum of all its values. Then after sorting by onset time, we add time series to the list one by one, attempting to “even out” the weight between the top and bottom half: more precisely, if the sum of the weight of the first half of the current list is greater than half the total weight, we add the series the end; otherwise, we add to the beginning.
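The algorithm quoted above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the d3-shape implementation or the code in this pull request: the names `onset`, `weight`, and `insideOutByOnset` are hypothetical, each series is assumed to be a plain array of non-negative numbers rather than d3's stacked `[y0, y1]` points, and the balancing rule is one reading of the paper (compare the weight already placed at the beginning of the list with the weight already placed at the end).

```javascript
// Hypothetical helpers for illustration; none of these are part of d3-shape.

// Onset time: index of the first non-zero value, or the series length
// if every value is zero (such a series sorts last).
function onset(values) {
  const i = values.findIndex((v) => v !== 0);
  return i === -1 ? values.length : i;
}

// Weight: the sum of all values in the series.
function weight(values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Returns the series indices ordered from the beginning to the end of the
// layer list, with early-onset series in the middle.
function insideOutByOnset(series) {
  // Step 1: sort the series indices by onset time.
  const byOnset = series
    .map((_, i) => i)
    .sort((a, b) => onset(series[a]) - onset(series[b]));

  // Step 2: add series one by one, evening out the weight on each side.
  const head = []; // layers added to the beginning, innermost first
  const tail = []; // layers added to the end, innermost first
  let headWeight = 0;
  let tailWeight = 0;
  for (const i of byOnset) {
    const w = weight(series[i]);
    if (headWeight > tailWeight) {
      tail.push(i);
      tailWeight += w;
    } else {
      head.push(i);
      headWeight += w;
    }
  }

  // The head was built from the middle outward, so reverse it.
  return head.reverse().concat(tail);
}

// Example: series 0 starts first, series 1 starts last.
const example = [
  [1, 1, 1, 1], // onset 0, weight 4
  [0, 0, 0, 2], // onset 3, weight 2
  [0, 3, 3, 0], // onset 1, weight 6
  [0, 0, 5, 5], // onset 2, weight 10
];
console.log(insideOutByOnset(example)); // order: 3, 0, 2, 1
```

In the example, the two earliest-onset series (0 and 2) end up in the middle of the list and the later ones (3 and 1) on the outside, which is the property the paper describes.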

You can also read the paper online: Stacked Graphs – Geometry & Aesthetics

mbostock commented 5 years ago

Related #106.