Closed mbostock closed 5 years ago
A kernel regression is another possibility, as in this kernel density estimation example. That implementation is O(n^2), but I suspect you could make it O(n lg n), or even O(n) if you assume that data is already sorted on x, and then implement a single-pass, fixed-window incremental algorithm.
Yes! When I first saw this visualization I thought of trying to implement a gaussian KDE for use with D3.
I couldn’t resist taking a crack at it. Here’s the slow O(n^2) implementation of kernel smoothing:
Things that would be nice:
I’m going back to working on API documentation if anyone else wants to make a better implementation…
Doing a little research, here's a list of common kernel functions
Also, here's a formula for bandwidth estimation using "Silverman's rule of thumb".
See also science.js, which implements bandwidth estimators and various kernels.
My fit above, by the way, is the sum of two gaussian PDFs, plus the sum of the same gaussians for x-24 and x+24, all raised to a power, for no good reason other than that it looked about right. I'd love to know a better way than eyeballing it and then applying general curve fitting to guess the components of things that look like the sum of gaussians.
See also ggplot2’s stat_smooth, which uses LOESS and GAM (generalized additive models). And ggplot2 has a geom_smooth which can smooth using a ribbon, showing an upper and lower bound on the smoothed model. That might be a nice curve type for an area shape.
Here's the Loess + D3 example from science.js by @jasondavies . I made a bl.ock to study it:
It looks like the API expects loess(xValuesArray, yValuesArray)
and returns an array of smoothed Y values.
Since the kernel operates in pixel space, I bet it would be reasonable to just have stupid defaults for the bandwidth (±20px) and precision (control points every 10px). Though certainly it would be nice if Silverman’s rule of thumb were easy to implement if you wanted a better default value, albeit at the cost of a global computation.
Here's an example that adds a LOESS curve to your tweet example using science.js:
I made this to try to understand the science.js LOESS implementation. Here are a few things I learned:
loess
function accepts arrays of values in data space, and surprisingly it works with dates.loess
function returns an arrays of smoothed Y values that is the same length as the original arrays. I would have hoped that the length of the returned array is configurable, but it looks like in this implementation it is not.I think it might be a good idea to do smoothing as a pre-processing step in data space rather than doing it internally as a curve:
I’m imagining something that has a x and y accessors and other configuration (e.g., kernel and bandwidth) and then computes the smooth curve, generating an array of [[x1, y1], [x2, y2], …] points in data space. Those can then be passed to a line generator, likely with an x and y accessor that apply scales to transform to screen space.
(I’m slightly tempted to have it return [[x1, x2, x3, …], [y1, y2, y3, …]] because that’s a more more efficient representation, but it would make implementing the accessors a little bit more awkward, so it feels like premature optimization.)
+1 on the idea that this should be a data space operation.
Probably should create a d3-smooth module for smoothing functions.
Hope you don't mind me commenting but I found my way to this after looking at the issues on science.js.
I'm sure things have moved on since December, but I implemented a kernel smoothing function for use with D3 a while back. It cuts off the calculation for each point once the calculated weight goes below a defined threshold (with relation to the weight at the point that is being smoothed).
I'm fairly new to JavaScript so I'm sure that it’s not the best implementation. Anyway, I've created a gist if it’s of any interest (or use).
@mbostock Any more recent thoughts in this area? I need some line smoothing for a time series of continuous data in my application, and would love to contribute something.
d3-regression implements loess https://github.com/HarryStevens/d3-regression
Here’s a simple moving average implementation:
This feels more appropriate to do in data space rather than geometrically, so I don’t think d3-shape is the right place for this. Still interested in seeing progress here, though.
I’m not 100% sure this belongs as a curve (or as a data transformation), but it seems like a reasonable place to put it. In theory if you used a smooth curve, the input data array to the line or area generator wouldn’t even need to be sorted by x-value: the smoother could buffer all the points and then compute the smooth curve on lineEnd.