d3 / d3-shape

Graphical primitives for visualization, such as lines and areas.
https://d3js.org/d3-shape
ISC License
2.48k stars 308 forks source link

Moving averages, line smoothing, etc. #43

Closed mbostock closed 4 years ago

mbostock commented 8 years ago

I’m not 100% sure this belongs as a curve (or as a data transformation), but it seems like a reasonable place to put it. In theory if you used a smooth curve, the input data array to the line or area generator wouldn’t even need to be sorted by x-value: the smoother could buffer all the points and then compute the smooth curve on lineEnd.

mbostock commented 8 years ago

A kernel regression is another possibility, as in this kernel density estimation example. That implementation is O(n^2), but I suspect you could make it O(n lg n), or even O(n) if you assume that data is already sorted on x, and then implement a single-pass, fixed-window incremental algorithm.

mbostock commented 8 years ago

LOESS is another option.

mbostock commented 8 years ago

And here’s a simple test case:

screen shot 2015-12-01 at 1 17 59 pm

With a smooth curve in R:

cvgpl7jueaamixy

curran commented 8 years ago

Yes! When I first saw this visualization I thought of trying to implement a gaussian KDE for use with D3.

mbostock commented 8 years ago

I couldn’t resist taking a crack at it. Here’s the slow O(n^2) implementation of kernel smoothing:

screen shot 2015-12-01 at 3 34 12 pm

Things that would be nice:

mbostock commented 8 years ago

I’m going back to working on API documentation if anyone else wants to make a better implementation…

curran commented 8 years ago

Doing a little research, here's a list of common kernel functions

Also, here's a formula for bandwidth estimation using "Silverman's rule of thumb".

mbostock commented 8 years ago

See also science.js, which implements bandwidth estimators and various kernels.

e-n-f commented 8 years ago

My fit above, by the way, is the sum of two gaussian PDFs, plus the sum of the same gaussians for x-24 and x+24, all raised to a power, for no good reason other than that it looked about right. I'd love to know a better way than eyeballing it and then applying general curve fitting to guess the components of things that look like the sum of gaussians.

mbostock commented 8 years ago

See also ggplot2’s stat_smooth, which uses LOESS and GAM (generalized additive models). And ggplot2 has a geom_smooth which can smooth using a ribbon, showing an upper and lower bound on the smoothed model. That might be a nice curve type for an area shape.

curran commented 8 years ago

Here's the Loess + D3 example from science.js by @jasondavies . I made a bl.ock to study it:

loess

It looks like the API expects loess(xValuesArray, yValuesArray) and returns an array of smoothed Y values.

mbostock commented 8 years ago

Since the kernel operates in pixel space, I bet it would be reasonable to just have stupid defaults for the bandwidth (±20px) and precision (control points every 10px). Though certainly it would be nice if Silverman’s rule of thumb were easy to implement if you wanted a better default value, albeit at the cost of a global computation.

curran commented 8 years ago

Here's an example that adds a LOESS curve to your tweet example using science.js: loess

I made this to try to understand the science.js LOESS implementation. Here are a few things I learned:

mbostock commented 8 years ago

I think it might be a good idea to do smoothing as a pre-processing step in data space rather than doing it internally as a curve:

I’m imagining something that has a x and y accessors and other configuration (e.g., kernel and bandwidth) and then computes the smooth curve, generating an array of [[x1, y1], [x2, y2], …] points in data space. Those can then be passed to a line generator, likely with an x and y accessor that apply scales to transform to screen space.

(I’m slightly tempted to have it return [[x1, x2, x3, …], [y1, y2, y3, …]] because that’s a more more efficient representation, but it would make implementing the accessors a little bit more awkward, so it feels like premature optimization.)

tomgp commented 8 years ago

+1 on the idea that this should be a data space operation.

mbostock commented 8 years ago

Probably should create a d3-smooth module for smoothing functions.

muonsw commented 8 years ago

Hope you don't mind me commenting but I found my way to this after looking at the issues on science.js.

I'm sure things have moved on since December, but I implemented a kernel smoothing function for use with D3 a while back. It cuts off the calculation for each point once the calculated weight goes below a defined threshold (with relation to the weight at the point that is being smoothed).

I'm fairly new to JavaScript so I'm sure that it’s not the best implementation. Anyway, I've created a gist if it’s of any interest (or use).

gist

nealstewart commented 7 years ago

@mbostock Any more recent thoughts in this area? I need some line smoothing for a time series of continuous data in my application, and would love to contribute something.

Fil commented 5 years ago

d3-regression implements loess https://github.com/HarryStevens/d3-regression

mbostock commented 5 years ago

Here’s a simple moving average implementation:

https://observablehq.com/@d3/moving-average

mbostock commented 4 years ago

This feels more appropriate to do in data space rather than geometrically, so I don’t think d3-shape is the right place for this. Still interested in seeing progress here, though.