JuliaMath / Interpolations.jl

Fast, continuous interpolation of discrete datasets in Julia
http://juliamath.github.io/Interpolations.jl/
Other
523 stars 111 forks source link

Enhancement proposal: restructure package around knot-vectors and their indices #227

Open timholy opened 6 years ago

timholy commented 6 years ago

Plugging away at #226 has been the occasion for reconsidering the overall structure of the package. I've been slowly evolving towards one significant internal API change (introducing WeightedIndex) which hopefully you'll see soon as part of #226, so I won't bother describing it here. Here, I want to focus on a much more visible form of restructuring that may not be part of #226 but which I think is worth considering. This proposal stems from our accumulated experience thinking about indexing, arrays, and other container types elsewhere in the Julia ecosystem (CCing @mbauman and @andyferris, who have helped shape these views).

We currently have 3 interpolation types: BSpline, Scaled, and Gridded. (We also have extrapolation types, but that's orthogonal to the content of this proposal.) The core point of this proposal is that I believe we can:

by centralizing everything around the idea of knot vectors.

As you all know, the knots are the places where we shift the parametrization of the underlying polynomial: when you cross a knot you end up accessing different elements of the underlying coefficient array. For an array axis spanning 1:n, currently we have the following:

Another important component is padding, which is currently expressed through a type-parameter of the appropriate AbstractInterpolation type. Now that we have OffsetArrays, a much cleaner approach (which you'll see in #226) is to use the coefficient array's axes to encode the padding: for example, quadratic interpolation requires an extra padding element on either end, and so the natural approach would be to index this axis of the coefficient array with integers that range from 0 to n+1, where the 0th and n+1st elements are the padding. By using an OffsetArray it's easy to keep everything in alignment.

Let's examine some implicit elements of the scheme above a bit more deeply. Note that the knot-vector actually has two properties: each element has a position, but we also address it via an index. That is to say, the position of the kth knot is knotvec[k] where k is the index. A helpful analogy is to a Dict, which has key/value pairs; here, the key is the index and the value is the position. If you think about it a bit, the important thing is that it's the indices of the knot vector that specify the corresponding indices of the coefficient array: if the position x at which you're trying to evaluate the interpolation lies between knotvec[k] and knotvec[k+1], then for linear interpolation I'm going to be accessing the coefficient array at indices k and k+1. (x and the values of knotvec affect the weights in a continuous fashion, but the indices are determined discontinuously by the k satisfying knotvec[k] <= x <= knotvec[k+1].)

Realizing that the knot vector's indices, rather than values, encode the alignment with the coefficient array, we come to notice that we're not really making good use of the knot-vector's values, at least for the core BSpline type. We do, however, formalize these more for the Scaled and Gridded types. Pursuing this just a bit, we come up with the following scheme:

Consequently, all three "top level types" become variants of one overall type with different knot vectors. Unification has some significant advantages, among them being that it lets me mix-and-match: I could have an interpolation object which has p axes with AbstractRange-type knot vectors and q axes with AbstractVector-type knot vectors. Currently, if you have just one AbstractVector knot-vector, you have to use GriddedInterpolation and use them for all of your axes. Looking up the index in a sorted AbstractVector is an O(logn) process, but it's O(1) for AbstractRange, so being forced to use slow lookup for all of them is a bummer.

The full (aka, most general offering the greatest degree of control) user-level API might look something like this:

itp = interpolate(A, BSpline(degree1, knotvec1), BSpline(degree2, knotvec2), ...)

As a concrete example, for a 5x4 array A, our current

itp =  interpolate(A, (BSpline(Linear()), BSpline(Quadratic(Flat())), (OnGrid(), OnCell()))

would be replaced by

itp = interpolate(A, BSpline(Linear(), 1:5), BSpline(Quadratic(Flat()), 0.5:4.5))

But we could also do fun things like

itp = interpolate(A, BSpline(Linear(), 0mm:0.4mm:1.6mm), BSpline(Quadratic(Flat()), [2s, 3.8s, 4.3s, 8.6s, 9.2s])

and access values as itp(0.83mm, 5.2s). (This particular example represents a series of measurements taken at regular spatial positions but sampled non-uniformly in time; if the 2nd "time axis" had been an AbstractRange it would have represented regular temporal sampling.) This makes more sense than ever, now that we're moving towards using () rather than [] for indexing at positions. We can still use [] for looking up Integer locations (and my upcoming WeightedIndex type) that are divorced from the positional meaning of the knot vectors.

That's pretty much it. Thoughts?

timholy commented 6 years ago

Realized that @jlperla and @chiyahn are not watching this package, but their advocacy for the function-call syntax is relevant for the discussion here too.

tomasaschan commented 6 years ago

I like it.

I'm not sure that the ScaledInterpolation was originally intended to fit as well in with the others as you express it here, though. My intention with scaling (and extrapolation) as separate concepts from B-spline (and other) implementation schemes, was to make sure that as much as possible of that infrastructure can be used if/when e.g. Hermite splines (#6) are added. However, as you say it seems to fit very well in with the framework.

I've also been mulling secretly on a generic "tensorization scheme", which would allow for (admittedly probably much slower, but still) N-dimentional interpolations with any scheme for which there is a 1-dimensional implementation, where you basically create the Nth dimension of length n by supplying n N-1-dimensional interpolation objects and a knot vector, also of length n, specifying where along the Nth axis the inner interpolation objects belong. Interpolation in this scheme would work much the same as our current B-splines do, but with less performance magic: figure out one dimension at a time, gradually building up the point in N-dimensional space we're looking for. This is a big undertaking, though, so it's remained an abstract idea in my head for a long time...

However, I think it would be nice if we could build the APIs around such functionality (scaling, extrapolation, perhaps one day N-dimensionality) in a way that you can just build a dumb 1D interpolation scheme over a unit range, and get all the rest "for free", and if your interpolation scheme makes it easy, or someone wants it badly enough for performance reasons, you can override select parts of the functionality with something more optimized. Multiple dispatch as a paradigm should excel at this.

timholy commented 6 years ago

Actually, I think expand in #226 basically does that.

tomasaschan commented 6 years ago

https://github.com/JuliaMath/Interpolations.jl/pull/226#discussion_r209846461 seems to apply in more than one place, then.

Evizero commented 6 years ago

itp = interpolate(A, BSpline(Linear(), 1:5), BSpline(Quadratic(Flat()), 0.5:4.5))

Would that imply that in order to create an interpolation one would have to explicitly specify the range (to know the knot vectors)?

timholy commented 6 years ago

Good question. I personally would favor some defaults, e.g., the equivalent of OnGrid for odd orders (Linear and Cubic) and OnCell for even orders (Constant and Quadratic). We could implement BSpline as

struct BSpline{Order,V<:Union{AbstractVector,Nothing}}
    order::Order
    axis::V
end

and then replace a nothing axis field with the default by dispatching on it when running interpolate. (You need to have the array object available to construct a range, so you know what the bounds are. So BSpline can't provide the default on its own.)

andyferris commented 6 years ago

This makes perfect sense to me, Tim :)

floswald commented 6 years ago

our current GriddedInterpolation is effectively a BSplineInterpolation{K} where K<:AbstractVector (the knots must be sorted)

One important distinction between the Gridded and other interpolation types was up until now that

  1. K must be sorted in a Gridded interpolation, but they may represent an irregular grid, i.e. grid = [1;2;5.1;5.4;6] is valid.
  2. The grids for the other types were required to be uniformly spaced.

We talked several times about the difficulty of extending irregular grids to non-Gridded types. Would that change in any way with your new proposal? I guess if you start focusing on indices, the importance of the value behind the index vanishes? Wild guess, really.