I don't see what's so "quasi" about the bra-kets. Aren't they just infinite matrices with uncountably infinite index sets? Somehow, the entire Wikipedia article on matrices (in particular, the sub-subsection on infinite matrices) manages to get by without the need for a "quasi" term. https://en.wikipedia.org/wiki/Matrix_(mathematics)#Abstract_algebraic_aspects_and_generalizations
I still think proper function space types may help bring clarity beyond just examining tail growth/decay. It would be a nice way to dispatch on norms and inner products.
Also, as far as I can recall, there have been only two major changes to the `Fun` type since 2014. The first was uniting `IFun` and `FFun` by introducing the `space` field (a misnomer) to describe sets. The second has been freeing the coefficients from `::Vector{T}`. There have been other minor changes along the way, e.g. the ordering of template parameters.
To me, aliasing a `Fun` into a Julia linear algebra concept is perfectly fine, but it may have unintended consequences (that, tautologically, I don't have a concrete example for). Except that one thing comes to mind: Julia's linear algebra appears to have evolved much faster than the `Fun` type in ApproxFun. How would this change affect the long-term stability of the code?
@jagot and I have discussed the design and I just pushed the first version. It has a sketch for Legendre and for B-splines and seems to be working well.
It's not going to be about functions or spaces so we can think of a new name. QuasiMatrices.jl is too vague, though maybe it's fine. BasisFunctions.jl emphasises functions too much. GeneralizedVectors.jl?
How about ContinuumArrays.jl? A continuum array is an array where one or more of the dimensions is indexed by a continuum. It's not an existing term. This includes quasimatrices and cmatrices described in http://rspa.royalsocietypublishing.org/content/471/2173/20140585.
ContinuumArrays.jl would be very different to GeneralizedVectors.jl (which is the design we started in the code). ContinuumArrays.jl is a generalisation of arrays while GeneralizedVectors.jl is something more familiar to quantum physicists and FEM.
That said, I think ContinuumArrays.jl is more "Julian", more constrained in scope, likely easier to design, and probably the better idea for now. Unfortunately, `AbstractArray` expects integer indexing, so we cannot piggyback on Base (e.g., `Diagonal`, `SubArray`) like in InfiniteArrays.jl, but I think this won't be too much of an issue.
Here's a quick mockup:
## Option 3 ContinuumArrays.jl
```julia
using LinearAlgebra, IntervalSets, FillArrays, LazyArrays
import Base: size, getindex, show, +, *, -, convert, copyto!, length, axes, parent, eltype
import LinearAlgebra: adjoint, SymTridiagonal
import InfiniteArrays: Infinity
import LazyArrays: MemoryLayout
abstract type ContinuumArray{T, N} end
const ContinuumVector{T} = ContinuumArray{T, 1}
const ContinuumMatrix{T} = ContinuumArray{T, 2}
eltype(::Type{<:ContinuumArray{T}}) where T = T
eltype(::ContinuumArray{T}) where T = T
struct ℵ₀ <: Number end
_length(::AbstractInterval) = ℵ₀
_length(d) = length(d)
size(A::ContinuumArray) = _length.(axes(A))
axes(A::ContinuumArray, j::Int) = axes(A)[j]
size(A::ContinuumArray, j::Int) = size(A)[j]
struct ContinuumLayout <: MemoryLayout end
MemoryLayout(::ContinuumArray) = ContinuumLayout()
getindex(B::ContinuumMatrix, K::AbstractVector, j::Real) =
[B[k, j] for k in K]
getindex(B::ContinuumMatrix, k::Real, J::AbstractVector) =
[B[k, j] for j in J]
getindex(B::ContinuumMatrix, K::AbstractVector, J::AbstractVector) =
[B[k, j] for k in K, j in J]
getindex(B::ContinuumMatrix, ::Colon, ::Colon) = copy(B)
getindex(B::ContinuumMatrix, ::Colon, J) = B[axes(B,1), J]
getindex(B::ContinuumMatrix, K, ::Colon) = B[K, axes(B,2)]
# use lazy multiplication
materialize(M::Mul) = M
*(A::ContinuumArray, B::ContinuumArray) = materialize(Mul(A,B))
struct CAdjoint{T,PAR} <: ContinuumMatrix{T}
parent::PAR
end
CAdjoint(A::ContinuumArray{T}) where T = CAdjoint{T, typeof(A)}(A)
parent(A::CAdjoint) = A.parent
axes(A::CAdjoint) = reverse(axes(parent(A)))
adjoint(A::ContinuumArray) = CAdjoint(A)
getindex(A::CAdjoint, k::Real, j::Real) = adjoint(parent(A)[j,k])
struct LinearSpline{T} <: ContinuumMatrix{T}
points::Vector{T}
end
LinearSpline(p::Vector{T}) where T = LinearSpline{float(T)}(p)
axes(B::LinearSpline) = (first(B.points)..last(B.points), Base.OneTo(length(B.points)))
function getindex(B::LinearSpline{T}, x::Real, k::Int) where T
(x ∈ axes(B,1) && 1 ≤ k ≤ size(B,2)) || throw(BoundsError())
p = B.points
n = length(p)
k > 1 && x ≤ p[k-1] && return zero(T)
k < n && x ≥ p[k+1] && return zero(T)
x == p[k] && return one(T)
x < p[k] && return (x-p[k-1])/(p[k]-p[k-1])
return (x-p[k+1])/(p[k]-p[k+1]) # x ≥ p[k]
end
getindex(B::LinearSpline, ::Colon, k::Int) = Mul(B, [Zeros{Int}(k-1); 1; Zeros{Int}(size(B,2)-k)])
function convert(::Type{SymTridiagonal}, AB::Mul{<:Any,<:Any,<:Any,<:CAdjoint{<:Any,<:LinearSpline},<:LinearSpline})
Ac,B = AB.A, AB.B
A = parent(Ac)
@assert A.points == B.points
x = A.points
SymTridiagonal(x, x/2) # TODO fix
end
materialize(M::Mul{<:Any,<:Any,<:Any,<:CAdjoint{<:Any,<:LinearSpline},<:LinearSpline}) =
convert(SymTridiagonal, M)
## tests
B = LinearSpline([1,2,3])
using Plots, Test
x = range(1, stop=3, length=1000)
plot(B[x,:])
@test B'B isa SymTridiagonal
```
In this mockup, the inner product comes from the `axes`, which in this case is a `ClosedInterval`, hence should always be Lebesgue. I'm not sure how to handle other inner products, but since this is no longer trying to replicate functional analysis, perhaps one could rely on:
```julia
struct ChebyshevWeight <: ContinuumVector{Float64} end
axes(::ChebyshevWeight) = (ChebyshevInterval(),)
struct CDiagonal{T,P} <: ContinuumMatrix{T}
parent::P
end
axes(A::CDiagonal) = (axes(A.parent,1), axes(A.parent,1))
struct Chebyshev <: ContinuumMatrix{Float64} end
axes(::Chebyshev) = (ChebyshevInterval(), Base.OneTo(∞))
CDiagonal(ChebyshevWeight()) * Chebyshev()
```
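To make that a bit more concrete, here is a minimal sketch of the missing pieces, purely as an assumption on top of the mockup above (none of this is working code):

```julia
# Assumed additions: pointwise values for the weight, a convenience constructor,
# and diagonal indexing for CDiagonal.
getindex(::ChebyshevWeight, x::Real) = 1/sqrt(1 - x^2)
CDiagonal(A::ContinuumVector{T}) where T = CDiagonal{T,typeof(A)}(A)
getindex(D::CDiagonal{T}, x::Real, y::Real) where T = (x == y ? D.parent[x] : zero(T))
# so that the Chebyshev-weighted Gram matrix could be written as
# Chebyshev()' * CDiagonal(ChebyshevWeight()) * Chebyshev()
```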
I started exploring options too, and ended up converging closer and closer to option 2 in the first commit here, except that I tried not to inherit from `AbstractArray`. This has advantages and disadvantages. The code is easier to read without references to internal data structures of Julia (I side with @MikaelSlevinsky that this is a potential maintenance problem). Also, making infinite arrays fit into Julia's linear algebra framework is a lot of work. On the other hand, not inheriting from `AbstractArray` means we have to implement everything over and over again: adjoints, lazy multiplication, block matrices, offset arrays, ... It would be really nice to be able to use everything other people have invented for arrays.
I think there is something fundamentally shady about indexing a matrix with a continuous index: `getindex(a, i::Int, x)` just looks weird. The analogy to linear algebra also stops at some point. For a matrix, you would expect `B = A[1:5,4:6]` to be a matrix of size 5 by 3. Here, `B[1,1]` equals `A[1,4]`. What if you slice a continuous dimension? If the second dimension of `C` is continuous, with `x` in some domain `U`, and if `V` is a subset of `U`, then consider `D = C[1:3,V]`. How do you index `D`? In `D[1,y]`, what is the range of `y`? You cannot just have `y` in `V`, because we did not write `B[1,4]` above either, we wrote `B[1,1]`. To be consistent, `y` would have to live in some shifted copy of `V`. What should the shift be? Not sure if I'm being clear, but in any case I think nothing can be gained in practice from having an infinite continuous index. Even stronger, I think it is simply wrong unless the geometry of the continuous index is heavily restricted (to, say, boxes).
The approach of option 2, where evaluation corresponds to multiplication by a Dirac delta function, is much nicer. There is a clean interpretation. Each element of the quasiarray is a function and its adjoint is a functional. You can make linear combinations of them, and this corresponds to multiplication by a coefficient vector on the right. You can act on the functions by multiplying by a functional or operator from the left: for example, function evaluation is multiplication (on the left) by a Dirac delta.
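Something like this minimal sketch, reusing the types and imports from the ContinuumArrays mockup above (the `DiracDelta` type and the evaluation rule are assumptions, not existing code):

```julia
# Assumed sketch: a point-evaluation functional as the adjoint of a quasi-vector.
struct DiracDelta{T} <: ContinuumVector{T}
    x::T
    domain::ClosedInterval{T}
end
axes(δ::DiracDelta) = (δ.domain,)
# ⟨δ_x, b_j⟩ = b_j(x): multiplying a quasi-matrix B by δ_x' on the left collects the values of
# every basis function at x, and the dot product of that with coefficients c evaluates the
# expansion at x.
*(δt::CAdjoint{<:Any,<:DiracDelta}, B::ContinuumMatrix) = B[parent(δt).x, :]
```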
Conceptually, perhaps it is nicer to have the quasivector be a column vector of functionals, rather than a row vector of functions. There is, I think, also still a problem with adjoints in the case of a quasimatrix rather than a quasivector (you want the dot product with a coefficient matrix, not the matrix-matrix product).
I think option 2 is about functions and function spaces. Implicitly, there is a function space, its dual space, and a duality pairing. Otherwise, the framework makes much less sense.
Now, is this added complexity all worth it, over some simple object that acts like a list of basis functions? What do we gain in practice from having this interpretation?
I've put my code in pull request #2. Compared to option 3, this has a fancy `UnitVector` type to single out a basis function. Otherwise, apart from not inheriting from `AbstractArray` and not having `Mul` and `Adjoint`, it is quite comparable.
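Roughly, the idea is something like the following (a guess at the concept, not the actual code in #2):

```julia
# A guess at what such a UnitVector might look like: a lazy standard basis vector eₖ.
struct UnitVector{T} <: AbstractVector{T}
    n::Int   # length of the vector
    k::Int   # index of the single nonzero entry
end
Base.size(e::UnitVector) = (e.n,)
Base.getindex(e::UnitVector{T}, i::Integer) where T = (i == e.k ? one(T) : zero(T))
# Multiplying a basis (as a list of functions) by UnitVector{Float64}(n, k) then singles out
# the k-th basis function without materialising a dense coefficient vector.
```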
I take it this was posted before you read the ContinuumArrays.jl version?
Yes
Even if a continuous dimension can make sense, I still wonder what we gain from it. Is `getindex(a, k, x)` just an alias for `eval_basis_function(basis, index, x)` with shorter syntax? There would be a gain if we can reuse other (linear algebra) code, or if it results in a framework where you can create arbitrarily complicated scenarios based on simple building blocks with clear and obviously correct implementations. Where `broadcast` based on dispatch automatically chooses the FFT to evaluate a Chebyshev expansion in Chebyshev points, for example.
Referring to my objection above: in the example code the domain of a function is encoded in the `axes` routine. What is the `axes` for `LinearSpline{Float64}()[2.0..3.0,:]`? I want to consider splines on a restricted part of their original domain. Here `axes` could simply return `2.0..3.0` for the first index, but then it is not consistent with linear algebra conventions (where the indices are shifted relative to those of the parent array). It could also be shifted: say `axes` yields `0.0..(3.0-2.0)` and `getindex` remembers to add `2.0` to `x` when passing to the parent array. But then it is not clear to me how this would work for non-tensor-product domains. What would the shift be, and why is `0.0` a special point to shift to? (This shows my bias: I care about non-tensor-product domains and about selecting subdomains, for extension frame problems :smile:)
I'm slightly more optimistic about being able to reuse generic linear algebra code if we have an object that represents a list of functionals. From my current understanding, which could be wrong, the math seems right. Even then, you are implicitly choosing one inner product and that one is somehow more important than other ones. Perhaps it is better to just make it more explicit: write `gram_matrix(basis, ::ChebyshevWeight)` (or `mass_matrix`), rather than using fancy adjoints. Make an object that represents a list of functions. Write routines so you can subindex it and make tensor product combinations. You know, what `ApproxFun` and `BasisFunctions` already do :smile:
You are missing the point of lazy `Mul` and adjoint: high performance is a priority and so we CANNOT construct matrices: the matrices are going to be distributed in the applications that @jagot has in mind. We therefore need to lazily populate the matrices.
Also, `Mul` is a type from LazyArrays.jl that represents lazy multiplication between any objects, and is not `<: AbstractArray`. So there is really no difference in calling it `f = Fun(Basis, coefficients)` vs `f = Mul(Basis, coefficients)` (other than that you get to say `f.basis` instead of `f.A`).
This is important when you get to operators. Operators would be naturally encoded as `Mul(P, M, inv(Q))` where `P` and `Q` are bases and `M` is a lazy matrix. But in what sense is `Fun` special then? If they are both represented as `Mul`, we have a nice unified language:
```julia
f = Mul(Chebyshev(), cfs)
M = _BandedMatrix((1:∞)', ∞, -1, 1) # alternatively Hcat(Zeros(∞), Diagonal(1:∞))
D = Mul(Ultraspherical(1), M, inv(Chebyshev()))
D * f === Mul(Ultraspherical(1), M*cfs)
```
where the last line is determined via something like
```julia
simplify(Mul(Ultraspherical(1), M, inv(Chebyshev()), Chebyshev(), cfs))
```
where `simplify` can tell that the adjacent pair `inv(Chebyshev()), Chebyshev()` is removable.
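A toy version of that greedy cancellation rule might look like this (`LazyMul` and `PInvOf` stand in for the real lazy types; this is not LazyArrays code):

```julia
# Toy sketch of the greedy simplification: scan the factors and cancel adjacent inv(P), P pairs.
struct PInvOf            # stands in for inv(P) of a basis quasi-matrix P
    parent
end
struct LazyMul           # stands in for a lazy product of factors
    factors::Vector{Any}
end
function simplify(M::LazyMul)
    f = copy(M.factors)
    k = 1
    while k < length(f)
        if f[k] isa PInvOf && f[k].parent == f[k+1]   # inv(P) immediately followed by P cancels
            deleteat!(f, k:k+1)
            k = max(k - 1, 1)                         # recheck: cancellation may expose a new pair
        else
            k += 1
        end
    end
    LazyMul(f)
end
```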
Here's a prototype of combining B-splines with spherical harmonics for function approximation in the ball. It constructs the Laplacian and creates MPI arrays.
This is of course a prototype, but essentially this should be working code with the final version.
I don't see how anything close to this clean is possible with functions like `mass_matrix`, etc.
```julia
SH = SphericalHarmonics()
Spl = BSplines(0.0:0.01:1)
S = SH ⊗ Spl
λ = @. -(1:∞)*((1:∞) +1)
L = BlockDiagonal(Diagonal.(Fill.(λ,1:2:∞))) # create a Block matrix with growing block sizes
Δ_s = Mul(SH, L, inv(SH)) # the Laplace–Beltrami operator
D_r = Mul(Spl, SymTridiagonal(...), inv(Spl))
R = Mul(Spl, Diagonal(Spl.points), inv(Spl))
Δ = I ⊗ (D_r*R^2*D_r) + Δ_s ⊗ I # creates Laplacian as a Mul(S, ..., inv(S)) operator
# finite_dimensional case
N = 100
S_N = Spl * SH[:, Block.(1:N)] # take only first N degree spherical harmonics
Δ_N = Δ*S_N # Knows that inv(S)*S_N === inv(S_N)
# R²Δ_N is a lazy BandedBlockBandedMatrix, with diagonal blockbandwidths
backend = BandedBlockBandedMatrix{Float64,MPIMatrix{Float64}}(undef, blocksizes(R²Δ_N), (0,0), bandwidths(D_r);
workers = ...) # distribute different blocks based on workers
MPI_Δ_N = Mul(S_N, backend, inv(S_N))
MPI_Δ_N .= Δ_N # populates MPI array, generating data on workers remotely
f = LazyFun((x,y,z) -> cos(x*y)+z, S_N) # not sure how constructors fit in yet...
MPI_f = Mul(S_N, BlockVector{Float64,MPIVector{Float64}}(undef, blocksizes(backend, 2); workers = ...))
MPI_f .= f # automatically generates data remotely, via MPI version of FastTransforms
MPI_v = similar(MPI_f)
MPI_v .= Mul(MPI_Δ_N, MPI_f) # applies the Laplacian and fills in missing information.
```
Having things like `inv(Chebyshev())` assumes that there is only one way to construct a vector of coefficients from a function. This is certainly not true for frames. A continuum row vector representing a Fourier extension frame would send multiple vectors to the same function, so it is not invertible. A certain inverse must be selected, based on the algorithm by which coefficients are computed for a given function.
Discussed with @jagot, and ContinuumArrays is looking like a winner: the axes can then incorporate the inner products.
For frames, use `pinv`.
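For example, in a plain dense toy version of an oversampled Fourier-extension-style system (the truncation tolerance plays the role of a regularisation ε; nothing here is frame-specific API):

```julia
using LinearAlgebra
# Toy dense illustration: half-range cosines restricted to [0,1] form a redundant, ill-conditioned
# system, so there is no unique inverse; pinv with a truncation tolerance picks one regularised
# choice of coefficients.
x = range(0, 1; length = 60)
A = [cos(k * π * t / 2) for t in x, k in 0:29]   # oversampled collocation matrix
f = exp.(x)
c = pinv(A, rtol = 1e-8) * f                     # ε-regularised least-squares coefficients
norm(A * c - f)                                  # small residual even though c is not unique
```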
@marcusdavidwebb for spectral problems the `axes` give the location of the spectrum while the operator is diagonal, with `D[x,x] == x` for `x in axes(D,1)` (off-diagonal entries return zero, unless you are outside the spectrum, in which case it throws a `BoundsError`).
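Following the types from the mockup above, such an operator could be sketched as follows (`SpectrumDiagonal` is an assumed name, not existing code):

```julia
# Assumed sketch: a diagonal quasi-matrix whose axes are (an interval of) the spectrum.
struct SpectrumDiagonal{T} <: ContinuumMatrix{T}
    spectrum::ClosedInterval{T}
end
axes(D::SpectrumDiagonal) = (D.spectrum, D.spectrum)
function getindex(D::SpectrumDiagonal{T}, x::Real, y::Real) where T
    (x ∈ D.spectrum && y ∈ D.spectrum) || throw(BoundsError(D, (x, y)))
    x == y ? T(x) : zero(T)   # D[x,x] == x on the spectrum; zero off the diagonal
end
```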
Sounds good, Sheehan. For FrameFun, which constructs functions by oversampled collocation, the axes could be defined to have an oversampled collocation inner product (with appropriate weights). However, for inverses of frames it gets complicated. How the coefficients of a function in a given frame are computed in FrameFun depends on the following factors: a dual frame (which is not unique), an epsilon for regularisation of the ill-conditioned system, and some oversampled collocation points. So the idea of an inverse of a continuum row vector which represents a frame appears to get difficult. Any thoughts on this from @daanhb ?
FYI, collocation should be super easy; this would generate rectangular collocation:
```julia
x = ChebyshevGrid(100; kind=2)
y = ChebyshevGrid(98; kind=1)
M = BandedMatrix(2 => 2(2:∞)) # creates superdiagonal second derivative ∞-matrix
D2 = Ultraspherical(2)[y,:] * M * pinv(Chebyshev()[x,:])
```
> You are missing the point of lazy `Mul` and adjoint: high performance is a priority and so we CANNOT construct matrices: the matrices are going to be distributed in the applications that @jagot has in mind. We therefore need to lazily populate the matrices.
Sure, but `mass_matrix(b)` could also return something lazy. It is semantically equivalent to writing `adjoint(b)*b`, except that it is arguably more explicit and more flexible, since you can specify an alternative inner product. To be sure, I'm still in favour of the point of view closer to functional analysis, simply because that allows for established and well-defined general descriptions, and that should help generic code. It should be clear how to implement a routine like `mass_matrix` in terms of underlying functionality, one example being `mass_matrix(b) = adjoint(b)*b`.
Conceptually, the game to play is to look for the simplest and most general building blocks, and then try to leverage `broadcast` or vector-like objects to arrive at efficient operators. Digging a little deeper, I would suggest exploring an implementation of bilinear and sesquilinear forms to start with, as in the sketch below. With those available, you can go on to express all the variational formulations, inner products and even function evaluations that you like, based on dispatch: the bilinear form routine would have access to the types of the functions, the domain and any measure or weight function, and so lots of special cases can be implemented before resorting to quadrature.
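A minimal sketch of that building block, with all the names (`Measure`, `weight`, `domain`, `bilinearform`) being assumptions for illustration:

```julia
using QuadGK
# Assumed sketch of a bilinear-form building block: a generic quadrature fallback, with special
# basis/measure combinations free to add faster methods by dispatch.
abstract type Measure end
struct Lebesgue <: Measure
    a::Float64
    b::Float64
end
weight(::Lebesgue, x) = 1.0
domain(μ::Lebesgue) = (μ.a, μ.b)
bilinearform(f, g, μ::Measure) = quadgk(x -> f(x) * g(x) * weight(μ, x), domain(μ)...)[1]

# e.g. one entry of a mass matrix on [0,1]:
bilinearform(x -> x, x -> x^2, Lebesgue(0.0, 1.0))   # ∫₀¹ x·x² dx = 1/4
```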
This is not sufficient; there has to be some "vectorization" to make efficient matrix-free operators (as in the `mass_matrix` example). I think the point of view of a basis as vector-like is a clear winner here because it is so practical.
I still fail to see the gain of having continuous dimensions, though. To me, when you write `Legendre[i,x]`, what you really mean is `Legendre[i](x)`. It is what it is: a vector of functions. If you don't want to construct an intermediate object, it is still equivalent to any other function like `eval_basis_function(::Legendre, i, x)`. What is the benefit of using the `getindex` syntax for this purpose, especially if doing so precludes reusing existing linear algebra methods?
> Sounds good, Sheehan. For FrameFun, which constructs functions by oversampled collocation, the axes could be defined to have an oversampled collocation inner product (with appropriate weights). However, for inverses of frames it gets complicated. How the coefficients of a function in a given frame are computed in FrameFun depends on the following factors: a dual frame (which is not unique), an epsilon for regularisation of the ill-conditioned system, and some oversampled collocation points. So the idea of an inverse of a continuum row vector which represents a frame appears to get difficult. Any thoughts on this from @daanhb ?
I think the examples here are amazing and very powerful. It is probably true for most methods out there that a lot of complexity is hidden in the `inv` and `pinv` methods in these examples, but that is not a genuine problem. It is the way ApproxFun is designed that makes it so that writing `inv` just works. I'd be very happy if there were data structures and operators available so that we can specify what our various versions of `pinv` look like, i.e., if the package provides a way for people to express and implement their own solvers.
What we could use, I think, is not just fast operators but, since we have to apply them so many times, operators that can pre-allocate all they need. I.e., operators with a plan :smile:
> I still fail to see the gain of having continuous dimensions
Because that's what the linear algebra operations need to make sense. `A * B` makes sense provided `axes(A,2) == axes(B,1)`. What's your proposal for `axes(Legendre())`? What's your proposed syntax for rectangular collocation? These are nice, clean, and consistent with the ContinuumArrays approach.
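In code, the rule is just the usual compatibility check, lifted to continuous axes (a sketch on top of the mockup's types; `checkaxes` is an assumed name):

```julia
# Assumed sketch: the same dimension check as for ordinary matrices, but an axis may be an interval.
function checkaxes(A::ContinuumArray, B::ContinuumArray)
    axes(A, 2) == axes(B, 1) || throw(DimensionMismatch("axes(A, 2) ≠ axes(B, 1)"))
    nothing
end
# e.g. a basis quasi-matrix with axes (ChebyshevInterval(), Base.OneTo(∞)) can multiply an
# ∞-vector of coefficients, and can be multiplied on the left by a functional whose second
# axis is ChebyshevInterval().
```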
Thanks for the link - I'll have a look and perhaps I better read the paper referenced above again before commenting more. I guess I'd like to be convinced that it would work for tensors, and for domains of general shape (again, my bias).
I've renamed the package ContinuumArrays.jl. (This way we don't need to argue more about whether this is the way to do function expansions, but rather this is a way.)
Here is working BSplines code. Since I ported over Julia's code, the indexing is pretty robust.
In terms of maintainability: yes Julia's Base code may change. But this took me literally less than an hour to copy-and-paste-and-find-and-replace-and-delete. So the code is not meant to be maintainable but rather replaceable.
Fair enough, I'm all in favour of exploring options - there is no best way anyway. I'll be following this package.
This now works for solving Helmholtz using linear splines and FEM:
```julia
L = LinearSpline(range(0,stop=1,length=20_000_000))
B = L[:,2:end-1] # Zero dirichlet by dropping first and last spline
D = Derivative(axes(L,1))
k = 10_000
@time A = -(B'D'D*B) + k^2*B'B # Weak Helmholtz, 9s
@time f = L*exp.(L.points) # project exp(x), 0.3s
@time u = B * (A \ (B'f)) # solution, 4s
# Compare with "exact" solution
using Plots, ApproxFun, Test
x = Fun(axes(L,1))
u_ex = [Dirichlet(Chebyshev(0..1)); ApproxFun.Derivative()^2 + k^2*I] \ [[0,0], exp(x)]
@test u[0.1] ≈ u_ex(0.1) rtol = 1E-3 # true
```
The key idea that makes this work in an automated fashion is that operations have commuting relations: e.g. `Derivative() * LinearSpline(pts)` is equivalent to `LinearSpline(pts) * banded` for some simple banded matrix (see the sketch below). It uses a simple "greedy" simplification so that `B'D'D*B` reduces to `banded' * B'B * banded`, and then `B'B` is recognised as a mass matrix which reduces to a `SymTridiagonal`, leaving `banded' * SymTridiagonal * banded`, which then reduces to a `BandedMatrix`. (Ideally it would also recognise that this is symmetric, i.e., a `Symmetric{T,<:BandedMatrix}`.)
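A concrete toy version of such a commuting relation, for linear splines on a uniform grid (`HeavisideSpline` as the piecewise-constant basis is an assumption for this sketch, not part of the code above):

```julia
# Toy sketch: the derivative of the k-th hat function on a uniform grid with spacing h is
# +1/h on the cell to its left and -1/h on the cell to its right, so
#   Derivative() * LinearSpline(p)  ==  HeavisideSpline(p) * A
# for the rectangular "difference" matrix A below (size: number of cells × number of hats).
p = range(0, 1; length = 5)
h = step(p)
n = length(p)
A = zeros(n - 1, n)
for k in 1:n
    k > 1 && (A[k-1, k] =  1/h)   # contribution on the cell [p[k-1], p[k]]
    k < n && (A[k, k]   = -1/h)   # contribution on the cell [p[k], p[k+1]]
end
```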
I'm pretty convinced that this is the cleanest syntax for something like this: the alternative of being explicit with functions like `mass_matrix` runs into the problem of an explosion of named operators, each with its own semantics. I'm not even sure what one would call `B'D'D*B` (other than `weak_laplacian`, but that kind of proves my point of requiring too many names). I faced this precise problem when wondering how to incorporate weak formulations in ApproxFun: things like adjoints are not intuitive with ApproxFun's notion of an operator, but are obvious with a quasi-matrix, where they are dictated by the inner product as specified by the `axes`.
Isn’t a FunctionSpace (almost by definition) just a Space with evaluation defined (that respects the linearity of the space)?
It almost feels like `Fun` is a misnomer, as the space could be something other than a FunctionSpace. Perhaps it's more general to do the following: