Strange interpolation at boundary

vladdez commented 2 years ago

Hi, I am plotting some EEG topographies together with @behinger. We observed strange interpolation at the top boundary. This is with the default interpolator ClaughTochter

DelaunyMesh has same issue

Any ideas? Has it to do with the fill-value? Cheers, Vladimir & Bene

palday commented 2 years ago

I suspect the problem would also occur at all the other boundaries if there were electrodes close to them. I'll investigate a bit.

yakir12 commented 2 years ago

I think that the main problem here is that of extrapolation: How does one correctly extrapolate from data in two or three dimensional irregular coordinates?

The approach taken here is to add an outer perimeter that encloses the known data. This perimeter can have a certain geometry (shape), padding (how far out does it stretch outside the known coordinates), pad_value (defaults to zero), and number of discrete data points it has. All of these parameters influence the "quality" of the extrapolation. A perimeter with low padding (i.e. close to the data), low pad value (relative to the data), and densely populated will "push away" hot spots from their original coordinates, which is what we're seeing here.

I therefore suspect that datasets with a relatively low number of points will suffer more than denser datasets.

I managed to improve on some examples by choosing values that are different to the defaults. However, no set of values worked for all of my examples (i.e. I did not find globally better defaults).

One expensive idea I imagine could work is if we used some diffusion model to allow the data to diffuse outwards freely and then crop the result with some shape (i.e. geometry). This seems to me like the most natural type of extrapolation.

For context, neither Interpolations.jl ~~nor Dierckx.jl tackle 2D extrapolation of irregularly spaced coordinates,~~ so this is not a simple problem.

behinger commented 2 years ago

that makes a lot of sense - iirc MNE sets the padding values to the closest electrode, instead of having the same value for all padded locations (not sure exactly how padding works here).

I could not set padding=0. or padding=0.2, I could only change the pad_value. Do you know why?

yakir12 commented 2 years ago

sets the padding values to the closest electrode

This would be an improvement, but I suspect it won't always work. I wonder how do they manage to end up with a circle/ellipse using that rule..?

I could not set padding=0. or padding=0.2, I could only change the pad_value. Do you know why?

I suspect it's because there is a depreciation check to warn about that specific keyword that has been used before for padding figures (actually Scenes) from within the plotting function. I see the same issue (unrelated to this specific issue though). It's worth filing an issue separately and I'm sure it would be appreciated if you did so 😄

behinger commented 2 years ago

Actually, I missremembered the exact comment. They were planning to do that, but now just put it to the mean value as far as I can see: https://github.com/mne-tools/mne-python/blob/9e4a0b492299d3638203e2e6d2264ea445b13ac0/mne/viz/topomap.py#L642

They either put the extra points on the convex hull, or on the circle of the plot.

EEGlab uses griddata internally which simply returns NaNs for extrapolated values.

Proposed solution: Put pad_value = mean(data) by default. (havent tried it)

Regarding the issue, I wrote it on my list, but will soon be on parental leave, so not sure it will happen..

behinger commented 2 years ago

I fixed the padding bug, by renaming it to enlarge, padding seems to be somewhat protected, thus als no issue on makie.jl

Just in case here was my finished MWE, before I thought that padding might be protected somehow @SimonDanisch might immediately know whether it is worth to post to Makie.jl or not.

    @recipe(TestPlot,data) do scene
    return Attributes(
        padding = 0.1
    )
end
function Makie.plot!(p::TestPlot)
    plot(p.data)
end
    testplot([1,2,3],padding=5)

SimonDanisch commented 2 years ago

Ah yeah, I guess that was an old, deprecated scene attribute ...

yakir12 commented 2 years ago

A really easy cheat we could use to avoid the original issue here is to treat the raw data as a zero-image where the pixels closest to the coordinates are equal to the coordinates' corresponding values. Then convolve the image with say a gaussian with large enough variance. The result can then either be the end result of the topology, or can be used as the input data to the topology. That's actually really simple.

behinger commented 2 years ago

@ykair12 (I hope I got the idea) while possible, I don't think that would be a great interpolator, if variance is too small, every electrode is simply a "bump", if variance is too large, we get mean value everywhere. Thus each electrode distance would require their specific variance specification - if the distance is too large, we would automatcally interpolate with 0 . IIRC, something very similar is implemented via #5 ScatteredInterpolation.jl directly as an interpolator via Gaussian

Maybe I understood it wrongly, how would that solve the boundary issue? In the boundary we would interpolate towards 0 still - correct?

yakir12 commented 2 years ago

I agree that we would need to choose a good-enough variance, just like a smoothing factor. But unlike introducing novel zero points at the perimeter that "push away" peaks from their original coordinates, what I suggest will keep peaks at their original locations.

yakir12 commented 2 years ago

Here is an implementation of my suggestion:

using SparseArrays, Statistics, LinearAlgebra
using GLMakie, ImageFiltering, Delaunay, Interpolations
function toindex(x, r, n)
    x += r
    x /= 2r/(n - 1)
    return round(Int, x + 1)
end
function sparseimg(v, cxy, r, n)
    i = toindex.(first.(cxy), r, n)
    j = toindex.(last.(cxy), r, n)
    return collect(sparse(i, j, v, n, n))
end
function plottopo(v, xy; buff = 0.05, l = 13, σ_factor = 8)
    n = 2l + 1
    c = mean(xy)
    xy .-= c
    r = maximum(norm, xy)
    r *= 1 + buff
    img0 = sparseimg(v, xy, r, n)
    imgw = imfilter(img0, Kernel.gaussian(n/σ_factor))
    xs = range(-r, r, n) .+ c[1]
    ys = range(-r, r, n) .+ c[2]
    #
    keep = [ij for ij in CartesianIndices((n, n)) if norm(Tuple(ij) .- l .- 1) ≤ l]
    #
    data = imgw[keep]
    positions = Point2.(xs, ys')[keep]
    itp = LinearInterpolation((xs, ys), imgw)
    for θ in range(0, 2π, 51)[1:end-1]
        p = c + r * Point2(reverse(sincos(θ)))
        push!(positions, p)
        push!(data, itp[p...])
    end
    #
    m = delaunay(convert(Matrix{Float64}, hcat(first.(positions), last.(positions))));
    #
    fig = Figure()
    ax = Axis(fig[1,1], aspect = DataAspect())
    mesh!(ax, m.points, m.simplices, color = data)
    scatter!(ax, xy .+ c, color = v, colormap = :reds)
    fig
end
using TopoPlots
data, positions = TopoPlots.example_data()
plottopo(vec(data[:,6,3]), positions)

Which looks like this: tmp

Now it's just the matter of integrating this into the existing code here. I'll start a draft PR.

yakir12 commented 2 years ago

I found an even better shortcut: instead of convolving gaussian filters with a sparse image, I can just compose a mixture model from Distributions.jl! Much much cheaper:

using Statistics, LinearAlgebra
using GLMakie, Delaunay, Distributions
function random_disk_points(c, n, r)
    positions = fill(c, n)
    for i in 1:n
        R = r * sqrt(rand())
        θ = 2π * rand()
        positions[i] += Point2(R .* sincos(θ))
    end
    return positions
end
function get_perimeter(c, n, r)
    positions = fill(c, n)
    for (i, θ) in pairs(range(0, 2π, n + 1)[1:end-1])
        positions[i] += Point2(r .* sincos(θ))
    end
    return positions
end
function plottopo(v, xy; buff = 0.05, σ_factor = 5, n = 200)
    c = mean(xy)
    r = maximum(norm, xy .- c)
    r *= 1 + buff
    positions = copy(xy)
    random = random_disk_points(c, n, r)
    perimeter = get_perimeter(c, 50, r)
    append!(positions, random, perimeter)
    σ = r/σ_factor
    p = v .- minimum(v)
    p /= sum(p)
    a = MixtureModel(MvNormal, tuple.(xy, σ), p)
    data = pdf.(Ref(a), positions)
    m = delaunay(convert(Matrix{Float64}, hcat(first.(positions), last.(positions))));
    fig = Figure()
    ax = Axis(fig[1,1], aspect = DataAspect())
    mesh!(ax, m.points, m.simplices, color = data)
    scatter!(ax, xy, color = v, colormap = :reds)
    fig
end
using TopoPlots
data, xy = TopoPlots.example_data()
v = vec(data[:,6,3])
fig = plottopo(v, xy)

tmp

behinger commented 2 years ago

cool! could you plot data[:,340,1] instead?

One Q: Maybe I missunderstand something, but isnt there the same artefact as in the previous interpolators?

yakir12 commented 2 years ago

cool! could you plot data[:,340,1] instead?

Sure: tmp

One Q: Maybe I missunderstand something, but isnt there the same artefact as in the previous interpolators?

No. The points along the perimeter in TopoPlots.jl are all zero-valued, and therefore affect the resulting gradient. The points along the perimeter here are calculated from the mixture model and are therefore closer to the data we have (without the need to assume that they fall to absolute zero exactly at the perimeter).

Here is an example where the standard deviation is much smaller:

fig = plottopo(v, xy; σ_factor=20, n=10000)

You can see that the peaks match the data exactly: tmp

behinger commented 2 years ago

Ah. I was confused about the scatter colorscale.

I tried playing around with your solution, not sure whats going on, this is how it should look like (data340) grafik

could it be that negative values are "removed" due to the v .- minimum(v)?

yakir12 commented 2 years ago

Ah. I was confused about the scatter colorscale.

Yeah, I just set the color to the value of the specific point so that I'll better understand the contribution of each point to the resulting gradient.

Late last night, I realised why you asked me to try data[:,340,1] and how my implementation (with the mixture model) was wrong! Instead of using the Probability Density Function of a mixture model, I should use the sum of the Gaussian functions directly.

Here I define a gaussian function with a set standard deviation, a mean equal to the coordinate of the data, and a maximum equal to the data's "height" or value. The sum of all the points' gaussians describes the topology of the data.

Here is the implementation and result.

using Statistics, LinearAlgebra
using GLMakie, Delaunay, Distributions
iso_2d_gaussian(μx, μy, σ, v, x, y) = v*exp(-((x - μx)^2 + (y - μy)^2)/2σ^2) 
iso_2d_gaussian(μ, σ, v, xy) = iso_2d_gaussian(μ..., σ, v, xy...)
function random_disk_points(c, n, r)
    positions = fill(c, n)
    for i in 1:n
        R = r * sqrt(rand())
        θ = 2π * rand()
        positions[i] += Point2(R .* sincos(θ))
    end
    return positions
end
function get_perimeter(c, n, r)
    positions = fill(c, n)
    for (i, θ) in pairs(range(0, 2π, n + 1)[1:end-1])
        positions[i] += Point2(r .* sincos(θ))
    end
    return positions
end
function plottopo(v, xy; buff = 0.05, σ_factor = 5, n = 200)
    c = mean(xy)
    r = maximum(norm, xy .- c)
    r *= 1 + buff
    positions = copy(xy)
    random = random_disk_points(c, n, r)
    perimeter = get_perimeter(c, 50, r)
    append!(positions, random, perimeter)
    σ = r/σ_factor
    fun = x -> sum(zip(xy, v)) do (μ, h)
        iso_2d_gaussian(μ, σ, h, x)
    end
    data = fun.(positions)
    m = delaunay(convert(Matrix{Float64}, hcat(first.(positions), last.(positions))));
    fig = Figure()
    ax = Axis(fig[1,1])#, aspect = DataAspect())
    mesh!(ax, m.points, m.simplices, color = data)
    scatter!(ax, xy, color = :transparent, strokecolor = :black, strokewidth = 1, markersize = 5)#v, colormap = :reds)
    fig
end
using TopoPlots
data, xy = TopoPlots.example_data()
v = vec(data[:,340,1])
fig = plottopo(v, xy)

tmp

I think that while this is better than introducing tons of zeros (or pad_values) at the perimeter, it still suffers from the heavy-handed smoothing...

behinger commented 2 years ago

I guess it works well for high density caps, and maybe less so for 10/20 caps? I gess that is true for all interpolation schemes.

I think it is worth to look into the MNE scheme how they do it (linked above), and what griddata in Matlab does

yakir12 commented 2 years ago

OK, what a loop. I was wrong about a few things, most importantly, Dierckx.jl does indeed tackle 2D extrapolation of irregularly spaced coordinates! Following is a really simple implementation that seems to work well (although there are some linear artefacts in the extrapolation).

There are two instances where new points need to be added to the data:

along the perimeter
gridded data for plotting the contour lines

While the interpolator (i.e. Dierckx's spline in the following case) is used to generate points for the second instance, the first instance just uses zeros (i.e. pad_value) in the current implementation of TopoPlots.jl. Here, I use the interpolator to generate points for the perimeter as well.

My next step is to try and use the (superior?) ClaughTochter interpolator.

using GLMakie
using TopoPlots

using Statistics, LinearAlgebra
using Dierckx, Delaunay

# get the data 
data, xy = TopoPlots.example_data()
v = vec(data[:,340,1])

# some parameters
buff = 0.05
kx = ky = 2
s = 0.5
resolution = 512

# create the interpolating spline
itp = Spline2D(first.(xy), last.(xy), v; kx, ky, s)

# create perimeter
c = mean(xy)
r = maximum(norm, xy .- c)
r *= 1 + buff
perimeter_xy = decompose(Point2f, Circle(c, r))
perimeter_v = evaluate(itp, first.(perimeter_xy), last.(perimeter_xy))

# add the perimeter to the data
positions = vcat(xy, perimeter_xy)
data = vcat(v, perimeter_v)

# delaunay triangulation for the mesh
m = delaunay(convert(Matrix{Float64}, hcat(first.(positions), last.(positions))));

# gridded data for the contour
xl = range(c[1] - r, c[1] + r, resolution)
yl = range(c[2] - r, c[2] + r, resolution)
z = evalgrid(itp, xl, yl)
for (i, x) in pairs(xl), (j, y) in pairs(yl)
    if norm(Point2f(x, y) - c) > r
        z[i, j] = 0 # could be set to NaN too...
    end
end

# plot it all
fig = Figure()
ax = Axis(fig[1,1], aspect = DataAspect())
mesh!(ax, m.points, m.simplices, color = data)
contour!(ax, xl, yl, z, levels=6)
scatter!(ax, xy, color = :transparent, strokecolor = :black, strokewidth = 1, markersize = 5)

tmp

yakir12 commented 2 years ago

The saga continues...

I found that given some data "within a circle", such as the case with EEG, the extrapolation problem is easiest to solve if we just set the 4 corners of the enclosing square to zero. So by adding 4 new coordinates to the original data, one coordinate at each corner of the square that encloses the (enlarged) circle that itself encloses all the original coordinates, and by setting those 4 points to zero, we solve the whole thing. We now have our original data plus 4 zero corners, and now we can interpolate within this square all the data we need (i.e. the perimeter of the circle for the Delaunay mesh plot, and the gridded data within that circle for the contour plot). The only assumption is that data at the corners of the enclosing square is equal to zero (or some user defined value).

In the case of rectangular data, enlarging/extrapolating the data is dubious to begin with, but the same principals apply here as well.

using Statistics, LinearAlgebra
using SciPy, Delaunay

using GLMakie
using TopoPlots

function get_circle(xy; buff = 0.1)
    c = mean(xy)
    r = (1 + buff)*maximum(x -> norm(x - c), xy)
    return c, r
end

function get_boundingbox(c, r)
    bb_xy = Point2f[c + r*Point2f(sx, sy) for sx in (-1, 1) for sy in (-1, 1)]
    bb_v = zeros(4)
    return bb_xy, bb_v
end

function get_interpolator(xy, v, c, r; tol = 1e-6, maxiter = 400, rescale = false)
    # get the bounding box
    bb_xy, bb_v = get_boundingbox(c, r)
    append!(bb_xy, xy)
    append!(bb_v, v)

    # interpolate within the bounding box
    itp = SciPy.interpolate.CloughTocher2DInterpolator(Tuple.(bb_xy), bb_v; tol, maxiter, rescale)
    return itp
end

function plottopo(v, xy)
    # find enclosing circle
    c, r = get_circle(xy)
    itp = get_interpolator(xy, v, c, r)

    # gridded data
    resolution = 100
    xl = range(c[1] - r, c[1] + r, length = resolution)
    yl = range(c[2] - r, c[2] + r, length = resolution)
    z = collect(itp(xl' .* ones(length(yl)), ones(length(xl))' .* yl)')

    # set interpolated data outside enclosing circle to NaN
    incircle = [norm(Point2f(x, y) - c) ≤ r for (i, x) in pairs(xl), (j, y) in pairs(yl)]
    z[(!).(incircle)] .= NaN

    # add bounding geometry to the data
    cxy = decompose(Point2f, Circle(c, r))
    cv = only.(itp.(cxy))
    xyl = Point2f.(xl, yl')
    append!(cxy, xyl[incircle])
    append!(cv, z[incircle])

    # delaunay triangulation for the mesh
    m = delaunay(convert(Matrix{Float64}, hcat(first.(cxy), last.(cxy))));

    # plot it all
    fig = Figure()
    ax = Axis(fig[1,1])
    mesh!(ax, m.points, m.simplices, color = cv, colormap = Reverse(:RdBu), shading=false)
    contour!(ax, xl, yl, z, levels=6, color=(:black, 0.5), linestyle=:dot)
    scatter!(ax, xy, color = v, colormap = Reverse(:RdBu), strokecolor = :black, strokewidth = 1, markersize = 5)
    return fig
end

and

data, xy = TopoPlots.example_data()
v = vec(data[:,340,1])
plottopo(v, xy)

results in tmp

MakieOrg / TopoPlots.jl

Strange interpolation at boundary #9