JuliaEarth / GeoStats.jl

An extensible framework for geospatial data science and geostatistical modeling fully written in Julia
https://juliaearth.github.io/GeoStatsDocs/stable/
MIT License
506 stars 60 forks source link

[Help wanted] Kriging on the globe #62

Closed briochemc closed 4 years ago

briochemc commented 4 years ago

I have global (as in, spanning the globe) rock data that is sparse, given as 3 vecrtors (longitudes x, latitudes y, and values z), and I want to try to use some Kriging to interpolate/extrapolate these onto a global grid. Here is my current code:

using GeoStats, Plots, Distances
x = [81.45,81.42,83.35,85.24,89.51,91.01,93.05,93.07,96.04,94.09,95.03,310.0,31.0,324.0,301.0,285.0,12.0,359.0,90.0,67.0,236.0,284.0,271.0,292.0,123.0,225.0,107.0,237.0,126.0,126.0,127.5,129.0,129.0,12.56,13.23,10.23,303.0,307.0,314.0,301.0,305.0,312.0,322.0,194.0,192.0,191.0,177.0,352.99,349.75,343.61,342.84,341.85,342.83,341.43,338.93,346.0,341.7,335.5,342.5,9.0,358.55,355.75,351.45,347.79,346.92,344.42,342.73,343.7,342.0,347.65,155.39,153.29,154.08,153.45,152.0,152.2,139.17,310.44,310.52,312.0,309.93,310.47,352.41,314.85,299.72,12.36,255.64,214.41,121.32,130.58,254.7,278.76,103.07,269.0,92.0,110.0,119.0,354.0,342.0,350.0,110.0,228.3,276.1,264.4,161.4,240.82,302.0,130.13,107.11,106.43,88.57,353.0,341.5,337.92,344.94,246.4,286.27,300.36,299.65,300.63,300.42,311.6,332.0,338.0,353.0,332.0,5.25,352.0,6.8,346.01,326.0,333.0,284.0,288.0,106.4,49.2,182.13,182.7,183.18,277.0,73.3,60.0,28.0,23.0,136.0,43.0,124.2,309.5,286.45,286.6,289.76,358.0,132.65,115.63,115.4,119.0,145.05,275.0,265.0,262.66,149.3,44.0,49.0,150.0,290.0,288.0,168.0,241.0,226.0,34.5,301.0,289.7,121.9,288.0,326.0,332.0,333.0,160.0,280.0,85.0,33.0,127.0,140.0,170.0,150.0,106.0,122.0,298.0,69.0,180.0,147.0,43.0,40.0,140.0,145.0,83.0,171.0,115.0,25.0,323.0,55.5,356.8,352.2,351.3,349.5,346.25,351.5,35.0,5.0,12.0,36.0,36.2,41.0,42.5,48.0,43.15,314.58,304.68,324.95,319.5,287.88,300.0,8.0,132.0,132.0,128.0,146.15,172.4,289.1,38.85,32.6,32.87,18.45,13.28,11.3,2.9,280.0,240.0,170.0,120.0,60.0,20.0,320.0,37.0]
y = [9.39,11.32,14.42,16.38,18.28,18.21,17.11,14.3,12.46,11.45,10.12,0.0,32.0,-11.0,-34.5,11.2,-6.0,53.0,21.0,24.0,46.3,40.3,30.0,49.0,31.0,70.0,9.0,37.5,37.0,35.0,34.5,35.0,37.0,-29.42,-25.36,-14.53,64.0,62.0,60.0,59.0,55.0,59.5,60.0,54.0,54.0,54.0,53.0,34.2,32.24,25.02,23.44,21.2,19.29,13.5,4.55,28.0,17.0,12.5,8.0,42.0,34.41,31.58,28.59,22.44,20.32,18.09,14.38,14.08,14.0,11.43,50.51,48.59,49.56,48.23,47.0,-31.5,-35.1,65.07,65.1,7.5,65.1,65.08,45.04,43.29,-52.55,-12.12,18.13,54.37,2.42,36.39,19.06,-7.35,-7.21,13.0,5.0,7.0,23.0,3.0,14.0,36.0,35.0,46.3,-3.6,4.1,46.3,47.09,-35.0,-9.06,-8.25,-8.7,-6.02,62.0,63.4,60.57,46.3,27.0,-16.92,14.72,13.03,12.55,11.03,3.4,69.0,65.0,62.0,78.0,60.5,71.0,65.0,72.9,67.0,72.3,-46.2,-38.0,-7.9,-67.37,-18.57,-18.5,-20.13,3.0,0.4,-10.0,-32.0,-33.0,-36.0,15.0,-8.3,13.25,-49.75,-52.33,-54.95,58.0,-1.2,-33.32,-34.42,-20.0,-4.08,10.0,13.0,19.83,-5.0,-12.0,-19.0,-5.0,-65.0,-66.0,-15.0,37.62,53.0,-20.0,-63.5,-23.85,20.25,-42.0,65.0,70.5,71.0,-10.0,13.0,-59.0,-27.0,-8.0,35.0,54.0,10.0,-6.0,-10.0,16.0,-49.0,-80.0,-43.0,12.0,20.0,27.0,15.0,-65.0,-45.0,25.0,67.0,66.0,-21.0,36.1,36.8,35.8,45.4,56.5,47.5,26.0,43.0,45.0,37.0,37.0,17.5,11.6,12.0,11.6,-60.95,-66.05,-74.4,-77.15,69.0,66.5,68.0,-4.0,-7.0,-9.0,-17.3,-41.0,-69.5,-23.37,-31.5,-29.63,-35.78,-30.58,-25.5,-50.58,-70.0,-70.0,-70.0,-65.0,-65.0,-70.0,-75.0,25.0]
z = [-16.2,-16.1,-13.9,-15.1,-12.6,-12.8,-8.6,-9.3,-11.3,-9.2,-10.0,-9.2,-3.3,-12.9,-10.3,-8.3,-16.1,-13.5,-15.7,-12.2,-6.6,-11.3,-10.9,-5.3,-10.7,-14.3,-9.5,-3.6,-16.9,-13.2,-18.4,-13.4,-23.9,-11.0,-12.2,-18.8,-27.5,-24.1,-16.5,-17.0,-27.7,-8.8,-7.9,7.4,7.6,7.2,8.5,-11.7,-13.0,-15.3,-14.4,-14.8,-14.3,-15.7,-18.6,-12.8,-14.6,-12.0,-14.4,-13.0,-11.8,-13.6,-17.1,-17.8,-17.9,-17.1,-16.2,-14.3,-14.5,-12.1,7.6,7.3,7.7,9.6,10.1,-1.3,-6.0,-33.0,-24.2,-10.2,-34.3,-26.1,-9.9,-24.5,-4.4,-24.9,4.2,1.6,4.2,-9.3,1.3,-5.0,-13.8,2.0,-13.8,-10.6,-9.6,-9.1,-13.6,-8.5,-9.7,-6.1,-4.3,-3.4,-0.5,-6.0,-9.0,-9.3,-3.1,-2.8,-7.7,-6.5,7.6,3.4,-11.6,7.6,-3.2,-9.1,-9.6,-11.6,-13.3,-10.7,4.0,8.0,7.0,-13.0,-14.0,5.0,-14.0,-13.7,-38.0,-29.8,5.6,2.5,-2.6,-38.2,9.2,8.8,7.5,9.8,6.4,7.3,-9.2,-9.5,-21.1,4.1,-1.9,-12.0,-1.4,1.1,9.8,-10.6,4.1,1.7,-4.6,-15.1,7.1,7.3,5.9,1.5,8.3,3.4,-22.1,8.3,1.1,-3.2,7.4,-0.5,9.6,-14.5,5.7,5.5,-2.8,2.1,-38.0,4.0,-29.8,1.8,4.1,-0.5,-30.0,-12.4,7.7,7.4,7.4,0.0,-3.3,5.5,0.0,-20.8,-5.2,5.0,5.0,2.3,6.7,-8.0,-5.3,-10.2,-19.4,-26.0,4.0,-10.1,-8.4,-11.8,-11.6,-11.1,-11.6,-9.1,-9.7,-10.8,-6.2,-6.3,-4.2,6.2,10.0,6.3,-6.2,-5.7,-7.7,-10.4,-23.1,-25.8,-15.1,-8.7,-8.1,-9.5,-6.3,1.2,5.5,-15.5,-13.4,-15.3,-11.2,-11.6,-11.6,-4.8,-4.1,-3.7,-6.8,-14.3,-18.8,-14.0,-8.0,-11.2]
# list of properties with coordinates
props = OrderedDict(:prop => z)
coord = [(lon,lat) for (lon,lat) in zip(x,y)]
# define spatial data
sdata = PointSetData(props, coord)
# the solution domain (basically a grid covering the entire globe)
sdomain = begin
    dims = (180, 91)
    start = (1.0, -89.01098901098901)
    finish = (359.0, 89.01098901098901)
    RegularGrid(start, finish, dims=dims)
end
# define estimation problem for any data column(s) (e.g. :precipitation)
problem = EstimationProblem(sdata, sdomain, :prop)
# choose a solver from the list of solvers
solver = Kriging(
    :prop => (variogram=GaussianVariogram(range=35.),
              distance=Haversine(1.0)) # use the Haversine distance because the earth is round
)
# solve the problem
solution = GeoStats.solve(problem, solver)
# plot the solution
plt2 = contourf(solution)
plt1 = scatter(sdata)
plot(plt1, plt2, layout=(2,1))
Screen Shot 2020-06-03 at 10 30 11 am

I was expecting the solution (prop mean) to look like the data (prop), but I'm probably making a mistake along the way. (First time trying your package.) I was put off by issue #61 as well (origin and spacing not used in plotting), but I don't think that's my only issue here. Could you help me figure this out?

juliohm commented 4 years ago

I think this is an issue with the radius you chose for the Haversine distance. The Earth radius is approximately 6371km and I think when I added the Haversine to Distances.jl they didn't accept hard-coding this constant in the code. So you should create a Haversine(6371) for more meaningful results:

solver = Kriging(
    :prop => (variogram=GaussianVariogram(range=35.),
              distance=Haversine(6371)) # use the Haversine distance because the earth is round
)

This is what I get wit this change:

image

The next step is to make sure that the range of the variogram makes sense. Did you have a chance to fit a variogram model to the data? Make sure that the range makes sense and that you are using Haversine there as well with the appropriate radius.

I've plotted the problem and the points seem to be well distributed in the grid:

plot(problem)

image

juliohm commented 4 years ago

In any case, I'd like to say that I don't have much experience with global scale interpolation. We may be hitting some numerical issue due to the Haversine or due to some other related reason with spherical coordinates. Let's first make sure that the parameters we are passing to the solver make sense. You could also try another interpolator just for debugging purposes like InverseDistanceWeighting.jl or LocallyWeightedRegression.jl

briochemc commented 4 years ago

I have no experience with geostats or Kriging 😅 , so I am at a loss here as to how to proceed... I'm just looking for something that makes sense and looks nice, not for a state-of-the-art best fit that would pass strict validation of experts.

For instance, even removing entirely the Haversine distance part, I get a similar pattern. So my guess is that this has to do more with the choice of range in the GaussianVariogram? What do you think? What if this data was just local data on a small region... Which parameters would you use to fit that data?

(Note that I do want to use the distance because it removes the distortion introduced by the lat/lon projection, and it also naturally works with cyclical longitudes if I understand correctly.)

juliohm commented 4 years ago

I can investigate the issue more closely tomorrow. I think we could try passing a neighborhood to the Kriging solver so that it doesn't try to estimate points very far away from observations. When you pass this option, the Kriging becomes local and the predictions are made inside a moving neighborhood.

One thing that may be happening is that when you reach a distance that is too large (global scale), the correlation is gone (you are at the tail of the variogram) so basically we get this very unstable, erratic interpolation. The neighborhood option may solve it, but if doesn't address the issue, we need to investigate more deeply what else may be happening.

briochemc commented 4 years ago

BTW I think I will go with the InverseDistanceWeighting approach with this PR for Haversine to work 😃

juliohm commented 4 years ago

In any case I think it would be a good idea to investigate what is happening with the Kriging solution in this case. I will leave it open for some time. Will try to review and merge the PRs by the end of the day. :)

juliohm commented 4 years ago

@briochemc do you mind sharing screenshots of the updated results with InvDistWeight? That can help debug this further as well. I will try to look into it when I find time. Also check the comments I left in your PR in InverseDistanceWeighting.jl Thanks for the help again.

briochemc commented 4 years ago

Here is the updated result

Screen Shot 2020-06-04 at 11 03 12 am

that came from this code:

using GeoStats, Plots, InverseDistanceWeighting, Distances
# data
x = [81.45,81.42,83.35,85.24,89.51,91.01,93.05,93.07,96.04,94.09,95.03,310.0,31.0,324.0,301.0,285.0,12.0,359.0,90.0,67.0,236.0,284.0,271.0,292.0,123.0,225.0,107.0,237.0,126.0,126.0,127.5,129.0,129.0,12.56,13.23,10.23,303.0,307.0,314.0,301.0,305.0,312.0,322.0,194.0,192.0,191.0,177.0,352.99,349.75,343.61,342.84,341.85,342.83,341.43,338.93,346.0,341.7,335.5,342.5,9.0,358.55,355.75,351.45,347.79,346.92,344.42,342.73,343.7,342.0,347.65,155.39,153.29,154.08,153.45,152.0,152.2,139.17,310.44,310.52,312.0,309.93,310.47,352.41,314.85,299.72,12.36,255.64,214.41,121.32,130.58,254.7,278.76,103.07,269.0,92.0,110.0,119.0,354.0,342.0,350.0,110.0,228.3,276.1,264.4,161.4,240.82,302.0,130.13,107.11,106.43,88.57,353.0,341.5,337.92,344.94,246.4,286.27,300.36,299.65,300.63,300.42,311.6,332.0,338.0,353.0,332.0,5.25,352.0,6.8,346.01,326.0,333.0,284.0,288.0,106.4,49.2,182.13,182.7,183.18,277.0,73.3,60.0,28.0,23.0,136.0,43.0,124.2,309.5,286.45,286.6,289.76,358.0,132.65,115.63,115.4,119.0,145.05,275.0,265.0,262.66,149.3,44.0,49.0,150.0,290.0,288.0,168.0,241.0,226.0,34.5,301.0,289.7,121.9,288.0,326.0,332.0,333.0,160.0,280.0,85.0,33.0,127.0,140.0,170.0,150.0,106.0,122.0,298.0,69.0,180.0,147.0,43.0,40.0,140.0,145.0,83.0,171.0,115.0,25.0,323.0,55.5,356.8,352.2,351.3,349.5,346.25,351.5,35.0,5.0,12.0,36.0,36.2,41.0,42.5,48.0,43.15,314.58,304.68,324.95,319.5,287.88,300.0,8.0,132.0,132.0,128.0,146.15,172.4,289.1,38.85,32.6,32.87,18.45,13.28,11.3,2.9,280.0,240.0,170.0,120.0,60.0,20.0,320.0,37.0]
y = [9.39,11.32,14.42,16.38,18.28,18.21,17.11,14.3,12.46,11.45,10.12,0.0,32.0,-11.0,-34.5,11.2,-6.0,53.0,21.0,24.0,46.3,40.3,30.0,49.0,31.0,70.0,9.0,37.5,37.0,35.0,34.5,35.0,37.0,-29.42,-25.36,-14.53,64.0,62.0,60.0,59.0,55.0,59.5,60.0,54.0,54.0,54.0,53.0,34.2,32.24,25.02,23.44,21.2,19.29,13.5,4.55,28.0,17.0,12.5,8.0,42.0,34.41,31.58,28.59,22.44,20.32,18.09,14.38,14.08,14.0,11.43,50.51,48.59,49.56,48.23,47.0,-31.5,-35.1,65.07,65.1,7.5,65.1,65.08,45.04,43.29,-52.55,-12.12,18.13,54.37,2.42,36.39,19.06,-7.35,-7.21,13.0,5.0,7.0,23.0,3.0,14.0,36.0,35.0,46.3,-3.6,4.1,46.3,47.09,-35.0,-9.06,-8.25,-8.7,-6.02,62.0,63.4,60.57,46.3,27.0,-16.92,14.72,13.03,12.55,11.03,3.4,69.0,65.0,62.0,78.0,60.5,71.0,65.0,72.9,67.0,72.3,-46.2,-38.0,-7.9,-67.37,-18.57,-18.5,-20.13,3.0,0.4,-10.0,-32.0,-33.0,-36.0,15.0,-8.3,13.25,-49.75,-52.33,-54.95,58.0,-1.2,-33.32,-34.42,-20.0,-4.08,10.0,13.0,19.83,-5.0,-12.0,-19.0,-5.0,-65.0,-66.0,-15.0,37.62,53.0,-20.0,-63.5,-23.85,20.25,-42.0,65.0,70.5,71.0,-10.0,13.0,-59.0,-27.0,-8.0,35.0,54.0,10.0,-6.0,-10.0,16.0,-49.0,-80.0,-43.0,12.0,20.0,27.0,15.0,-65.0,-45.0,25.0,67.0,66.0,-21.0,36.1,36.8,35.8,45.4,56.5,47.5,26.0,43.0,45.0,37.0,37.0,17.5,11.6,12.0,11.6,-60.95,-66.05,-74.4,-77.15,69.0,66.5,68.0,-4.0,-7.0,-9.0,-17.3,-41.0,-69.5,-23.37,-31.5,-29.63,-35.78,-30.58,-25.5,-50.58,-70.0,-70.0,-70.0,-65.0,-65.0,-70.0,-75.0,25.0]
z = [-16.2,-16.1,-13.9,-15.1,-12.6,-12.8,-8.6,-9.3,-11.3,-9.2,-10.0,-9.2,-3.3,-12.9,-10.3,-8.3,-16.1,-13.5,-15.7,-12.2,-6.6,-11.3,-10.9,-5.3,-10.7,-14.3,-9.5,-3.6,-16.9,-13.2,-18.4,-13.4,-23.9,-11.0,-12.2,-18.8,-27.5,-24.1,-16.5,-17.0,-27.7,-8.8,-7.9,7.4,7.6,7.2,8.5,-11.7,-13.0,-15.3,-14.4,-14.8,-14.3,-15.7,-18.6,-12.8,-14.6,-12.0,-14.4,-13.0,-11.8,-13.6,-17.1,-17.8,-17.9,-17.1,-16.2,-14.3,-14.5,-12.1,7.6,7.3,7.7,9.6,10.1,-1.3,-6.0,-33.0,-24.2,-10.2,-34.3,-26.1,-9.9,-24.5,-4.4,-24.9,4.2,1.6,4.2,-9.3,1.3,-5.0,-13.8,2.0,-13.8,-10.6,-9.6,-9.1,-13.6,-8.5,-9.7,-6.1,-4.3,-3.4,-0.5,-6.0,-9.0,-9.3,-3.1,-2.8,-7.7,-6.5,7.6,3.4,-11.6,7.6,-3.2,-9.1,-9.6,-11.6,-13.3,-10.7,4.0,8.0,7.0,-13.0,-14.0,5.0,-14.0,-13.7,-38.0,-29.8,5.6,2.5,-2.6,-38.2,9.2,8.8,7.5,9.8,6.4,7.3,-9.2,-9.5,-21.1,4.1,-1.9,-12.0,-1.4,1.1,9.8,-10.6,4.1,1.7,-4.6,-15.1,7.1,7.3,5.9,1.5,8.3,3.4,-22.1,8.3,1.1,-3.2,7.4,-0.5,9.6,-14.5,5.7,5.5,-2.8,2.1,-38.0,4.0,-29.8,1.8,4.1,-0.5,-30.0,-12.4,7.7,7.4,7.4,0.0,-3.3,5.5,0.0,-20.8,-5.2,5.0,5.0,2.3,6.7,-8.0,-5.3,-10.2,-19.4,-26.0,4.0,-10.1,-8.4,-11.8,-11.6,-11.1,-11.6,-9.1,-9.7,-10.8,-6.2,-6.3,-4.2,6.2,10.0,6.3,-6.2,-5.7,-7.7,-10.4,-23.1,-25.8,-15.1,-8.7,-8.1,-9.5,-6.3,1.2,5.5,-15.5,-13.4,-15.3,-11.2,-11.6,-11.6,-4.8,-4.1,-3.7,-6.8,-14.3,-18.8,-14.0,-8.0,-11.2]
# list of properties with coordinates
data = OrderedDict(:variable => z)
coord = [(lon,lat) for (lon,lat) in zip(x,y)]
# define spatial data
sdata = PointSetData(data, coord)
# the solution domain (basically a grid covering the entire globe)
dims = (180, 89)
start = (1.0, -89.0)
finish = (359.0, 89.0)
sdomain = RegularGrid(start, finish, dims=dims)
problem = EstimationProblem(sdata, sdomain, :variable)
# axes for plots
xs = range(start[1], stop=finish[1], length=dims[1])
ys = range(start[2], stop=finish[2], length=dims[2])
# Plotting solution with data on top
NT = (distance=Haversine(1.0),)
solver = InvDistWeight(:variable => NT)
solution = GeoStats.solve(problem, solver)
μ, σ = solution[:variable]
plt = contourf(xs, ys, permutedims(μ, (2,1)))
scatter!(plt, sdata)
juliohm commented 4 years ago

Inverse distance weighting usually produces these very flat maps with peaks on the data. Also, I think you should pass Earth radius to Haversine, no? A radius of 1 will compute incorrect distances on the globe.

briochemc commented 4 years ago

Oh I see. Yes I think I would prefer a Kriging-generated map a bit more.

Oh and yes I used 1.0 because I assumed that only the relative distance matters? The magnitude does not really matter since the weights are normalized anyway, right?

juliohm commented 4 years ago

Oh yeah, you are correct. The interpolated result will use normalized weights, so the actual distance value isn't important. We are also returning this "variance" map which actually uses the distances, but since you don't seem to be interested in it, shouldn't be a problem.

juliohm commented 4 years ago

You can also try the LocallyWeightedRegression.jl solver while the Kriging one is being investigated here with Haversine. We will likely need a similar PR to handle the Haversine as you did with InverseDistanceWeighting.jl because of the KD-tree.

Locally weighted regression should produce an improved version of the map, however it doesn't pass through the observation values (regression != interpolation).

briochemc commented 4 years ago

BTW could you help clarify for uneducated me how this all works? I can use the distance in both the named tuple fed to Kriging, or in the kwargs of the GaussianVariogram. (BTW Should use something else than GaussianVariogram anyway?) So I could do something like this, which does not make a ton of sense to me...

NT = (variogram=GaussianVariogram(distance=d1), distance=d2)
solver = Kriging(:prop => NT)

I would really like to get this working soon, but this is beyond my expertise...

juliohm commented 4 years ago

Of course! The distance in the Kriging solver is about identifying neighbors and search ellipsoids. When you set the maxneighbors or neighborhood options, Kriging becomes a local method with a moving neighborhood or with k-nearest neighbors. In both cases, one needs to retrieve neighbors with some distance. The distance in the Variogram types is mainly about anisotropy but in your case it is also a Haversine. The variogram is just the correlation function (correlation vs. distance) and so we need to be able to compute distances that are appropriate for the manifold (sphere in lat/lon). I recall that not all variogram models are valid models in the sphere (they do not lead to positive definite covariances), and that may also be the reason why we are getting numerical instabilities here. I didn't have much chance in the past to work on the sphere and so we need to learn together here.

Regarding the choice of the variogram model, the GaussianVariogram is used when you have a very smooth process (e.g. temperature maps), whereas most physical processes have a short-scale variability that resembles ExponentialVariogram or SphericalVariogram. Maybe you could try a SphericalVariogram instead as it is more stable numerically as well. One thing that you could also try for debugging purposes would be to flatten the sphere using a projection (CoordinateTransformations.jl or Proj4.jl) and then do Kriging there with a Euclidean distance. If the issue is gone, we have more evidence that this is related to the sphere.

briochemc commented 4 years ago

I just wanted to let you know I've had some success using DIVAnd.jl. (DIVAnd was built more for ocean data and I don't really know what I'm doing but it seems like it works for this problem.) Anyway, my point is that the code in DIVAnd may have some answers to our questions here, too?

juliohm commented 4 years ago

The DIVA method looks interesting. I didn't have a chance to read their paper carefully (it is on my TODO list), but from what I understood they are solving a very different system of equations compared to the Kriging system.

Tomorrow I will try to debug the LocallyWeightedRegression.jl solver a bit more before I come back to the issue we are facing here. I've revised some old papers today and am more confident that Haversine should work fine there.

juliohm commented 4 years ago

The LocallyWeightedRegression.jl solver is now working with Haversine:

solver = LocalWeightRegress(:z => (distance=Haversine(6371.),))

image

I've released a minor version as discussed there. Next step is to do a similar analysis with the variogram ranges here to see what is causing the numerical instability in the Kriging system.

juliohm commented 4 years ago

@briochemc I've finally had a chance to look into this. There are some issues with your example code like repeated coordinates and an poor variogram model for the data. Together these issues caused those strange contours. I am in the middle of a major release process, but soon I am done with the release I will try to prepare a tutorial to illustrate the steps needed for estimation on the sphere :)

yakir12 commented 2 years ago

Hi Júlio! I'm also interested in inter/extra-polating ungridded 3D data on a surface of a sphere. Did you ever get a chance to

prepare a tutorial to illustrate the steps needed for estimation on the sphere

...?

I managed to get very far (thanks to your tutorials and docs), but there are a few artefacts in the results, and I'm not certain why.

Here's my code:

using KrigingEstimators
using Variography
using GeoStats
using Random

# generate some random data on a sphere
Random.seed!(1234);
radius = 1
n = 25
ϕ = acosd.(1 .- 2rand(n)) .- 90 # latitude
λ = 360rand(n) .- 180 # longitude
v = 2rand(n) .- 1 # some value between -1 and 1

# kriging
𝒟 = georef((v=v,), collect(zip(λ, ϕ)))
ngrid = 100
# would have been better to have a `SphericalGrid` here...
𝒢 = CartesianGrid(GeoStats.Point(-180, -90), GeoStats.Point(180, 90), dims=(ngrid, ngrid)) 
problem = EstimationProblem(𝒟, 𝒢, :v)
# a range of 30 degrees seems fine, and a Haversine distance makes sense
solver = Kriging(:v => (variogram=GaussianVariogram(range=30), distance=Haversine(radius)))
solution = solve(problem, solver)

So far so good, I hope? But when I plot it, I see two main artefacts, both seem to be related to the fact that the data is not cartesian but spherical. Please forgive me, but I prefer using Makie over Plots, hope that's fine:

using GLMakie, CoordinateTransformations

ϕl = range(-90, 90, length = ngrid)
λl = range(-180, 180, length = ngrid)
vl = reshape(values(solution).v, ngrid, ngrid)
xyz = CartesianFromSpherical().(Spherical.(radius, deg2rad.(λl), deg2rad.(ϕl')))
x = getindex.(xyz, 1)
y = getindex.(xyz, 2)
z = getindex.(xyz, 3)
fig = Figure()
ax = Axis3(fig[1,1], aspect = :data)
surface!(ax, x, y, z, color = vl, shading  = false, colormap = :bluesreds)
xyz = CartesianFromSpherical().(Spherical.(1.01radius, deg2rad.(λ), deg2rad.(ϕ)))
x = getindex.(xyz, 1)
y = getindex.(xyz, 2)
z = getindex.(xyz, 3)
scatter!(ax, x, y, z, color=v, strokecolor = :black, strokewidth=1, colormap = :bluesreds)
Colorbar(fig[1,2], colormap = :bluesreds, height = Relative(0.6))
hidedecorations!(ax)
hidespines!(ax)

Screenshot 2022-07-31 at 00 33 22

The artefacts I noticed are:

  1. There is a "fault line" where the longitude starts and ends
  2. Data seems to be stretched/squeezed towards the poles
yakir12 commented 2 years ago

It might have to do with the fact that the distribution of angular distances between uniformly distributed points on a sphere is different: Screenshot 2022-07-31 at 10 46 05

yakir12 commented 2 years ago

I posted a very detailed explanation of this process on Discourse. I'm more than willing to turn that into a tutorial and PR it to https://github.com/JuliaEarth/GeoStatsTutorials

juliohm commented 1 year ago

Hi Yakir, I'm typing from my cell phone...

Take a look at the SPDEGS solver which generates simulations over manifolds:

https://www.linkedin.com/posts/j%C3%BAlio-hoffimann-834936116_geostatistics-geostats-geospatial-activity-6909824670523404289-1qRQ?utm_source=linkedin_share&utm_medium=android_app

I will try to reserve some time over the week to explain how you can use similar idea to interpolate over the sphere.

Em dom., 31 de jul. de 2022 05:48, Yakir Luc Gagnon < @.***> escreveu:

It might have to do with the fact that the distribution of angular distances between uniformly distributed points on a sphere is different: [image: Screenshot 2022-07-31 at 10 46 05] https://user-images.githubusercontent.com/3281957/182018323-b2ad6f67-6f47-4a01-8895-03357e945441.png

— Reply to this email directly, view it on GitHub https://github.com/JuliaEarth/GeoStats.jl/issues/62#issuecomment-1200380380, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3MBMBN5LXNT6BQ27PTVWY4XFANCNFSM4NRGHXFQ . You are receiving this because you modified the open/close state.Message ID: @.***>