gher-uliege / DIVAnd.jl

DIVAnd performs an n-dimensional variational analysis of arbitrarily located observations
GNU General Public License v2.0

Accurate meaning of the correlation length? [DOC] #135

josselin-aval closed this issue 1 year ago

josselin-aval commented 1 year ago

Hello,

I understand that the correlation length (in a particular dimension) is the distance (in that dimension) over which an observation influences its neighborhood.

In the documentation, there is the https://gher-uliege.github.io/DIVAnd.jl/latest/index.html#DIVAnd.DIVAnd_kernel function with the following comment:

K(r) is the kernel function (function of the normalized distance r), len_scale is the distance at which K(len_scale) = 0.6019072301972346 (which is besselk(1,1))
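For reference, the quoted value can be reproduced numerically: in 2D the kernel behaves like K(r) = r·K₁(r), normalized so that K(0) = 1. A quick check, sketched here in Python with SciPy for convenience (the thread's examples are in Julia):

```python
from scipy.special import kv  # modified Bessel function of the second kind, K_nu

def kernel_2d(r):
    """2D Bessel kernel K(r) = r * K_1(r), with r the distance
    normalized by the correlation length; K(r) -> 1 as r -> 0."""
    return r * kv(1, r)

print(kernel_2d(1.0))  # ~0.60190723..., the besselk(1, 1) value quoted above
```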

I have the following questions:

Perhaps such information, especially the value of the kernel at the correlation length, would be worth adding to the documentation, in order to give an accurate idea of the magnitude of an observation's influence at the correlation length.

Thank you in advance for your help.

ctroupin commented 1 year ago

Hello, and thanks for this relevant set of questions. I think that, as an easy way to understand what the kernel looks like, one can simply run a 2D analysis like this:

using DIVAnd
using PyPlot

# observations
x = [50.];
y = [50.];
f = [10.]

# final grid
xi,yi = ndgrid(0:100,0:100);

# all points are valid points
mask = trues(size(xi));

# metrics: inverse of the local grid spacing
pm = ones(size(xi)) / (xi[2,1]-xi[1,1]);
pn = ones(size(xi)) / (yi[1,2]-yi[1,1]);

# correlation length
len = 10.;

# obs. error variance normalized by the background error variance
epsilon2 = 1.;

# fi is the interpolated field
@time fi,s = DIVAndrun(mask,(pm,pn),(xi,yi),(x,y),f,len,epsilon2;alphabc=2);
pcm = pcolor(xi,yi,fi);
colorbar(pcm)
title("Correlation length: $(len)\nMaximal value: $(round(maximum(fi), digits=2))")

which will give the following result:

image
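As a side note on the magnitude: for a single observation, the variational analysis behaves like optimal interpolation, so the peak of the analysed field should be close to f/(1 + epsilon2) (assuming a zero background and neglecting boundary effects). A quick sanity check, in Python for illustration only:

```python
# Expected peak of the analysed field for a single observation:
# the observation is weighted by 1 / (1 + epsilon2), the ratio of the
# background error variance to the total (background + obs) variance.
f = 10.0        # observation value, as in the example above
epsilon2 = 1.0  # noise-to-signal ratio, as in the example above

peak = f / (1.0 + epsilon2)
print(peak)  # 5.0 -> the "Maximal value" in the plot title should be close to this
```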

josselin-aval commented 1 year ago

Thank you very much for your quick answer @ctroupin .

This simulation example indeed helps in understanding what the kernel looks like.

I just modified the value of the observation (f = [1.]) to get a "normalized" field, and set epsilon2 to a very small value so that the analysis perfectly fits the observation.

Then, focusing on a 1D profile of this simulation and varying the correlation length, we obtain the following graph:

image

The simulated profile is near 0.6 at the correlation length. This seems consistent with:

K(r) is the kernel function (function of the normalized distance r), len_scale is the distance at which K(len_scale) = 0.6019072301972346 (which is besselk(1,1))

Therefore, len_scale would be equivalent to the correlation length, and the value of the kernel at the correlation length would be 0.6019072301972346 (which is besselk(1,1)), am I correct?
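If that reading is right, the decay of the profile with distance should follow r·K₁(r). A small table of values (Python/SciPy used for convenience; the distances assume the len = 10 setup from the example above):

```python
from scipy.special import kv  # modified Bessel function of the second kind

def kernel_2d(r):
    # 2D Bessel kernel: K(r) = r * K_1(r), with K(0) = 1 by continuity
    return r * kv(1, r) if r > 0 else 1.0

len_scale = 10.0  # correlation length used in the example above
for d in (0.0, 5.0, 10.0, 20.0, 30.0):
    # at d = len_scale, K is ~0.6019 (besselk(1, 1))
    print(f"distance {d:4.0f} -> K = {kernel_2d(d / len_scale):.4f}")
```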

ctroupin commented 1 year ago

Yes, that seems correct to me. And with a low epsilon2 you ensure the signal-to-noise ratio is high, so the analysis closely fits the observation.

What we used to do with the previous version of DIVA (purely 2D) was to show the solution of the same configuration (i.e. a single data point), but with a boundary (wall), so you see the "deformation" of the kernel due to the wall.

From the code I wrote above, you can simulate the wall by setting some of the mask values to false. I'll prepare that later.

josselin-aval commented 1 year ago

Thanks for your quick reply, this is clear for me.

What we used to do with the previous version of DIVA (purely 2D) was to show the solution of the same configuration (i.e. a single data point), but with a boundary (wall), so you see the "deformation" of the kernel due to the wall.

From the code I wrote above, you can simulate the wall by setting some of the mask values to false. I'll prepare that later.

Ok, interesting. I think we can close this issue after this test.

ctroupin commented 1 year ago

The code could be like this:

using DIVAnd
using PyPlot

# observations
x = [50.];
y = [50.];
f = [10.]

# final grid
xi,yi = ndgrid(0:100,0:100);

# all points are valid points
mask = trues(size(xi));

# metrics: inverse of the local grid spacing
pm = ones(size(xi)) / (xi[2,1]-xi[1,1]);
pn = ones(size(xi)) / (yi[1,2]-yi[1,1]);

# correlation length
len = 10.;

# obs. error variance normalized by the background error variance
epsilon2 = 0.001;

# fi is the interpolated field
fi1,s = DIVAndrun(mask,(pm,pn),(xi,yi),(x,y),f,len,epsilon2;alphabc=2);

# interpolation with a "wall"
mask[1:49,:] .= false
fi2,s = DIVAndrun(mask,(pm,pn),(xi,yi),(x,y),f,len,epsilon2;alphabc=2);

and for the plot

plt.plot(0:100, fi1[:,51])
plt.plot(0:100, fi2[:,51])
plt.vlines(49, 0, 11)

josselin-aval commented 1 year ago

Following your example, we obtain the following graph, which indeed highlights the "wall" (blue: single point; red: single point with masking, i.e. the "wall"):

image

Thanks again for your help!

ctroupin commented 1 year ago

I've added a Pluto notebook (https://plutojl.org/docs/install/), https://github.com/gher-uliege/DIVAnd.jl/blob/master/examples/DIVAnd_kernel_pluto.jl, where you can play with sliders, etc., to change the parameter values and see their effect on the analysis interactively.

josselin-aval commented 1 year ago

Thanks for completing the documentation.

I'm going to test this, as well as Pluto, which I didn't know about.