GeoStat-Framework / PyKrige

Kriging Toolkit for Python
https://pykrige.readthedocs.io
BSD 3-Clause "New" or "Revised" License
774 stars 190 forks source link

Geographical models #149

Open MuellerSeb opened 4 years ago

MuellerSeb commented 4 years ago

It was shown, that the current approach of plugging in the great-circle distance directly into the covariance/variogram function does not automatically result in a valid covariance model on the sphere.

To overcome this flaw, we should use the so called Yadrenko Covariance Function (See here). Starting with a valid isotropic model in 3D, where the covariance between two points is given as a function of their distance:

cov(x1, x2) = func(dist(x1, x2))

We can construct the related yadrenko model on the sphere, that takes the great-circle distance zeta(x1, x2) with the following modification:

cov_sph(x1, x2) = func(2 * sin(zeta(x1, x2) / 2))

One should not, that 2 * sin(x / 2) is approximately x for small angles, which means, that the current approach in PyKrige is approximately correct for small angles.

mjziebarth commented 4 years ago

That is an interesting paper (and line of research) that I had not been aware of when I first implemented the great-circle distance. Do you happen to know how conclusive the research is overall with regards to that models in Euclidean space do not perform worse than models using the great circle distance?

Then I have one suggestion: From what I understand, the Yadrenko model lifts the great circle distance back to Euclidean space and then uses those Eucliden distances (please correct me if I'm wrong). If the choice is made to use the Yadrenko model (seeing that paper it seems reasonable to me), I think it could make sense to skip the use of the great circle distance entirely by directly converting the geographic coordinates to Euclidean space even before the distances are computed. This would reduce code complexity by handling the coordinates very early on and might speed things up a bit, I guess. Maybe it could make sense compare the accuracy of both approaches.

Best, Malte

MuellerSeb commented 4 years ago

Hey Malte, Thanks for your thoughts on that. I don't think that it lifts back the distance to 3D, since it directly depends on the great circle distance. I also found some other papers about that and the keypoint is to create valid covaraince models, meaning "conditional negative semidefinite" ones. I would suggest implementing a great-circle cdist, analog to scipys cdist to speed things up when using geo-coordinates, to easily create the distance matrix.

mjziebarth commented 4 years ago

Hey Sebastian,

with the comment I was referring mostly to one sentence of the paper that you linked (p. 3):

The Euclidean distance between two points on a sphere, which is also known as the chordal distance, can be expressed in terms of great circle distance as [...]

Directly after this, there is an equation that equates the Euclidean or chordal distance to the expression 2*sin(theta/2) where theta is the great circle distance, i.e. the expression you suggested to use for plugging great circle distance into the Euclidean covariance models. So, as far as I understand, this expression actually converts great circle distance back to Euclidean distance and relies on the great circle distance only superficially in the two-step conversion process. Did I miss something there?

Anyway, that was just to note that things may possibly be simplified from a conceptional point of view. Whether this will have an impact on performance remains speculation on my part.

That being said, I think the point about the invalid covariance models is a great one and admittedly one that I have no experience with. Best!

MuellerSeb commented 4 years ago

Ahhh! Thanks for the enlightenment! This could change a lot of stuff. I need to think about that.

MuellerSeb commented 4 years ago

I would suggest using the Yadrenko models as the standard interpretation for geo-coordinates. Then we can drop the whole haversine procedures and just calculate the fields in 3D (with a little memory overhead for the third positional dimension).

This way we can ensure, that the used model is really a conditionally negative semi-definite (CNSD, abbreviation used by Webster 2007) one. As mentioned above, this will result in approximately equal results for "small" angles. For example, for an great-circle distance of pi/2 (quarter of the globe) we have a deviation of 15% in the distance argument described above and for pi/4 it is only 2%.

This also means, that if the length scale parameter is less then pi/4 (~5000km on the globe) we can assume similar results for both methods (current approach and yadrenko model).

What do you think @rth, @mjziebarth, @bsmurphy, @LSchueler?

mjziebarth commented 4 years ago

Hi Sebastian,

to me this sounds good! I have two minor comments:

Best, Malte

MuellerSeb commented 4 years ago

@mjziebarth : I would suggest doing this within the GS-Framework 2.0 project. By increasing the major version of Pykrige and GSTools we can break backward compatibility. We can still keep v1.x of PyKrige alive, but I wouldn't introduce a deprecation cycle there. Of course, we have to well-document this change and the behavior.

MuellerSeb commented 3 years ago

This is now solved in GSTools and will be used in PyKrige in the future:

MuellerSeb commented 3 years ago

This issue should also mention, that we need to implement geographic coordinates for universal kriging.

MuellerSeb commented 2 years ago

217 is related.