GeoStat-Framework / GSTools

GSTools - A geostatistical toolbox: random fields, variogram estimation, covariance models, kriging and much more
https://geostat-framework.org
GNU Lesser General Public License v3.0

Ordinary Kriging: help please? #36

Closed banesullivan closed 3 years ago

banesullivan commented 5 years ago

So I'm trying to get up and running with this software, and I'm finding it rather difficult to produce a quick-and-dirty ordinary kriging result. I learned geostats/kriging using SGeMS, and I'm realizing much of what I learned is difficult to transfer here (so far)... I'm curious if someone could help me get some quick/dirty results with a dataset that I've used in SGeMS before.

The Example

I believe my Gaussian model is probably problematic

import numpy as np
from gstools import Exponential, Gaussian, krige
import pyvista as pv

# Load the data
probes = pv.read("probes.vtp")
model_space = pv.read("model_space.vtr")

model = Gaussian(dim=3, var=10, len_scale=500)
cond_pos = probes.points.T # gstools convention
# conditional values - what I want to interpolate
field = probes['temperature (C)']
# Create the kriging model
krig = krige.Ordinary(model, cond_pos=cond_pos, cond_val=field)

krige_field, krige_var = krig(model_space.points.T)

model_space["krige_field"] = krige_field
model_space["krige_var"] = krige_var

model_space.contour().plot(scalars="krige_field")

[attached image: contoured kriging result]

I think I am running into trouble with the len_scale option in the Gaussian model. GSTools enforces an upper bound of 1000 on that parameter, but for this dataset I may need a higher len_scale?

If you all were to use this given dataset to produce an example showcasing what GSTools can do, what would that look like? This is all in an effort to help me figure out how PyVista might interface with GSTools.

Data

I have sparse, subsurface temperature probe data and want to krige a volume of these data.

data.zip

"Truth"

I'm working towards results that look more like this (produced in SGeMS):

[attached screenshot: SGeMS kriging result]
LSchueler commented 5 years ago

Without having had a more in-depth look at your problem yet: if the length-scale bound is the problem, you can do the following, for example:

model = Gaussian(dim=3, var=10)
model.set_arg_bounds(len_scale=[0, 10000])
model.len_scale = 2000
LSchueler commented 5 years ago

If you pull the newest version of GSTools, the boundaries are now much more generous ;-)

banesullivan commented 5 years ago

Okay, awesome! I pulled the latest changes and increasing the length scale is helping me approach a better solution.

Do you all have any examples, tips, or tricks for finding a good Gaussian model?

javoha commented 5 years ago

Hello,

let's see if I can make a suggestion (please correct me if I got anything wrong):

The len_scale of GSTools corresponds to the range describing the spatial correlation of the data when performing variogram analysis. There are basically two ways to obtain a range for your model.

1) Expert knowledge: You have some idea of the length scale of spatial correlation in your data set that you use as a range. In this case, using the same range (len_scale) that you used for the SGeMS model should give you the same result. Just be aware that some software packages use the "effective range" (for the Gaussian model this is sqrt(3)*range, I think); see the short sketch after this list.

2) You perform a variogram analysis (basically a data-driven approach to obtain a range). If I am correct, GSTools offers this: https://geostat-framework.readthedocs.io/projects/gstools/en/latest/tutorial_03_vario.html
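
A minimal sketch of point 1, assuming the sqrt(3) relation mentioned above actually holds for your SGeMS model (the numbers are placeholders, not fitted values):

import numpy as np
from gstools import Gaussian

# effective range taken from the SGeMS model (placeholder value)
effective_range = 1500.0
# assuming effective range = sqrt(3) * len_scale for the Gaussian model
model = Gaussian(dim=3, var=10, len_scale=effective_range / np.sqrt(3))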

Apart from that, you might have anisotropies in your model: in your example it looks like you have strong continuity in the x and y directions (horizontal) and variation in the z direction, corresponding to the direction of sedimentation. Also, in the literature (e.g. Webster and Oliver (2008)) the Gaussian model is discouraged because it approaches the origin with zero gradient. You might want to consider a different variogram model.
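
For illustration, an anisotropic model can be set up by passing one length scale per axis; the values below are only placeholders, not fitted to your data:

from gstools import Exponential

# longer correlation horizontally (x, y) than vertically (z)
model = Exponential(dim=3, var=10, len_scale=[2000.0, 2000.0, 200.0])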

Maybe this is helpful and please correct me if I got anything wrong.

LSchueler commented 4 years ago

Hi @banesullivan,

did the reply from @javoha (thanks for that!) help? - Could you reproduce the "truth" from SGeMS?

banesullivan commented 4 years ago

Thanks for the tips @javoha!

I haven't had the time to jump back into this... I will need to next week-ish and I will report back!

You might want to consider a different variogram model.

@javoha, Are there any other options for models when kriging with GSTools?

@LSchueler, feel free to close this issue as it's not really an issue/bug with the software but a support topic

MuellerSeb commented 4 years ago

@banesullivan: you can use all provided models for kriging: provided-covariance-models. And of course self-defined ones ;-)
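
For the self-defined case, a rough sketch assuming the CovModel subclassing interface from the docs (the exact method name may differ between versions):

import numpy as np
from gstools import CovModel, krige

class MyGaussLike(CovModel):
    # normalized correlation as a function of the non-dimensional distance h
    def cor(self, h):
        return np.exp(-h ** 2)

model = MyGaussLike(dim=3, var=2.0, len_scale=10.0)
# cond_pos / field as in the scripts above
krig = krige.Ordinary(model, cond_pos=cond_pos, cond_val=field)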

MuellerSeb commented 4 years ago

Closing now. But thanks for the input ;-)

banesullivan commented 4 years ago

So I'm trying this again but with another dataset that is a bit cleaner. Data attached and this data has a permissive license if you want to use it in an example in your docs - just let me know.

I've compiled with OpenMP support per https://github.com/GeoStat-Framework/GSTools/issues/31#issuecomment-535519002, but as soon as I start running the kriging (either on a structured or unstructured mesh), the memory consumption blows up, with over 100 GB used on swap, so I kill the task. This amount of memory usage seems ridiculous to me...

@MuellerSeb, @LSchueler, or @javoha: Can you all help in some capacity to make this example run?

Or can you put some 3D examples in the docs besides just a random field? I can't get anything to work in 3D with this library yet.

Data files: Archive.zip

import numpy as np
import matplotlib.pyplot as plt

import pyvista as pv

from gstools import (Exponential, Gaussian, krige, 
                     vario_estimate_unstructured, Stable)

grid = pv.read("grid.vtr")
assay = pv.read("assay.vtp")

cond_pos = assay.points.T # gstools convention
# conditional values - what I want to interpolate
field = assay["CU_pct"]

model = Exponential(dim=3, var=2, len_scale=10)
# estimate the variogram of the field with 40 bins
bins = np.arange(40)
bin_center, gamma = vario_estimate_unstructured(cond_pos, field, bins)
# fit the variogram with a stable model. (no nugget fitted)
fit_model = Stable(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
# output
ax = fit_model.plot(x_max=40)
ax.plot(bin_center, gamma)
print(fit_model)

[attached plot: Stable variogram fit with 40 bins]

Not too shabby of a variogram, so I should be able to at least produce some results with this....

# Create the kriging model
krig = krige.Ordinary(model, 
                      cond_pos=cond_pos, 
                      cond_val=field)

And this is where the memory usage blows up

krige_field, krige_var = krig((grid.x, grid.y, grid.z), mesh_type="structured")

kgrid["CU_pct"] = krige_field
kgrid["krige_var"] = krige_var
banesullivan commented 4 years ago

Or as unstructured:


krige_field, krige_var = krig(grid.points.T, mesh_type="unstructured")

kgrid["CU_pct"] = krige_field
kgrid["krige_var"] = krige_var
banesullivan commented 4 years ago

And...

model = Exponential(dim=3, var=2, len_scale=10)
# estimate the variogram of the field with 40 bins
bins = np.arange(100)
bin_center, gamma = vario_estimate_unstructured(cond_pos, field, bins)
# fit the variogram with a stable model. (no nugget fitted)
fit_model = Stable(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
# output
ax = fit_model.plot(x_max=100)
ax.plot(bin_center, gamma)
print(fit_model)

[attached plot: Stable variogram fit with 100 bins]

MuellerSeb commented 4 years ago

Good heavens! I think the problem is here: https://github.com/GeoStat-Framework/GSTools/blob/5326163d567a6ac0121ca75c007c2ae7d46564b7/gstools/krige/ordinary.py#L140

This function creates a matrix containing all pairwise distances between the conditional points and the given field points. In your case that is a matrix with 1000000 * 8583 entries, which at 8 bytes per value is on the order of 70 GB... quite huge.

We need to put this into chunks I guess. Therefore we need to introduce a controlling parameter like chunk_size which is maybe None by default, or something like 1e6.

ATM the output mesh just has too many points. Maybe as a work-around you could chunk it manually by cutting the x, y, z positions. Sorry for the inconvenience.

MuellerSeb commented 4 years ago

Maybe try something like this:

# assumes the grid has 100 cells along each axis, processed as 5 chunks of 20 per direction
krige_field = np.empty((len(grid.x), len(grid.y), len(grid.z)), dtype=float)
krige_var = np.empty((len(grid.x), len(grid.y), len(grid.z)), dtype=float)

for i in range(5):
    for j in range(5):
        for k in range(5):
            (
                krige_field[i*20:(i+1)*20, j*20:(j+1)*20, k*20:(k+1)*20], 
                krige_var[i*20:(i+1)*20, j*20:(j+1)*20, k*20:(k+1)*20],
            ) = krig(
                (
                    grid.x[i*20:(i+1)*20],
                    grid.y[j*20:(j+1)*20], 
                    grid.z[k*20:(k+1)*20],
                ),
                mesh_type="structured",
            )
banesullivan commented 4 years ago

Huh, okay, now it makes sense why the memory usage was blowing up, and it makes me feel a bit better knowing it wasn't just a user issue.

With that suggested code, it runs without massive memory consumption but is taking ages to execute... (no result as of yet)

Can you all implement that behind the scenes? And possibly parallelize it or find another way to speed this up?

LSchueler commented 4 years ago

Yeah, that's indeed not a user issue; it's an inherent problem with Python. In order to speed things up, NumPy and SciPy often rely on underlying C code, which is compiled to machine code and thus really fast. But the problem is that these C functions have to be fed all the data at once by the Python code. That's the reason why MuellerSeb's work-around helps with the memory consumption.
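
In the meantime, a rough sketch of how the same manual chunking could be applied to the unstructured call (chunk size and variable names are just placeholders; grid and krig as in your script above):

import numpy as np

points = grid.points.T  # shape (3, n_points)
chunk = 100_000  # placeholder chunk size
fields, variances = [], []
for start in range(0, points.shape[1], chunk):
    f, v = krig(points[:, start:start + chunk], mesh_type="unstructured")
    fields.append(f)
    variances.append(v)
krige_field = np.concatenate(fields)
krige_var = np.concatenate(variances)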

We'll discuss some options.

And thank you very much for providing the data!

MuellerSeb commented 4 years ago

@banesullivan : I started to re-implement the kriging routines. You can have a look at the feature branch: https://github.com/GeoStat-Framework/GSTools/tree/krige_update

There you now have the opportunity to define a chunksize when calling the kriging routine. The routines have also been optimized, so the kriging system is only determined once on creation.
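
A hypothetical call on that branch might look like the following; treat the keyword name as a placeholder and check the branch for the actual argument:

# placeholder keyword; the actual name on the feature branch may differ
krige_field, krige_var = krig(
    (grid.x, grid.y, grid.z), mesh_type="structured", chunk_size=10_000
)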

Maybe you can play around with that and give a feedback ;-)

Cheers, Sebastian

banesullivan commented 4 years ago

Ah awesome, @MuellerSeb! I will have to try that out when I have a chance... been a bit focused on a specific project lately.

FYI, I managed to get pretty good results with the original data from the top post (see https://github.com/GeoStat-Framework/GSTools/issues/36#issue-510963617) with the following Exponential variogram:

model = Exponential(dim=3,   # 3D
                    var=9e3, # Sill
                    len_scale=[5000, 5000, 850]
                   )
[attached screenshot: kriging result with the anisotropic Exponential model]

I think it would be pretty cool to include this example in the docs somewhere... It would also be cool to switch the examples over to sphinx-gallery, but in order to do that with PyVista you would have to leave readthedocs.org for GitHub Pages.


Also, since this is the original topic of this issue, I cannot get a decent variogram fit for that dataset. The following doesn't seem to work:

_, bins = np.histogram(field, 50)
bin_center, gamma = vario_estimate_unstructured(
    cond_pos,
    field,
    bins,
    )

fit_model = Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)

from gstools.covmodel.plot import plot_variogram

plt.plot(bin_center, gamma)
plot_variogram(fit_model, x_max=bins[-1], ax=plt.gca())
plt.show()

[attached plot: poor variogram fit using histogram-based bins]

any tips?

MuellerSeb commented 4 years ago

@banesullivan : That kriging result looks good indeed! Could you provide a full working script to reproduce the temperature image? We are aiming at a better structure for the examples folder, so this could be nice for the collection.

Addressing the second problem: why are you using the bins from the histogram? Those bins are set up to properly bin the field values, but the bins for the variogram are needed for the pairwise distances of the field points. I think you are mixing things up there.

We are working on a new variogram estimation routine at the moment, where binning could be done automatically. So this should be more convenient in the future.
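
Until then, a minimal sketch of choosing distance-based bins by hand (the maximum distance and bin count are placeholders; cond_pos / field as defined above):

import numpy as np
from gstools import vario_estimate_unstructured

# bins span pairwise distances between points, not field values
max_dist = 12000.0  # placeholder: roughly the extent of the probe cloud
bins = np.linspace(0, max_dist, 50)
bin_center, gamma = vario_estimate_unstructured(cond_pos, field, bins)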

banesullivan commented 4 years ago

Aha! I totally misinterpreted the purpose of the bins. Much better now:

Here is the data: probes.vtp.zip

Here is the full script:

import numpy as np
from gstools import Exponential, krige, vario_estimate_unstructured
import matplotlib.pyplot as plt
import pyvista as pv

# Load the data
probes = pv.read("probes.vtp")

cond_pos = probes.points.T # gstools convention
# conditional values - what I want to interpolate
field = probes['temperature (C)']

bins = np.linspace(0, 12300, 1000)
bin_center, gamma = vario_estimate_unstructured(
    cond_pos,
    field,
    bins,
    )

fit_model = Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)

from gstools.covmodel.plot import plot_variogram

plt.plot(bin_center, gamma)
plot_variogram(fit_model, x_max=bins[-1], ax=plt.gca())
plt.show()

[attached plot: Exponential variogram fit with distance-based bins]

# Create the kriging grid
grid = pv.UniformGrid()
grid.origin = (329700, 4252600, -2700)
grid.spacing = (250, 250, 50)
grid.dimensions = (60, 75, 100)

# Create the kriging model
krig = krige.Ordinary(fit_model, cond_pos=cond_pos, cond_val=field)

r = grid.cast_to_rectilinear_grid()  # cast to a rectilinear grid so the structured (faster) kriging call can be used
krige_field, krige_var = krig((r.x, r.y, r.z), mesh_type="structured")

grid["temperature (C)"] = krige_field.ravel(order="f")
grid["krige_var"] = krige_var.ravel(order="f")

dargs = dict(cmap="coolwarm", clim=[0,300], scalars="temperature (C)")

p = pv.Plotter()
p.add_volume(grid, opacity="sigmoid_8", **dargs)
p.add_mesh(probes, **dargs)
p.show()

[attached rendering: kriged temperature volume with probe points]


There's a lot of other data for this project with some isosurfaces of "true" temperature values. Here's a GIF with other stuff I didn't include here.

[attached GIF: animated view of the kriged volume and isosurfaces]

MuellerSeb commented 4 years ago

@banesullivan : Cool! Thanks for the scripts.. stunning!

#55 gives a proposal for automated binning. Maybe you can try this and compare the results.

MuellerSeb commented 3 years ago

@banesullivan: Automatic binning was implemented with #131 and PyVista is now supported with #59.
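
As a quick sketch of the automatic binning in newer GSTools versions (if I recall the newer API correctly, vario_estimate replaced vario_estimate_unstructured; cond_pos / field as above):

import gstools as gs

# bins are chosen automatically when no bin edges are passed
bin_center, gamma = gs.vario_estimate(cond_pos, field)
fit_model = gs.Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)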