Without having had a more in-depth look at your problem yet: if the length-scale bound is the problem, you can do the following, for example:
model = Gaussian(dim=3, var=10)
model.set_arg_bounds(len_scale=[0, 10000])
model.len_scale = 2000
If you pull the newest version of GSTools, the boundaries are now much more generous ;-)
Okay, awesome! I pulled the latest changes and increasing the length scale is helping me approach a better solution.
Do you all have any examples, tips, or tricks for finding a good Gaussian model?
Hello,
let's see if I can make a suggestion (please correct me if I got anything wrong):
The len_scale of GSTools corresponds to the range describing the spatial correlation of the data when performing variogram analysis. There are basically two ways to obtain a range for your model.
1) Expert knowledge: You have some idea of the length scale of spatial correlation in your data set and use it as the range. In this case, using the same range (len_scale) that you used for the SGeMS model should give you the same result. Just be aware that some software packages report the "effective range" (for the Gaussian model this is sqrt(3)*range, I think); see the small check after this list.
2) You perform a variogram analysis (basically a data driven approach to obtain a range). If I am correct GSTools offers this: https://geostat-framework.readthedocs.io/projects/gstools/en/latest/tutorial_03_vario.html
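A minimal sketch of how one might compare the two conventions, assuming the percentile_scale method of CovModel is available in your GSTools version:
from gstools import Gaussian

# sketch: compare GSTools' len_scale with an "effective range" convention by
# checking the distance where the variogram approaches the sill
model = Gaussian(dim=3, var=10, len_scale=2.0)
print(model.len_scale)               # GSTools length scale
print(model.percentile_scale(0.95))  # distance where ~95% of the sill is reached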
Apart from that, you might have anisotropies in your model (in your example it looks like you have strong continuity in the x and y directions (horizontal) and variation in the z direction, corresponding to the direction of sedimentation). Also, in the literature (e.g. Webster and Oliver (2008)) the Gaussian model is discouraged because it approaches the origin with zero gradient. You might want to consider a different variogram model.
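A minimal sketch of an anisotropic model; the length-scale values are just placeholders for strong horizontal continuity and a shorter vertical range:
from gstools import Exponential

# placeholder length scales: strong horizontal continuity, shorter vertical range
model = Exponential(
    dim=3,
    var=2.0,
    len_scale=[5000.0, 5000.0, 850.0],  # ranges in x, y, z -> geometric anisotropy
)
print(model.anis)  # anisotropy ratios relative to the main length scale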
Maybe this is helpful and please correct me if I got anything wrong.
Hi @banesullivan,
did the reply from @javoha (thanks for that!) help? - Could you reproduce the "truth" from SGeMS?
Thanks for the tips @javoha!
I haven't had the time to jump back into this... I will need to next week-ish and I will report back!
You might want to consider a different variogram model.
@javoha, Are there any other options for models when kriging with GSTools?
@LSchueler, feel free to close this issue as it's not really an issue/bug with the software but a support topic
@banesullivan : you can use all provided models for kriging (see the provided-covariance-models section of the docs), and of course self-defined ones. ;-)
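A minimal sketch of swapping in a different model, assuming cond_pos and field are the conditioning data from the posts above:
from gstools import Spherical, krige

# any covariance model can be handed to the kriging classes
model = Spherical(dim=3, var=10, len_scale=2000)
krig = krige.Ordinary(model, cond_pos=cond_pos, cond_val=field)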
Closing now. But thanks for the input ;-)
So I'm trying this again but with another dataset that is a bit cleaner. Data attached and this data has a permissive license if you want to use it in an example in your docs - just let me know.
I've compiled with OpenMP support per https://github.com/GeoStat-Framework/GSTools/issues/31#issuecomment-535519002, but as soon as I start running the kriging (either on a structured or unstructured mesh), the memory consumption blows up, with over 100 GB used in swap, so I kill the task. This amount of memory usage seems ridiculous to me...
@MuellerSeb, @LSchueler, or @javoha: Can you all help in some capacity to make this example run?
Or can you put some 3D examples in the docs besides just a random field? I can't get anything to work in 3D with this library yet.
Data files: Archive.zip
import numpy as np
import matplotlib.pyplot as plt
import pyvista as pv
from gstools import (Exponential, Gaussian, krige,
                     vario_estimate_unstructured, Stable)
grid = pv.read("grid.vtr")
assay = pv.read("assay.vtp")
cond_pos = assay.points.T # gstools convention
# conditional values - what i want to interpolate
field = assay["CU_pct"]
model = Exponential(dim=3, var=2, len_scale=10)
# estimate the variogram of the field with 40 bins
bins = np.arange(40)
bin_center, gamma = vario_estimate_unstructured(cond_pos, field, bins)
# fit the variogram with a stable model. (no nugget fitted)
fit_model = Stable(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
# output
ax = fit_model.plot(x_max=40)
ax.plot(bin_center, gamma)
print(fit_model)
Not too shabby of a variogram, so I should be able to at least produce some results with this....
# Create the kriging model
krig = krige.Ordinary(model,
                      cond_pos=cond_pos,
                      cond_val=field)
And this is where the memory usage blows up
krige_field, krige_var = krig((grid.x, grid.y, grid.z), mesh_type="structured")
kgrid["CU_pct"] = krige_field
kgrid["krige_var"] = krige_var
Or as unstructured:
krige_field, krige_var = krig(grid.points.T, mesh_type="unstructured")
kgrid["CU_pct"] = krige_field
kgrid["krige_var"] = krige_var
And...
model = Exponential(dim=3, var=2, len_scale=10)
# estimate the variogram of the field with 40 bins
bins = np.arange(100)
bin_center, gamma = vario_estimate_unstructured(cond_pos, field, bins)
# fit the variogram with a stable model. (no nugget fitted)
fit_model = Stable(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
# output
ax = fit_model.plot(x_max=100)
ax.plot(bin_center, gamma)
print(fit_model)
Good heavens! I think the problem is here: https://github.com/GeoStat-Framework/GSTools/blob/5326163d567a6ac0121ca75c007c2ae7d46564b7/gstools/krige/ordinary.py#L140
This function creates a matrix containing all pairwise distances between the conditional points and the given field points. In your case that is a matrix with 1,000,000 * 8,583 entries, roughly 69 GB in double precision... quite huge.
We need to put this into chunks, I guess. Therefore we need to introduce a controlling parameter like chunk_size, which is maybe None by default, or something like 1e6.
ATM the output mesh has just too many points. Maybe as a work around you could chunk it manually, by cutting the x,y,z positions. Sorry for this inconvenience.
Maybe try something like this:
krige_field = np.empty((len(grid.x), len(grid.y), len(grid.z)), dtype=float)
krige_var = np.empty((len(grid.x), len(grid.y), len(grid.z)), dtype=float)
for i in range(5):
    for j in range(5):
        for k in range(5):
            (
                krige_field[i*20:(i+1)*20, j*20:(j+1)*20, k*20:(k+1)*20],
                krige_var[i*20:(i+1)*20, j*20:(j+1)*20, k*20:(k+1)*20],
            ) = krig(
                (
                    grid.x[i*20:(i+1)*20],
                    grid.y[j*20:(j+1)*20],
                    grid.z[k*20:(k+1)*20],
                ),
                mesh_type="structured",
            )
Huh, okay, this makes sense now. I see why the memory usage was blowing up, and it makes me feel a bit better knowing it wasn't just a user error.
With that suggested code, it runs without massive memory consumption but is taking ages to execute... (no result as of yet)
Can you all implement that behind the scenes? And possibly parallelize it or find another way to speed this up?
Yeah that's indeed not a user issue, it's an inherent problem with Python. In order to speed things up, Numpy and Scipy often rely on underlying C-code, which is compiled to machine code and thus really fast. But the problem is, these C-functions have to be fed by the Python code with all the data at once. That's the reason why MuellerSeb's work around helps with the memory consumption.
We'll discuss some options.
And thank you very much for providing the data!
@banesullivan : I started to re-implement the kriging routines. You can have a look at the feature branch: https://github.com/GeoStat-Framework/GSTools/tree/krige_update
There you now have the opportunity to define a chunksize when calling the kriging routine. The routines have also been optimized, so the kriging system is only determined once on creation.
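A minimal sketch of what the chunked call might look like; the keyword name chunk_size is an assumption based on the description above:
# chunk_size is the assumed name of the feature-branch option described above
krige_field, krige_var = krig(
    (grid.x, grid.y, grid.z),
    mesh_type="structured",
    chunk_size=10000,  # evaluate the kriging system in chunks of target points
)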
Maybe you can play around with that and give some feedback ;-)
Cheers, Sebastian
Ah awesome, @MuellerSeb! I will have to try that out when I have a chance... been a bit focused on a specific project lately.
FYI, I managed to get pretty good results with the original data from the top post (see https://github.com/GeoStat-Framework/GSTools/issues/36#issue-510963617) with the following Exponential variogram:
model = Exponential(dim=3,  # 3D
                    var=9e3,  # Sill
                    len_scale=[5000, 5000, 850]
                    )
I think it would be pretty cool to include this example in the docs somewhere. Also, it would be cool to switch the examples over to sphinx-gallery, but in order to do that with PyVista, you would have to leave readthedocs.org for GitHub Pages.
Also, since this is the original topic of this issue, I cannot get a decent variogram fit for that dataset. The following doesn't seem to work:
_, bins = np.histogram(field, 50)
bin_center, gamma = vario_estimate_unstructured(
    cond_pos,
    field,
    bins,
)
fit_model = Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
from gstools.covmodel.plot import plot_variogram
plt.plot(bin_center, gamma)
plot_variogram(fit_model, x_max=bins[-1], ax=plt.gca())
plt.show()
any tips?
@banesullivan : That kriging result looks good indeed! Could you provide a full working script to reproduce the temperature image? We are aiming at a better structure for the examples folder, so this could be nice for the collection.
Addressing the second problem: Why are you using the bins from the histogram? These bins are set up to properly bin the field values. But the bins in the variogram are needed for the pairwise distances of the field points. I think you are mixing things up there.
We are working on a new variogram estimation routine at the moment, where binning could be done automatically. So this should be more convenient in the future.
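To make the distinction concrete, a minimal sketch of building lag bins from the point distances instead of from the field values (the max-lag heuristic is just an assumption):
import numpy as np

# variogram bins are distance (lag) classes, not field-value classes;
# a common heuristic is to bin up to roughly half the maximal point distance
extent = cond_pos.max(axis=1) - cond_pos.min(axis=1)  # cond_pos has shape (3, n)
max_lag = 0.5 * np.linalg.norm(extent)
bins = np.linspace(0, max_lag, 30)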
Aha! I totally misinterpreted the purpose of the bins. Much better now:
Here is the data: probes.vtp.zip
Here is the full script:
import numpy as np
from gstools import Exponential, krige, vario_estimate_unstructured
import matplotlib.pyplot as plt
import pyvista as pv
# Load the data
probes = pv.read("probes.vtp")
cond_pos = probes.points.T # gstools convention
# conditional values - what i want to interpolate
field = probes['temperature (C)']
bins = np.linspace(0, 12300, 1000)
bin_center, gamma = vario_estimate_unstructured(
    cond_pos,
    field,
    bins,
)
fit_model = Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)
from gstools.covmodel.plot import plot_variogram
plt.plot(bin_center, gamma)
plot_variogram(fit_model, x_max=bins[-1], ax=plt.gca())
plt.show()
# Create the kriging grid
grid = pv.UniformGrid()
grid.origin = (329700, 4252600, -2700)
grid.spacing = (250, 250, 50)
grid.dimensions = (60, 75, 100)
# Create the kriging model
krig = krige.Ordinary(fit_model, cond_pos=cond_pos, cond_val=field)
r = grid.cast_to_rectilinear_grid()  # Use structured grid for better performance
krige_field, krige_var = krig((r.x, r.y, r.z), mesh_type="structured")
grid["temperature (C)"] = krige_field.ravel(order="f")
grid["krige_var"] = krige_var.ravel(order="f")
dargs = dict(cmap="coolwarm", clim=[0,300], scalars="temperature (C)")
p = pv.Plotter()
p.add_volume(grid, opacity="sigmoid_8", **dargs)
p.add_mesh(probes, **dargs)
p.show()
There's a lot of other data for this project with some isosurfaces of "true" temperature values. Here's a GIF with other stuff I didn't include here.
@banesullivan : Cool! Thanks for the scripts... stunning!
@banesullivan : Automatic binning was implemented with #131 and PyVista is now supported with #59.
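A rough sketch of how the script above might look with those features; the exact signatures of vario_estimate and the .mesh helper are assumptions based on the linked PRs:
import gstools as gs
import pyvista as pv

probes = pv.read("probes.vtp")
cond_pos = probes.points.T
field = probes["temperature (C)"]

# omit the bins argument to use automatic binning
bin_center, gamma = gs.vario_estimate(cond_pos, field)

fit_model = gs.Exponential(dim=3)
fit_model.fit_variogram(bin_center, gamma, nugget=False)

krig = gs.krige.Ordinary(fit_model, cond_pos=cond_pos, cond_val=field)

grid = pv.UniformGrid()
grid.origin = (329700, 4252600, -2700)
grid.spacing = (250, 250, 50)
grid.dimensions = (60, 75, 100)

# evaluate the kriging field directly on the PyVista mesh (attached as point data)
krig.mesh(grid, points="points", name="temperature (C)")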
So I'm trying to get up and running with this software, and I'm finding it rather difficult to produce even a quick-and-dirty ordinary kriging result. I learned geostats/kriging using SGeMS, and I'm realizing much of what I learned is difficult to transfer here (so far)... I'm curious if someone could help me get some quick/dirty results with a dataset that I've used in SGeMS before.
The Example
I believe my Gaussian model is probably problematic
I think I am running into trouble with the len_scale option in the Gaussian model. GSTools enforces a limit of 1000 on that, but for this dataset I may need a higher len_scale?
If you all were to use this given dataset to produce an example showcasing what GSTools can do, what would that be? This is all in an effort to help me figure out how PyVista might interface with GSTools.
Data
I have sparse, subsurface temperature probe data and want to krige a volume of these data.
data.zip
"Truth"
I'm working towards results that look more like this (produced in SGeMS):