GeoStat-Framework / PyKrige

Kriging Toolkit for Python
https://pykrige.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Excessive memory use creating OrdinaryKriging object (even if not estimating variogram) #264

Open fiftysevendegreesofrad opened 1 year ago

fiftysevendegreesofrad commented 1 year ago

Hi, this looks like a great library, but I wonder if I'm using it wrong.

I would like to krige from a large number of data points. I am not estimating the variogram, and when calling execute() I plan to restrict the calculation to n_closest_points. However, the code below fails with a MemoryError, because it appears to compute all pairwise distances between my input points. I'm not sure why this is necessary when the variogram parameters are already provided.

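import pykrige.ok

# `data` is an (N, 3) array of x, y, z values; `fit_model` holds the
# variogram parameters obtained earlier (neither is defined in this excerpt).
print(data.shape)  # outputs (600000, 3)

OK = pykrige.ok.OrdinaryKriging(
    data[:, 0],
    data[:, 1],
    data[:, 2],
    variogram_model="gaussian",
    variogram_parameters={
        "sill": fit_model.sill,
        "range": fit_model.len_scale,
        "nugget": fit_model.nugget,
    },
)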

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-31-86755f124e9f> in <module>
     15 
     16 #for some reason still tries to fit and runs out of memory
---> 17 OK = pykrige.ok.OrdinaryKriging(
     18     data[:, 0],
     19     data[:, 1],

~\Anaconda2\envs\bayesiandrape\lib\site-packages\pykrige\ok.py in __init__(self, x, y, z, variogram_model, variogram_parameters, variogram_function, nlags, weight, anisotropy_scaling, anisotropy_angle, verbose, enable_plotting, enable_statistics, coordinates_type, exact_values, pseudo_inv, pseudo_inv_type)
    319             self.semivariance,
    320             self.variogram_model_parameters,
--> 321         ) = _initialize_variogram_model(
    322             np.vstack((self.X_ADJUSTED, self.Y_ADJUSTED)).T,
    323             self.Z,

~\Anaconda2\envs\bayesiandrape\lib\site-packages\pykrige\core.py in _initialize_variogram_model(X, y, variogram_model, variogram_model_parameters, variogram_function, nlags, weight, coordinates_type)
    457     # to calculate semivariances...
    458     if coordinates_type == "euclidean":
--> 459         d = pdist(X, metric="euclidean")
    460         g = 0.5 * pdist(y[:, None], metric="sqeuclidean")
    461 

~\Anaconda2\envs\bayesiandrape\lib\site-packages\scipy\spatial\distance.py in pdist(X, metric, out, **kwargs)
   2231         if metric_info is not None:
   2232             pdist_fn = metric_info.pdist_func
-> 2233             return pdist_fn(X, out=out, **kwargs)
   2234         elif mstr.startswith("test_"):
   2235             metric_info = _TEST_METRICS.get(mstr, None)

MemoryError: Unable to allocate 1.31 TiB for an array with shape (179999700000,) and data type float64
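
The size in the error message follows directly from the number of input points: `pdist` returns one distance per unordered pair of points, i.e. N(N-1)/2 values. A minimal sketch of that arithmetic, where `pdist_size` is a hypothetical helper written only for this illustration and 8 bytes per value is assumed for float64:

```python
def pdist_size(n_points, itemsize=8):
    """Hypothetical helper: number of pairwise distances for n_points
    and the bytes needed to store them (8 bytes each for float64)."""
    n_pairs = n_points * (n_points - 1) // 2
    return n_pairs, n_pairs * itemsize


n_pairs, n_bytes = pdist_size(600_000)
print(n_pairs)          # 179999700000, the array shape in the error
print(n_bytes / 2**40)  # roughly 1.31 (TiB)
```

So the allocation grows quadratically with the number of data points, regardless of how many neighbours are later used in execute().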

MuellerSeb commented 8 months ago

That is indeed strange, and I wonder why this issue was never raised before. If the parameters are given, the empirical variogram shouldn't be calculated.

Thanks for pointing this out. This is a very old bug (about 9 years old: https://github.com/GeoStat-Framework/PyKrige/blame/v1.3.1/pykrige/core.py#L186).
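
The intended behaviour, skipping the empirical variogram when parameters are supplied, could look roughly like the sketch below. This is not PyKrige's actual code or a proposed patch: `empirical_variogram_inputs` is a hypothetical helper, the parameter values are made up, and only the two `pdist` calls are taken from the traceback above. A real fix would presumably also have to handle any later step that expects the empirical variogram to exist.

```python
def empirical_variogram_inputs(X, y, variogram_model_parameters=None):
    """Hypothetical helper (not PyKrige's code): return pairwise distances and
    semivariances only when the variogram still has to be fitted."""
    if variogram_model_parameters is not None:
        # Parameters were supplied by the caller, so there is nothing to fit
        # and the O(N^2) pairwise arrays are never allocated.
        return None, None

    # Imported lazily so the early-return path needs neither NumPy nor SciPy.
    import numpy as np
    from scipy.spatial.distance import pdist

    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = pdist(X, metric="euclidean")                   # pairwise distances
    g = 0.5 * pdist(y[:, None], metric="sqeuclidean")  # pairwise semivariances
    return d, g


# Assumed example values; with parameters given, no pairwise arrays are built.
params = {"sill": 1.0, "range": 10.0, "nugget": 0.1}
d, g = empirical_variogram_inputs([[0.0, 0.0], [1.0, 0.0]], [1.0, 2.0], params)
print(d, g)  # None None
```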