GeoStat-Framework / PyKrige

Kriging Toolkit for Python
https://pykrige.readthedocs.io
BSD 3-Clause "New" or "Revised" License
740 stars 186 forks source link

Attempt at using pyKrige.ok3d with xarray.apply_ufunc to reduce memory issues #284

Closed MDTocean closed 5 months ago

MDTocean commented 6 months ago

Hiya

I am attempting to use OrdinaryKriging3D with a large set of temperature measurements. These are scattered measurements taken at (longitude,latitude,time) positions, resulting in four very long 1D vector strings. I would then like to interpolate this onto a regular 3D grid of (lon_reg,lat_reg,time_reg) using 3D kriging. This appears to require huge amounts of memory, requiring an array of size len(lon_reg)len(lat_reg)len(time_reg)*len(temperature).

A number of other posts have reported on memory issues. To try and get around the large memory requirements, I have been attempting to treat the data arrays using lazy operations and calling the pyKrige commands from within xarray's apply_ufunc functionality. However, I am still running into memory issues.

I have pasted some example code below using randomised data. Can anyone comment on whether I am making a mistake, either in the data creation or in the apply_ufunc call, or whether the problem is more fundamental to the kriging operation itself -- for example, are kriging methods simply incompatible with the concept of dask chunks and lazy operations? The code works on small datasets (e.g. if sample_length is set to 300), but I continue to run out of memory for longer sample lengths (e.g. of about 2000).

Note that I have also tried running the code below on a dask cluster connected to multiple CPUs, but it didn't solve the memory problems.

Thank you for any advice.


import numpy as np
import xarray as xr
from pykrige.ok3d import OrdinaryKriging3D

# create random datasets
sample_length=2000
da_time=xr.DataArray(data=np.arange(0,sample_length),coords=dict(time=np.arange(0,sample_length))).chunk(chunks={"time" : 100})
da_lat=xr.DataArray(data=np.random.uniform(-12, 3, size=sample_length),coords=dict(time=da_time)).chunk(chunks={"time" : 100})
da_lon=xr.DataArray(data=np.random.uniform(45, 62, size=sample_length),coords=dict(time=da_time)).chunk(chunks={"time" : 100})
da_temp=xr.DataArray(data=np.random.rand(sample_length)+18,coords=dict(time=da_time)).chunk(chunks={"time" : 100})

# Define the function to apply to the dataset
def kriging_3d(da_lon,da_lat,da_time,da_temp):

    xi = np.linspace(-12, 3, 91)
    yi = np.linspace(45, 62, 103)
    zi = np.arange(np.min(da_time),np.max(da_time))

    # Create the 3D Kriging object
    OK3D = OrdinaryKriging3D(da_lon, da_lat, da_time, da_temp, variogram_model='linear')
    # Execute on grid
    out, ss = OK3D.execute('grid', xi, yi, zi)

    # convert the output into an xarray object
    out = xr.DataArray(out, coords=[("zi", zi), ("yi", yi), ("xi", xi)])

    return out

out = xr.apply_ufunc(
    kriging_3d,
    da_lon,da_lat,da_time,da_temp,
    input_core_dims=[["time"],["time"],["time"],["time"]],
    output_core_dims=[["zi", "yi", "xi"]],
    dask = 'allowed', 
    vectorize = True,
    )