GeoStat-Framework / PyKrige

Kriging Toolkit for Python
https://pykrige.readthedocs.io
BSD 3-Clause "New" or "Revised" License
766 stars 189 forks source link

Is it possible to use GridSearchCV with the class OrdinaryKriging? #128

Closed Irene-GM closed 4 years ago

Irene-GM commented 5 years ago

Hi,

I am interested in using pykrige.ok.OrdinaryKriging and pykrige.uk.UniversalKriging in many different set ups with respect to the accepted parameters. Thus, it makes sense to use GridSearchCV to define a big dictionary with all the parameters and let the library do the rest. However, it seems that GridSearchCV is coupled to the RegressionKriging package and not accepting parameters from OK/UK. Thus, if I use the code below specifying some of the OK parameters (e.g. variogram_parameters, anisotropy_scaling), the following error is raised:

ValueError: Invalid parameter anisotropy_angle for estimator Krige(method='ordinary', n_closest_points=10, nlags=6, variogram_model='linear',
      verbose=False, weight=False). Check the list of available parameters with `estimator.get_params().keys()`.

Find below the code used. I wonder whether there is a way of using GridSearchCV also with the OK/UK parameters that I might have overlooked, or else the solution is to generate in a loop the models with the parameters required for this experiment.

Thanks for the great work! :-)


import numpy as np
from pykrige.rk import Krige
from pykrige.compat import GridSearchCV

methods = ["ordinary"]
vario_models = ["linear", "exponential", "gaussian", "spherical", "power"]
weights = [True, False]
ani_scalings = [1]
ani_angles = [0]
verb = [True]
enable_plot = [False]
enable_stat = [False]
coordi_type = ["euclidean"]

param_dict = {  "method":methods,
                "variogram_model":vario_models,
                "weight":weights,
                "anisotropy_scaling":ani_scalings,
                "anisotropy_angle":ani_angles,
                "verbose":verb,
                "enable_plotting":enable_plot,
                "enable_statistics":enable_stat,
                "coordinates_type":coordi_type
                }

estimator = GridSearchCV(Krige(), param_dict, verbose=True)

print("List of params: ")
print(estimator.get_params().keys())

# dummy data
X3 = np.random.randint(0, 400, size=(100, 2)).astype(float)
y = 5 * np.random.rand(100)

# run the gridsearch
estimator.fit(X=X3, y=y)

print(estimator.cv_results_)
mariosgeo commented 5 years ago

Following cause I have the same issue.

jtilson commented 4 years ago

I am stuck by this as well.

jtilson commented 4 years ago

For my own information, do issues like this get resolved by some volunteer taking on the task? This turns out to be highly important, for those doing science and using this code.

MuellerSeb commented 4 years ago

Hey there. I'll have a look at it. Since we are going to restructure PyKrige with version 2, we can add it to the list.

MuellerSeb commented 4 years ago

For the moment you could rewrite the Krige Estimator used by GridSearchCV, to support additional keywords. I did this with the following hack in your script:

import numpy as np
from pykrige.compat import GridSearchCV
from pykrige.ok import OrdinaryKriging
from pykrige.uk import UniversalKriging
from pykrige.ok3d import OrdinaryKriging3D
from pykrige.uk3d import UniversalKriging3D
from sklearn.base import RegressorMixin, BaseEstimator

krige_methods = {
    "ordinary": OrdinaryKriging,
    "universal": UniversalKriging,
    "ordinary3d": OrdinaryKriging3D,
    "universal3d": UniversalKriging3D,
}
threed_krige = ("ordinary3d", "universal3d")
# valid additional keywords for each method
krige_methods_kws = {
    "ordinary": [
        "anisotropy_scaling",
        "anisotropy_angle",
        "enable_statistics",
        "coordinates_type",
    ],
    "universal": [
        "anisotropy_scaling",
        "anisotropy_angle",
        "drift_terms",
        "point_drift",
        "external_drift",
        "external_drift_x",
        "external_drift_y",
        "functional_drift",
    ],
    "ordinary3d": [
        "anisotropy_scaling_y",
        "anisotropy_scaling_z",
        "anisotropy_angle_x",
        "anisotropy_angle_y",
        "anisotropy_angle_z",
    ],
    "universal3d": [
        "anisotropy_scaling_y",
        "anisotropy_scaling_z",
        "anisotropy_angle_x",
        "anisotropy_angle_y",
        "anisotropy_angle_z",
        "drift_terms",
        "functional_drift",
    ],
}

def validate_method(method):
    if method not in krige_methods.keys():
        raise ValueError(
            "Kriging method must be one of {}".format(krige_methods.keys())
        )

class Krige(RegressorMixin, BaseEstimator):
    def __init__(
        self,
        method="ordinary",
        variogram_model="linear",
        variogram_parameters=None,
        variogram_function=None,
        nlags=6,
        weight=False,
        n_closest_points=10,
        verbose=False,
        anisotropy_scaling=1.0,
        anisotropy_angle=0.0,
        enable_statistics=False,
        coordinates_type="euclidean",
        anisotropy_scaling_y=1.0,
        anisotropy_scaling_z=1.0,
        anisotropy_angle_x=0.0,
        anisotropy_angle_y=0.0,
        anisotropy_angle_z=0.0,
        drift_terms=None,
        point_drift=None,
        external_drift=None,
        external_drift_x=None,
        external_drift_y=None,
        functional_drift=None,
    ):
        validate_method(method)
        self.variogram_model = variogram_model
        self.variogram_parameters = variogram_parameters
        self.variogram_function = variogram_function
        self.nlags = nlags
        self.weight = weight
        self.verbose = verbose
        self.anisotropy_scaling = anisotropy_scaling
        self.anisotropy_angle = anisotropy_angle
        self.enable_statistics = enable_statistics
        self.coordinates_type = coordinates_type
        self.anisotropy_scaling_y = anisotropy_scaling_y
        self.anisotropy_scaling_z = anisotropy_scaling_z
        self.anisotropy_angle_x = anisotropy_angle_x
        self.anisotropy_angle_y = anisotropy_angle_y
        self.anisotropy_angle_z = anisotropy_angle_z
        self.drift_terms = drift_terms
        self.point_drift = point_drift
        self.external_drift = external_drift
        self.external_drift_x = external_drift_x
        self.external_drift_y = external_drift_y
        self.functional_drift = functional_drift
        self.model = None  # not trained
        self.n_closest_points = n_closest_points
        self.method = method
        self.val_kw = "val" if self.method in threed_krige else "z"

    def fit(self, x, y, *args, **kwargs):
        setup = dict(
            variogram_model=self.variogram_model,
            variogram_parameters=self.variogram_parameters,
            variogram_function=self.variogram_function,
            nlags=self.nlags,
            weight=self.weight,
            verbose=self.verbose,
        )
        add_setup = dict(
            anisotropy_scaling=self.anisotropy_scaling,
            anisotropy_angle=self.anisotropy_angle,
            enable_statistics=self.enable_statistics,
            coordinates_type=self.coordinates_type,
            anisotropy_scaling_y=self.anisotropy_scaling_y,
            anisotropy_scaling_z=self.anisotropy_scaling_z,
            anisotropy_angle_x=self.anisotropy_angle_x,
            anisotropy_angle_y=self.anisotropy_angle_y,
            anisotropy_angle_z=self.anisotropy_angle_z,
            drift_terms=self.drift_terms,
            point_drift=self.point_drift,
            external_drift=self.external_drift,
            external_drift_x=self.external_drift_x,
            external_drift_y=self.external_drift_y,
            functional_drift=self.functional_drift,
        )
        for kw in krige_methods_kws[self.method]:
            setup[kw] = add_setup[kw]
        input_kw = self._dimensionality_check(x)
        input_kw.update(setup)
        input_kw[self.val_kw] = y
        self.model = krige_methods[self.method](**input_kw)

    def _dimensionality_check(self, x, ext=""):
        if self.method in ("ordinary", "universal"):
            if x.shape[1] != 2:
                raise ValueError("2d krige can use only 2d points")
            else:
                return {"x" + ext: x[:, 0], "y" + ext: x[:, 1]}
        if self.method in ("ordinary3d", "universal3d"):
            if x.shape[1] != 3:
                raise ValueError("3d krige can use only 3d points")
            else:
                return {
                    "x" + ext: x[:, 0],
                    "y" + ext: x[:, 1],
                    "z" + ext: x[:, 2],
                }

    def predict(self, x, *args, **kwargs):
        if not self.model:
            raise Exception("Not trained. Train first")
        points = self._dimensionality_check(x, ext="points")
        return self.execute(points, *args, **kwargs)[0]

    def execute(self, points, *args, **kwargs):
        points.update(dict(style="points", backend="loop"))
        if isinstance(self.model, (OrdinaryKriging, OrdinaryKriging3D)):
            points.update(dict(n_closest_points=self.n_closest_points))
        else:
            print("n_closest_points will be ignored for UniversalKriging")
        prediction, variance = self.model.execute(**points)
        return prediction, variance

param_dict = {
    "method": ["ordinary"],
    "anisotropy_scaling": [1],
    "anisotropy_angle": [0],
    "enable_statistics": [False],
    "coordinates_type": ["euclidean"],
    "variogram_model": ["linear", "exponential", "gaussian", "spherical"],
    "weight": [True, False],
    "verbose": [True],
}

estimator = GridSearchCV(Krige(), param_dict, verbose=True)
# dummy data
X3 = np.random.randint(0, 400, size=(100, 2)).astype(float)
y = 5 * np.random.rand(100)
# run the gridsearch
estimator.fit(X=X3, y=y)
print(estimator.cv_results_)

Does this help? @Irene-GM @mariosgeo @jtilson @rth: is this hack worth implementing in a 1.5.1 version? I think it's really just a hack.

mariosgeo commented 4 years ago

@MuellerSeb yes it helps. Thanks for updating the script, I think it's worthwhile adding it to version 1.5.1 as well.

jtilson commented 4 years ago

Thank you for looking into this.

jtilson commented 4 years ago

HI Sabastian, Thank you (all) for your hard work. Question, the below script doesn't account for an important aspect of mine. That is optimizing the vparams (specifically nugget, sill and range). Would you expect to be able to simply add those keys to the script or will they still require special treatment?

Regards, jeff


Jeffrey Tilson

Senior Research Scientist

Renaissance Computing Institute

UNC Chapel Hill

(919) 445-9657


From: mariosgeo notifications@github.com Sent: Monday, April 6, 2020 3:31 AM To: GeoStat-Framework/PyKrige PyKrige@noreply.github.com Cc: Tilson, Jeffrey L jtilson@renci.org; Mention mention@noreply.github.com Subject: Re: [GeoStat-Framework/PyKrige] Is it possible to use GridSearchCV with the class OrdinaryKriging? (#128)

@MuellerSebhttps://github.com/MuellerSeb yes it helps. Thanks for updating the script, I think it's worthwhile adding it to version 1.5.1 as well.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/GeoStat-Framework/PyKrige/issues/128#issuecomment-609617638, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB6PWAGIFKVD3MOYQB2L25LRLGAOZANCNFSM4ID7CTPA.

MuellerSeb commented 4 years ago

HI Sabastian, Thank you (all) for your hard work. Question, the below script doesn't account for an important aspect of mine. That is optimizing the vparams (specifically nugget, sill and range). Would you expect to be able to simply add those keys to the script or will they still require special treatment? Regards, jeff

The variogram_parameters can be accessed in the Class given above, if I am not getting it wrong. They are None by default. Only problem is, you have to stick to one model, or at least a set of models, that have the same list of model parameters (linear has no length-scale for example).

MuellerSeb commented 4 years ago

@jtilson : I was a bit wrong. You can also pass a dict containing the variogram parameters by name and this can include a "range" keyword which is ignored in the case of a "linear" model. So you don't have to care about the models and you can provide the parameters as a dictionary (and these could come from a list to use the GridSearch).

jtilson commented 4 years ago

Thank you for the information Sebastian, jeff


Jeffrey Tilson

Senior Research Scientist

Renaissance Computing Institute

UNC Chapel Hill

(919) 445-9657


From: Sebastian Müller notifications@github.com Sent: Monday, August 3, 2020 10:17 AM To: GeoStat-Framework/PyKrige PyKrige@noreply.github.com Cc: Tilson, Jeffrey L jtilson@renci.org; Mention mention@noreply.github.com Subject: Re: [GeoStat-Framework/PyKrige] Is it possible to use GridSearchCV with the class OrdinaryKriging? (#128)

@jtilsonhttps://github.com/jtilson : I was a bit wrong. You can also pass a dict containing the variogram parameters by name and this can include a "range" keyword which is ignored in the case of a "linear" model. So you don't have to care about the models and you can provide the parameters as a dictionary (and these could come from a list to use the GridSearch).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/GeoStat-Framework/PyKrige/issues/128#issuecomment-668047910, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB6PWAAP5QYNWGOL3VPYRTTR63BG3ANCNFSM4ID7CTPA.