SMTorg / smt

Surrogate Modeling Toolbox
https://smt.readthedocs.io/en/stable
BSD 3-Clause "New" or "Revised" License

What is the mathematical principle behind the gradient function (`def _predict_derivatives(self, x, kx)` in krg_based.py)? #312

Closed ZhihaoLiu-git closed 3 years ago

ZhihaoLiu-git commented 3 years ago

What's the mathematical principle of `_predict_derivatives(~)`? Could you please give me a link for the function? Thanks!

ZhihaoLiu-git commented 3 years ago
```python
    def _predict_derivatives(self, x, kx):
        """
        Evaluates the derivatives at a set of points.

        Parameters
        ---------
        x : np.ndarray [n_evals, dim]
            Evaluation point input variable values
        kx : int
            The 0-based index of the input variable with respect to which derivatives are desired.

        Returns
        -------
        y : np.ndarray
            Derivative values.
        """
        # Initialization
        n_eval, n_features_x = x.shape
        if self.options["corr"] == "gower":
            r = np.exp(
                -gower_matrix(
                    x, data_y=self.X_train, weight=np.asarray(self.optimal_theta)
                )
            )
        else:
            x = (x - self.X_offset) / self.X_scale
            # Get pairwise componentwise L1-distances to the input training set
            dx = differences(x, Y=self.X_norma.copy())
            d = self._componentwise_distance(dx)
            # Compute the correlation function
            r = self._correlation_types[self.options["corr"]](
                self.optimal_theta, d
            ).reshape(n_eval, self.nt)

        if self.options["corr"] != "squar_exp":
            raise ValueError(
                "The derivative is only available for squared exponential kernel"
            )
        if self.options["poly"] == "constant":
            df = np.zeros((1, self.nx))
        elif self.options["poly"] == "linear":
            df = np.zeros((self.nx + 1, self.nx))
            df[1:, :] = np.eye(self.nx)
        else:
            raise ValueError(
                "The derivative is only available for ordinary kriging or "
                + "universal kriging using a linear trend"
            )

        # Beta and gamma = R^-1(y-FBeta)
        beta = self.optimal_par["beta"]
        gamma = self.optimal_par["gamma"]
        df_dx = np.dot(df.T, beta)
        d_dx = x[:, kx].reshape((n_eval, 1)) - self.X_norma[:, kx].reshape((1, self.nt))
        if self.name != "Kriging" and "KPLSK" not in self.name:
            theta = np.sum(self.optimal_theta * self.coeff_pls ** 2, axis=1)
        else:
            theta = self.optimal_theta
        y = (
            (df_dx[kx] - 2 * theta[kx] * np.dot(d_dx * r, gamma))
            * self.y_std
            / self.X_scale[kx]
        )
        return y
```

Paul-Saves commented 3 years ago
```python
def _predict_derivatives(self, x: np.ndarray, kx: int) -> np.ndarray:
    """Implemented by surrogate models to predict the dy_dx derivatives (optional)."""
```

The formula for the square exponential kernel is here: https://smt.readthedocs.io/en/latest/_src_docs/surrogate_models/krg.html?highlight=kriging

The formulas come from kriging (see Sacks, J., Schiller, S. B., and Welch, W. J., "Designs for computer experiments," Technometrics 31(1) (1989) 41–47). When expressing the mean prediction, we get a formula that depends on the covariance kernel. The formula is basically the sum of a function (the prior) and a term that accounts for the variability of the data, something like $\hat{y} = \hat{\mu} + r^\top R^{-1}(y - \mathbf{1}\hat{\mu})$.

To obtain the derivative, we just differentiate this: d(f)/dx for the trend f, with beta and gamma = R^-1(y - F*beta) held fixed (they depend only on the training data), and the kernel differentiated with the chain rule (the derivative of x^2 is 2x dx, etc.). And since the data are scaled, the scaling has to be taken into account.

The mathematical principle is the derivative of kriging. For the quadratic polynomial (QP) model, for example, the derivative formula will be different because it comes from the QP interpolation model.
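
For reference, here is that differentiation written out for the squared exponential kernel; this is a reconstruction matching the code above, not a quote from the SMT docs:

```latex
% Kriging mean prediction (beta and gamma do not depend on the new point x)
\hat{y}(x) = f(x)^\top \hat{\beta} + r(x)^\top \gamma,
\qquad \gamma = R^{-1}\left(y - F\hat{\beta}\right)

% Differentiate with respect to the k-th input
\frac{\partial \hat{y}}{\partial x_k}
  = \frac{\partial f(x)^\top}{\partial x_k}\,\hat{\beta}
  + \frac{\partial r(x)^\top}{\partial x_k}\,\gamma

% Squared exponential kernel r_i(x) = \exp\big(-\sum_l \theta_l (x_l - x_{i,l})^2\big), so
\frac{\partial r_i}{\partial x_k} = -2\,\theta_k\,(x_k - x_{i,k})\,r_i(x)
```

This is the `df_dx[kx] - 2 * theta[kx] * np.dot(d_dx * r, gamma)` line in the code, with the trailing `* self.y_std / self.X_scale[kx]` factor undoing the input/output normalization.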

ZhihaoLiu-git commented 3 years ago

Thanks for your prompt reply! In short, your reply means the derivatives come from a definite formula. In the SMT documentation, there are three kinds of derivatives; the derivative of kriging is d(y_train)/d(x_train), which is used in GE-KPLS. What about the other two? I mean, which surrogate models use dy/dyt and dy/dx?

Paul-Saves commented 3 years ago

Several models support spatial derivatives (for the mean prediction and/or the variances). For example, in https://smt.readthedocs.io/en/latest/_src_docs/surrogate_models/kpls.html, there is `dydx = sm.predict_derivatives(xt, 0)`.
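
A minimal usage sketch, assuming the standard KRG API from the SMT documentation (the toy function and settings are illustrative):

```python
import numpy as np
from smt.surrogate_models import KRG

# Toy 1-D training data
xt = np.linspace(0.0, 4.0, 10).reshape(-1, 1)
yt = np.sin(xt)

sm = KRG(theta0=[1e-2], print_global=False)  # "squar_exp" correlation is the default
sm.set_training_values(xt, yt)
sm.train()

x = np.linspace(0.0, 4.0, 50).reshape(-1, 1)
dydx = sm.predict_derivatives(x, 0)  # derivative of the mean prediction w.r.t. x_0
# sanity check: dydx should be close to np.cos(x)
```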

It depends on the application you need a surrogate for; one may possibly need the derivative of the model, but it's not necessarily used to build it.

For example, for QP:

```python
        supports["derivatives"] = True
```

By default, the surrogate_models options are:

```python
        supports["training_derivatives"] = False
        supports["derivatives"] = False
        supports["output_derivatives"] = False
        supports["variance_derivatives"] = False
```

For the function you were talking about: it predicts the dy_dx derivatives at a set of points. You can check https://github.com/SMTorg/smt/blob/master/smt/surrogate_models/surrogate_model.py
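
To see which capabilities a given model declares, you can inspect its `supports` dictionary at runtime (a small sketch; the flag names are the ones listed above):

```python
from smt.surrogate_models import KRG, LS, QP

# Each SMT surrogate model carries a `supports` dict of capability flags
for cls in (KRG, LS, QP):
    sm = cls(print_global=False)
    print(cls.__name__, "derivatives:", sm.supports["derivatives"])
```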

ZhihaoLiu-git commented 3 years ago

Because I didn't study mechanics:

  1. In GE-KPLS, and in any other gradient-enhanced (GE) surrogate model, the derivatives come from a definite objective function. If we already know the objective function, why not optimize it mathematically? All the surrogate problems I have seen are black-box functions whose objective function we don't know.
  2. If we don't know the objective function's mathematical expression, we can't obtain gradient information. So all the GE-surrogate models are unusable. Is that right?
  3. For example, in smt-master/smt/examples/b777_engine/b777_engine_derivs.dat, how do we get the derivatives data?

thanks for your reply!!

relf commented 3 years ago

The GE-KPLS method uses derivatives to build a better surrogate, so you definitely need to have derivatives in the first place to use GE-KPLS.

You can get derivatives through numerical methods, without an exact analytical expression. So if you have a computation-intensive code which gives you values and derivatives at the training locations, you may be able to replace it with a GE-KPLS surrogate model, which will be cheaper to evaluate.
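
For instance, the training derivatives can come from finite differences instead of an analytic expression. A sketch, assuming the `GEKPLS` options (`n_comp`, `theta0`, `xlimits`) and the `set_training_derivatives` API from the SMT docs; the test function stands in for an expensive code:

```python
import numpy as np
from smt.surrogate_models import GEKPLS

def f(x):  # stand-in for an expensive black-box code
    return np.sin(x[:, [0]]) + x[:, [1]] ** 2

ndim, nt = 2, 20
xlimits = np.array([[0.0, 4.0]] * ndim)
xt = np.random.rand(nt, ndim) * 4.0
yt = f(xt)

sm = GEKPLS(n_comp=2, theta0=[1e-2] * 2, xlimits=xlimits, print_global=False)
sm.set_training_values(xt, yt)

# Central finite differences at the training points: no analytic gradient needed
h = 1e-4
for k in range(ndim):
    e = np.zeros((1, ndim))
    e[0, k] = h
    dyt_dxk = (f(xt + e) - f(xt - e)) / (2 * h)
    sm.set_training_derivatives(xt, dyt_dxk, k)

sm.train()
```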

See the GE-KPLS reference in the SMT paper.

ZhihaoLiu-git commented 3 years ago

thank you very much!