cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

The implementation of SM kernel for multidimensional input does not seem to be correct #1198

Open Xiaodonghai opened 4 years ago

Xiaodonghai commented 4 years ago

🐛 Bug

    exp_term = (x1_exp_ - x2_exp_).pow_(2).mul_(-2 * math.pi ** 2)
    cos_term = (x1_cos_ - x2_cos_).mul_(2 * math.pi)
    res = exp_term.exp_() * cos_term.cos_()

For multi-dimensional inputs, according to the reference [1], the cosine term should first be summed over the input dimensions and only then passed through the cosine, rather than taking the cosine per dimension and multiplying the results.

[1] Andrew Gordon Wilson. "Correction to Spectral Mixture (SM) Kernel Derivation for Multidimensional Inputs." May 15, 2015. Available at http://www.cs.cmu.edu/~andrewgw/typo.pdf
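For illustration, here is a minimal standalone sketch of my own (not the gpytorch code) contrasting the two orderings for a single mixture component; the shapes and variable names are assumptions:

```python
import math
import torch

# Assumed shapes: x1 is (n, d), x2 is (m, d), mu holds one mixture's mean frequencies, shape (d,)
x1 = torch.randn(5, 3)
x2 = torch.randn(4, 3)
mu = torch.rand(3)

tau = x1.unsqueeze(-2) - x2.unsqueeze(-3)                 # (n, m, d) pairwise differences

# Per-dimension formulation (cosine per dimension, then product over dimensions),
# matching the structure of the current code:
cos_product = torch.cos(2 * math.pi * tau * mu).prod(-1)  # (n, m)

# Formulation from the correction note: sum tau^T mu over dimensions, then one cosine:
cos_of_sum = torch.cos(2 * math.pi * (tau * mu).sum(-1))  # (n, m)

# The two generally differ, which is the discrepancy raised in this issue.
print(torch.allclose(cos_product, cos_of_sum))            # typically False
```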

Thanks for your kind attention; I look forward to your prompt reply.

KeAWang commented 4 years ago

The current code does seem to follow the original paper and not the correction:

        # Compute the exponential and cosine terms
        exp_term = (x1_exp_ - x2_exp_).pow_(2).mul_(-2 * math.pi ** 2)
        cos_term = (x1_cos_ - x2_cos_).mul_(2 * math.pi)
        res = exp_term.exp_() * cos_term.cos_()

        # Product over dimensions
        if last_dim_is_batch:
            res = res.squeeze(-1)
        else:
            res = res.prod(-1)

        # Sum over mixtures
        mixture_weights = self.mixture_weights

@jacobrgardner @gpleiss

Xiaodonghai commented 4 years ago

> The current code does seem to follow the original paper and not the correction.

Thank you for your reply.

In addition, I think the forward function may have a problem, though I am not sure. First, `self.mixture_scales` is set to `varz`, which holds the covariances of the GMM:


```python
from sklearn.mixture import GaussianMixture

GMM = GaussianMixture(n_components=self.num_mixtures, covariance_type="diag").fit(inv_spec)
means = GMM.means_
varz = GMM.covariances_
weights = GMM.weights_

self.mixture_means = means
self.mixture_scales = varz
self.mixture_weights = weights
```
Then `self.mixture_scales = varz` is used in the forward function as follows:
```python
# Compute distances - scaled by appropriate parameters
x1_exp = x1_ * self.mixture_scales
x2_exp = x2_ * self.mixture_scales
x1_cos = x1_ * self.mixture_means
x2_cos = x2_ * self.mixture_means

# Create grids
x1_exp_, x2_exp_ = self._create_input_grid(x1_exp, x2_exp, last_dim_is_batch=last_dim_is_batch, **params)
x1_cos_, x2_cos_ = self._create_input_grid(x1_cos, x2_cos, last_dim_is_batch=last_dim_is_batch, **params)

# Compute the exponential and cosine terms
exp_term = (x1_exp_ - x2_exp_).pow_(2).mul_(-2 * math.pi ** 2)
cos_term = (x1_cos_ - x2_cos_).mul_(2 * math.pi)
res = exp_term.exp_() * cos_term.cos_()
```
I wonder whether

```python
# Compute distances - scaled by appropriate parameters
x1_exp = x1_ * self.mixture_scales
x2_exp = x2_ * self.mixture_scales
```

should be

```python
# Compute distances - scaled by appropriate parameters
x1_exp = x1_ * np.sqrt(self.mixture_scales)
x2_exp = x2_ * np.sqrt(self.mixture_scales)
```

because the differences (x1_exp - x2_exp) are squared afterwards.
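To make the concern concrete, here is a small numerical sketch of my own, under the assumption that `GMM.covariances_` (and hence `self.mixture_scales`) stores variances rather than standard deviations:

```python
import math
import torch

tau = torch.tensor(0.7)   # a pairwise difference x1 - x2 (scalar, for illustration only)
v = torch.tensor(0.25)    # a GMM variance for one mixture component and dimension

# Exponential term of the SM kernel, with v denoting the spectral Gaussian's variance:
# exp(-2 * pi^2 * tau^2 * v)
target = torch.exp(-2 * math.pi ** 2 * tau.pow(2) * v)

# Scaling the inputs by v and then squaring gives tau^2 * v^2 instead of tau^2 * v:
scaled_by_variance = torch.exp(-2 * math.pi ** 2 * (tau * v).pow(2))

# Scaling by sqrt(v) recovers tau^2 * v after the square:
scaled_by_std = torch.exp(-2 * math.pi ** 2 * (tau * v.sqrt()).pow(2))

print(target.item(), scaled_by_variance.item(), scaled_by_std.item())
# target == scaled_by_std, while scaled_by_variance differs
```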

Thanks for your kind attention; I look forward to your prompt reply.
andrewgordonwilson commented 4 years ago

I think in many cases it is more convenient and similarly effective to use a product of 1D spectral mixture kernels for a multi-D spectral mixture (as in the "SMP" kernel formulation of https://www.cs.cmu.edu/~andrewgw/manet.pdf), which I believe is what is implemented here, over using the result of inverse Fourier transforming a multidimensional mixture of Gaussians (which is what is presented in the typo correction note).
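For reference, a rough standalone sketch of a kernel built from per-dimension exponential-cosine factors, with a product over dimensions and a weighted sum over mixtures, mirroring the structure of the code quoted earlier in the thread (my own illustration under assumed shapes and parameterization, not the gpytorch implementation):

```python
import math
import torch

def sm_product_kernel(x1, x2, weights, means, stds):
    """Spectral-mixture-style kernel assembled as a product over input dimensions.

    Assumed shapes (illustrative only):
        x1: (n, d), x2: (m, d)
        weights: (q,)   mixture weights
        means:   (q, d) per-mixture, per-dimension spectral means
        stds:    (q, d) per-mixture, per-dimension spectral standard deviations
    """
    tau = x1.unsqueeze(1) - x2.unsqueeze(0)          # (n, m, d) pairwise differences
    tau = tau.unsqueeze(0)                           # (1, n, m, d) broadcast over mixtures
    mu = means.unsqueeze(1).unsqueeze(1)             # (q, 1, 1, d)
    sigma = stds.unsqueeze(1).unsqueeze(1)           # (q, 1, 1, d)

    exp_term = torch.exp(-2 * math.pi ** 2 * (tau * sigma).pow(2))
    cos_term = torch.cos(2 * math.pi * tau * mu)
    per_dim = exp_term * cos_term                    # (q, n, m, d)

    # Product over dimensions, then weighted sum over mixtures
    return (weights.view(-1, 1, 1) * per_dim.prod(-1)).sum(0)   # (n, m)


# Example usage with random parameters
x1, x2 = torch.randn(5, 3), torch.randn(4, 3)
w = torch.rand(2)
w = w / w.sum()
K = sm_product_kernel(x1, x2, w, torch.rand(2, 3), torch.rand(2, 3))
print(K.shape)   # torch.Size([5, 4])
```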

Xiaodonghai commented 4 years ago

> I think in many cases it is more convenient and similarly effective to use a product of 1D spectral mixture kernels for a multi-D spectral mixture (as in the "SMP" kernel formulation of https://www.cs.cmu.edu/~andrewgw/manet.pdf), which I believe is what is implemented here, over using the result of inverse Fourier transforming a multidimensional mixture of Gaussians (which is what is presented in the typo correction note).

I sincerely thank you for your reply; it has been very helpful. Another question is whether

    x1_exp = x1_ * self.mixture_scales
    x2_exp = x2_ * self.mixture_scales

should be

    x1_exp = x1_ * np.sqrt(self.mixture_scales)
    x2_exp = x2_ * np.sqrt(self.mixture_scales)

in the `forward()` function, because `self.mixture_scales = varz = GMM.covariances_` and `pow_(2)` is applied as follows:

    exp_term = (x1_exp_ - x2_exp_).pow_(2).mul_(-2 * math.pi ** 2)

Looking forward to your reply. Many thanks!