GAMES-UChile / mogptk

Multi-Output Gaussian Process Toolkit
MIT License

Question about Normalizing X dependent on #channels, and Related Issue with Mean Functions #28

Closed jimmyrisk closed 2 years ago

jimmyrisk commented 3 years ago

Hi,

Sorry for the flood of requests, but your package is exactly what I need for my current projects, so I'm trying to make sure everything is compatible. I really appreciate the work you've put into this and your prompt responses.

I tried to adapt the mean-function code from your example here to my own problem, but the mean functions are not performing as expected.

I've put together a reproducible example that seems to show the issue: normalization is happening somewhere, and I'm not sure when, where, or why.

Here is the example:

import mogptk
import numpy as np

# generate data
n_points = 100
x = np.linspace(0.0, 6.0, n_points)

f = np.sin(x*4.0*np.pi) + 2*x - 0.2*x**2 + 0.1*np.random.normal(size=len(x))
g = np.sin(1/(x+0.1))*x + 0.2*x - 0.01*(x-3)**3 + 0.2*f + 0.025*np.random.normal(size=len(x))
data = mogptk.DataSet(
    mogptk.Data(x, f, name="f"),
    mogptk.Data(x, g, name="g")
)

# declare model
model = mogptk.MOSM(data, Q=2)

Now, when I look at the X tensor associated with this model, the data is normalized:

model.model.X
tensor([[   0.0000,    0.0000],
        [   0.0000,   10.1010],
        [   0.0000,   20.2020],
        [   0.0000,   30.3030],
        [   0.0000,   40.4040],
        [   0.0000,   50.5051],
...
        [   0.0000,  959.5960],
        [   0.0000,  969.6970],
        [   0.0000,  979.7980],
        [   0.0000,  989.8990],
        [   0.0000, 1000.0000],
        [   1.0000,    0.0000],
        [   1.0000,   10.1010],
        [   1.0000,   20.2020],
        [   1.0000,   30.3030],
        [   1.0000,   40.4040],
        [   1.0000,   50.5051],
        [   1.0000,   60.6061],
        [   1.0000,   70.7071],
        [   1.0000,   80.8081],
        [   1.0000,   90.9091],
        [   1.0000,  101.0101],
        [   1.0000,  111.1111],
        [   1.0000,  121.2121],
        [   1.0000,  131.3131],
        [   1.0000,  141.4141],
...
        [   1.0000,  959.5960],
        [   1.0000,  969.6970],
        [   1.0000,  979.7980],
        [   1.0000,  989.8990],
        [   1.0000, 1000.0000]], device='cuda:0', dtype=torch.float64)

where I believe the first column is the channel, and the second is the x variable, which was initialized with x = np.linspace(0.0, 6.0, n_points).

In contrast, if the data has only one channel, the behavior is quite different:

n_points = 100
x = np.linspace(0.0, 6.0, n_points)
f = np.sin(x*4.0*np.pi) + 2*x - 0.2*x**2 + 0.1*np.random.normal(size=len(x))

data = mogptk.Data(x, f, name="f")

kernel = mogptk.gpr.MaternKernel(input_dims=1)
mo_kernel = mogptk.gpr.IndependentMultiOutputKernel(kernel)
model = mogptk.Model(data, mo_kernel)

Now,

model.model.X
tensor([[0.0000, 0.0000],
        [0.0000, 0.0606],
        [0.0000, 0.1212],
        [0.0000, 0.1818],
        [0.0000, 0.2424],
 ...
        [0.0000, 5.6970],
        [0.0000, 5.7576],
        [0.0000, 5.8182],
        [0.0000, 5.8788],
        [0.0000, 5.9394],
        [0.0000, 6.0000]], device='cuda:0', dtype=torch.float64)

The data isn't normalized.

This seems to pose a problem when declaring a mean function, since, following the example, we are told to define something like:

class Mean(mogptk.gpr.Mean):
    def __init__(self):
        super(Mean, self).__init__()
        self.coefficients = mogptk.gpr.Parameter([0.0, 0.0, 0.0])

    def __call__(self, X):
        coefs = self.coefficients()
        return coefs[0] + coefs[1] * X[:, 1] + coefs[2] * X[:, 1] ** 2

As a result, I get absurd predictions when there are multiple channels, because the mean function and the model don't agree on what is normalized and what isn't. It works perfectly fine with a single channel, which leads me to believe the normalization that occurs with multiple channels is at play.

I suppose I don't actually have a concrete question, but I hope you can see the issue and fix it. Basically, normalization seems to occur under the hood when there are multiple channels but not when there is a single channel, and it causes issues in prediction with multiple channels.

For reference, I'm attaching a picture of the multi-channel prediction.

This also occurs in my own project where I implemented a mean function.

[image: multi-channel prediction]

Lastly, if there are multiple inputs, could you verify that, for a user-defined mean function, we should use X[:, 1], X[:, 2], ..., X[:, p] to refer to the respective columns of our input data? This seems to work (noting that X[:, 0] refers to the channel), but I'd like to make sure.

Thanks so much!

tdewolff commented 3 years ago

Hi Jimmy, no problem at all! We're glad for the feedback! To respond to your questions:

As a result, I get absurd predictions when there are multiple channels, due to the mean function not seeming to agree on what is normalized and what isn't. It works perfectly fine when there is one channel, which leads me to believe that the normalizing that occurs with multiple channels is at play.

The difference here is that MOSM rescales the X input to [0, 1000] in order to better fit the data (see the culprit here: https://github.com/GAMES-UChile/mogptk/blob/master/mogptk/models/mosm.py#L45). Without this rescaling we've seen poor performance due to numerical issues, which are common in GPs. The other model, on the other hand, is set up "manually": instead of using a predefined model from mogptk.models it refers directly to mogptk.gpr. You could do the same for the MOSM case by defining the kernel as follows:

# set up the MOSM kernel manually via mogptk.gpr (no automatic rescaling),
# reusing `data` from your example
spectral = mogptk.gpr.MultiOutputSpectralKernel(output_dims=data.get_output_dims(), input_dims=1)
kernel = mogptk.gpr.MixtureKernel(spectral, Q=1)
model = mogptk.Model(data, kernel)
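
For reference, the rescaling itself is just a linear map of each input dimension onto [0, 1000]; a minimal sketch of the idea (not the library's exact code):

import numpy as np

def rescale_to_0_1000(x):
    # linearly map the inputs onto [0, 1000], like MOSM does internally
    x = np.asarray(x, dtype=float)
    return 1000.0 * (x - x.min()) / (x.max() - x.min())

rescale_to_0_1000(np.linspace(0.0, 6.0, 100))  # 0.0, 10.101..., ..., 1000.0

This reproduces the values you saw in model.model.X.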

Lastly, if there are multiple inputs, could you verify for a user defined mean function, we want to use one of X[:, 1], X[:, 2], ..., X[:, p] to refer to the respective column of our input data? This seems to work (noting that X[:, 0] refers to the channel), but I'd like to make sure.

Yes, correct! The first column is the channel ID and the remaining columns are the input dimensions.
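
For example, with two input dimensions a mean function could look like this (a sketch; the class name and linear form are just for illustration):

class TwoInputMean(mogptk.gpr.Mean):
    # hypothetical example: X[:, 0] is the channel ID,
    # X[:, 1] and X[:, 2] are the two input dimensions
    def __init__(self):
        super(TwoInputMean, self).__init__()
        self.coefficients = mogptk.gpr.Parameter([0.0, 0.0, 0.0])

    def __call__(self, X):
        c = self.coefficients()
        return c[0] + c[1] * X[:, 1] + c[2] * X[:, 2]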

Let me know if you have any other questions ;-)

jimmyrisk commented 3 years ago

Great! This should fix all of my problems.

jimmyrisk commented 3 years ago

Actually, I'm wondering how we would define a mean function if we want to use the built-in MOSM or similar. I would prefer to normalize my inputs to avoid the numerical issues (I have had similar problems...).

In that case, how can I define the mean function?

tdewolff commented 3 years ago

You could use dataset.rescale_x() before defining your models; this applies the same X-axis scaling across all models. Your mean function would then need to work on the range [0, 1000] instead of [0, 6]. The only thing the X-axis rescaling does is apply a linear transformation to the values of X.
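
If you'd rather keep writing the mean in the original units, you can invert that linear map inside the mean function. A sketch, assuming the original inputs span [0, 6] so that rescale_x() maps them onto [0, 1000]:

class RescaledMean(mogptk.gpr.Mean):
    # hypothetical example: evaluate a quadratic mean in the original
    # [0, 6] units while the model sees inputs rescaled to [0, 1000]
    def __init__(self, scale=6.0/1000.0):
        super(RescaledMean, self).__init__()
        self.scale = scale
        self.coefficients = mogptk.gpr.Parameter([0.0, 0.0, 0.0])

    def __call__(self, X):
        x = X[:, 1] * self.scale  # map back to the original units
        c = self.coefficients()
        return c[0] + c[1] * x + c[2] * x ** 2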

jimmyrisk commented 3 years ago

dataset.rescale_x() doesn't appear to fix it (unless I'm mixing up the terms data and dataset...). Here is a working example, with the plots:

import mogptk
import numpy as np
import matplotlib.pyplot as plt

n_points = 100
x = np.linspace(0.0, 6.0, n_points)

f = np.sin(x*4.0*np.pi) + 2*x - 0.2*x**2 + 0.1*np.random.normal(size=len(x))
g = np.sin(1/(x+0.1))*x + 0.2*x - 0.01*(x-3)**3 + 0.2*f + 0.025*np.random.normal(size=len(x))
data = mogptk.DataSet(
    mogptk.Data(x, f, name="f"),
    mogptk.Data(x, g, name="g")
)
data.plot(); plt.show()

data.rescale_x()

data[0].remove_range(start=5.5)

class Mean(mogptk.gpr.Mean):
    def __init__(self):
        super(Mean, self).__init__()
        self.coefficients = mogptk.gpr.Parameter([0.0, 0.0, 0.0])

    def __call__(self, X):
        coefs = self.coefficients()
        return coefs[0] + coefs[1] * X[:, 1] + coefs[2] * X[:, 1] ** 2

mean = Mean()

model = mogptk.MOSM(data, Q=2, mean=mean)

mean.trainable = True
model.train(method='Adam', lr=0.1, iters=100, plot=True, error='MAE');

model.predict()
model.plot_prediction(); plt.show()

The data:

[image: the two channels f and g]

The predictions:

[image: predictions with the quadratic mean]

jimmyrisk commented 3 years ago

Furthermore, the fitted coefficients of the quadratic are

[ 0.00068303  0.00012869 -0.00015598]

And for a test point x=6, the fitted mean function returns

0.000683 + 0.000129*6 - 0.00015598*6**2 = -0.00415828

while for x=1000 it returns

0.000683 + 0.000129*1000 - 0.00015598*1000**2 = -155.85

which is approximately the value you can see in the top plot.

So there seems to be some issue where the mean function is evaluated at x=1000 (the rescaled input), while the plots (and, I imagine, the error metrics) use x=6 from the actual data.
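
A quick numeric check of the two evaluations above:

import numpy as np

# fitted coefficients from above
c = np.array([0.00068303, 0.00012869, -0.00015598])

def mean(x):
    return c[0] + c[1]*x + c[2]*x**2

print(mean(6.0))     # approx -0.00416
print(mean(1000.0))  # approx -155.85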

tdewolff commented 3 years ago

The issue may not be a bug, but difficulty in training. I've been trying to figure out what's going wrong, and I have the following observations:

Without rescale_x (commenting out the line in MOSM), the ideal parameters are approximately [0.0, 1.24, -0.13]. With rescale_x, I would expect the nth-order coefficient to be scaled by (6/1000)^n, so that we obtain [0.0, 1.24*6/1000, -0.13*(6/1000)^2] ≈ [0.0, 6.84e-3, -4.32e-6]. The latter is what I obtain, but I needed to call init_parameters(method='LS') and use 300 iterations. The key is to initialize the coefficients close to the expected result; otherwise it seems pretty hard to find the right parameter set (e.g. the third coefficient should be negative, but training may find it to be positive, a local minimum in which training gets stuck).
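
Concretely, the setup that produced these parameters looks roughly like this (reusing data and mean from your example; training settings otherwise as before):

model = mogptk.MOSM(data, Q=2, mean=mean)
model.init_parameters(method='LS')  # least-squares initialization of the parameters
model.train(method='Adam', lr=0.1, iters=300)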

So far I don't think there is a bug in the code; this is a training/initialization problem, though we could make rescale_x optional for the models. If you think there is a bug, please let us know.

tdewolff commented 3 years ago

You can now use mogptk.MOSM(..., rescale_x=False) to disable automatic X axis rescaling!
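
For example:

# keep X in its original units by disabling the automatic rescaling
model = mogptk.MOSM(data, Q=2, mean=mean, rescale_x=False)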

tdewolff commented 2 years ago

rescale_x is disabled by default now. Closing this issue, please open a new issue if problems persist.