Open RomanFoell opened 5 years ago
This is because the inner layers have more than one GP each. In the paper, the timing comparison for increased layers was (perhaps misleadingly) done with 1-dimensional inputs, so the cost scales as L, where L is the number of layers. If the inner layers each have D outputs, then the scaling with layers is 1 + D(L-1).
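A quick back-of-the-envelope sketch of that scaling (a minimal illustration, not code from this repository; D = 13 is an assumed input dimension):

```python
# Number of GPs in an L-layer DGP whose hidden layers each have
# `hidden_dim` outputs and whose final layer has a single output.
def num_gps(L, hidden_dim):
    return hidden_dim * (L - 1) + 1

D = 13  # assumed input dimension (e.g. a UCI dataset)
for L in (1, 2, 3):
    # D-dimensional hidden layers grow as 1 + D(L-1);
    # 1-dimensional hidden layers grow only as L.
    print(L, num_gps(L, D), num_gps(L, 1))
```

With D = 13 this gives 1, 14, 27 GPs for the D-dimensional hidden layers versus 1, 2, 3 for 1-dimensional ones, which is roughly the kind of blow-up seen in the timings above.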
I should add, there's a nice way around this problem that will be coming soon. Watch this space!
Thanks for your answer! So is it right that when I want a DGP2 or DGP3 with 1-dimensional layer sizes, I have to change it to:
```python
for L in range(1, 4):
    D = X.shape[1]
    # the layer shapes are defined by the kernel dims,
    # so here all hidden layers are 1-dimensional
    kernels = []
    for l in range(L):
        if l == 0:
            kernels.append(RBF(D))
        else:
            kernels.append(RBF(1))
```
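For what it's worth, here is a minimal stand-alone sketch of what that loop produces, using a dummy `RBF` stand-in so it runs without GPflow (D = 13 is an assumed input dimension):

```python
# Dummy stand-in for the kernel class, just to inspect input dims.
class RBF:
    def __init__(self, input_dim):
        self.input_dim = input_dim

D = 13  # assumed input dimension
for L in range(1, 4):
    # first-layer kernel acts on the D-dim inputs; later layers are 1-dim
    kernels = [RBF(D) if l == 0 else RBF(1) for l in range(L)]
    print(L, [k.input_dim for k in kernels])
```

So for L = 3 the kernel dims come out as [D, 1, 1], i.e. only the first layer sees the full input dimension.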
Thanks.
Yes, that model should scale more nicely (though note that it might have quite different properties from the model with the input dimension at the inner layers, as the identity mean function won't be used; this model looks more like the Bayesian warped GP).
Hello,
I tried your demo_regression_UCI code successfully on several datasets, but when I fit DGP1, DGP2, and DGP3 models, the training time for DGP2 and DGP3 compared to DGP1 is not 2 or 3 times larger as expected, but much larger. Here are the times:
DGP1: 8.3926920890808
DGP2: 407.8747105598449
DGP3: 809.6086103916168
Is there something I am doing wrong? I also notice much higher CPU usage compared to DGP1. Thanks for your answer.