UCL-SML / Doubly-Stochastic-DGP

Deep Gaussian Processes with Doubly Stochastic Variational Inference
Apache License 2.0

time comparison over number of GPs in a DGP much larger than expected #29

Open RomanFoell opened 5 years ago

RomanFoell commented 5 years ago

Hello,

I ran your demo_regression_UCI code successfully on several datasets, but when I train models DGP1, DGP2, and DGP3, the training times for DGP2 and DGP3 are not 2 or 3 times that of DGP1 as expected, but much larger. Here are the times:

DGP1: 8.3926920890808
DGP2: 407.8747105598449
DGP3: 809.6086103916168

Is there something I am doing wrong? I also noticed much higher CPU usage compared to DGP1. Thanks for your answer.

hughsalimbeni commented 5 years ago

This is because the inner layers each have more than one GP. In the paper, the timing comparison over increasing depth was (perhaps misleadingly) done with 1D inputs, so the cost scales simply as L, the number of layers. If the inner layers all have D outputs, then the scaling with layers is 1 + D(L-1).
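To make that count concrete, here is a minimal sketch of the GP-count argument. The value D = 48 is purely hypothetical, chosen only because it roughly matches the ratios of the timings reported above:

```python
def num_gps(L, D):
    """GPs in an L-layer DGP whose inner layers each have D outputs
    and whose final layer has a single output (scalar regression)."""
    return 1 + D * (L - 1)

# With the hypothetical D = 48 this gives 1, 49 and 97 GPs for L = 1, 2, 3,
# i.e. cost ratios of roughly 49x and 97x over DGP1 -- close to the ~49x
# and ~96x ratios in the timings reported above.
for L in (1, 2, 3):
    print(L, num_gps(L, 48))
```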

hughsalimbeni commented 5 years ago

I should add, there's a nice way around this problem coming soon. Watch this space!

RomanFoell commented 5 years ago

Thanks for your answer! So is it right that, if I want a DGP2 or DGP3 with 1-dimensional inner layers, I have to change it to:

from gpflow.kernels import RBF

for L in range(1, 4):
    D = X.shape[1]
    # the layer widths are defined by the kernel input dims, so here the first
    # layer takes the D-dimensional input and all further layers are 1-dimensional
    kernels = []
    for l in range(L):
        if l == 0:
            kernels.append(RBF(D))
        else:
            kernels.append(RBF(1))

Thanks.

hughsalimbeni commented 5 years ago

Yes, that model should scale more nicely (though note that it might have quite different properties from the model that keeps the input dimension at the inner layers, as the identity mean function won't be used). This model looks more like the Bayesian warped GP.
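For reference, a minimal end-to-end sketch of that construction, modelled on demo_regression_UCI. The placeholder data, the Gaussian likelihood, the 100 inducing inputs, and the exact DGP constructor arguments are all assumptions here and may differ between versions of the repo:

```python
import numpy as np
from gpflow.kernels import RBF
from gpflow.likelihoods import Gaussian
from doubly_stochastic_dgp.dgp import DGP

# Placeholder data standing in for a UCI regression dataset
X = np.random.randn(500, 8)
Y = np.sum(np.sin(X), axis=1, keepdims=True)

# 100 randomly chosen inducing inputs (an assumption; the demo may differ)
Z = X[np.random.permutation(X.shape[0])[:100]]

models = []
for L in range(1, 4):
    D = X.shape[1]
    # D-dimensional first layer, then 1-dimensional layers as discussed above,
    # so the number of GPs grows as L rather than 1 + D(L-1)
    kernels = [RBF(D)] + [RBF(1) for _ in range(L - 1)]
    models.append(DGP(X, Y, Z, kernels, Gaussian()))
```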