ICL-SML / Doubly-Stochastic-DGP

Deep Gaussian Processes with Doubly Stochastic Variational Inference
Apache License 2.0

Mean computed on training set, stdev computed on test set #23

Open maximilianmordig opened 6 years ago

maximilianmordig commented 6 years ago

Is there a reason you compute the mean on the training set and the standard deviation on the test set? (lines 75 and 76)

https://github.com/ICL-SML/Doubly-Stochastic-DGP/blob/bac0d6617c706117f02ac6eaac056b5b63974ce0/demos/datasets.py#L74-L80
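For concreteness, here is a minimal sketch of the pattern the question is about, with made-up arrays; the variable names are illustrative, not the repo's actual code:

```python
import numpy as np

# Hypothetical train/test split, for illustration only.
X_train = np.random.randn(100, 3)
X_test = np.random.randn(50, 3)

# The pattern being asked about: mean taken from the training set,
# standard deviation taken from the test set.
X_mean = np.mean(X_train, axis=0)
X_std = np.std(X_test, axis=0)

# The usual convention would take both statistics from the training set:
# X_std = np.std(X_train, axis=0)
X_train_norm = (X_train - X_mean) / X_std
X_test_norm = (X_test - X_mean) / X_std
```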

hughsalimbeni commented 6 years ago

Yes, that was for consistency with other papers (in particular https://arxiv.org/abs/1502.05336), though more recently I've been normalizing before splitting instead (see https://github.com/hughsalimbeni/bayesian_benchmarks). The idea of normalizing after splitting is that it doesn't make the task any easier, whereas pre-split normalization reduces the degrees of freedom in the data and so makes the task (very slightly) easier.
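A minimal sketch of the two conventions being contrasted, on made-up data (the array names and split are illustrative, not taken from bayesian_benchmarks):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))

# Post-split normalization: statistics come from the training split only,
# so nothing about the test points leaks into the preprocessing.
X_train, X_test = X[:100], X[100:]
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_post = (X_train - mean) / std
X_test_post = (X_test - mean) / std

# Pre-split normalization: statistics come from the full dataset before
# splitting, which uses a little information about the test points and
# so makes the task (very slightly) easier.
mean_all, std_all = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mean_all) / std_all
X_train_pre, X_test_pre = X_norm[:100], X_norm[100:]
```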