EmuKit / emukit

A Python-based toolbox of various methods in decision making, uncertainty quantification and statistical emulation: multi-fidelity, experimental design, Bayesian optimisation, Bayesian quadrature, etc.
https://emukit.github.io/emukit/
Apache License 2.0

multi_fidelity_dgp #224

Closed Hebbalali closed 5 years ago

Hebbalali commented 5 years ago

Hi all!

I had some questions while testing the benchmarking on multi fidelity DGP.

1. In baseline_model_wrappers.py, the HighFidelityGp model, which should model only the highest fidelity, takes X[1] and Y[1] as inputs. This is correct for a model with two fidelities. However, if the Branin or Hartmann_3d benchmarks are used, which have three fidelities, X[1] and Y[1] correspond to the second fidelity. So I think the inputs for HighFidelityGp should be X[-1] and Y[-1].
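A minimal sketch of the indexing point above, using plain NumPy rather than the emukit wrappers. The data and the helper highest_fidelity_data are hypothetical, and the sketch assumes the lists are ordered from lowest to highest fidelity:

```python
import numpy as np

# Hypothetical training data for a three-fidelity benchmark:
# one array per fidelity, assumed ordered from lowest to highest fidelity.
rng = np.random.default_rng(0)
X = [rng.random((12, 2)), rng.random((6, 2)), rng.random((3, 2))]
Y = [x.sum(axis=1, keepdims=True) for x in X]


def highest_fidelity_data(x_list, y_list):
    """Return the inputs and outputs of the last (highest) fidelity."""
    return x_list[-1], y_list[-1]


x_hf, y_hf = highest_fidelity_data(X, Y)

print(X[1].shape)  # second fidelity, not the highest one when there are three
print(x_hf.shape)  # highest fidelity, whatever the number of fidelities
```

With two fidelities, X[1] and X[-1] refer to the same array, which is why the problem only shows up on the three-fidelity benchmarks.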

2. In benchmarking_examples.ipynb, the inputs and outputs for each fidelity are given as lists, which are later transformed by convert_xy_lists_to_arrays inside baseline_model_wrappers.py. This function merges the lists and adds a column of discrete values to X to distinguish the fidelities. However, the sampling in benchmarking_examples.ipynb with latin.get_samples already adds a column to the X of each fidelity, corresponding to the InformationSourceParameter of the function. So I think this parameter is not needed here, since the sampling is done separately for each fidelity, each within its own list. In practice, we obtain X for two fidelities as:

X
[array([[ 0.79166667,  0.29166667,  1.        ],
       [ 0.95833333,  0.875     ,  1.        ],
       [ 0.70833333,  0.625     ,  0.        ],
       [ 0.125     ,  0.125     ,  0.        ],
       [ 0.29166667,  0.45833333,  0.        ],
       [ 0.45833333,  0.79166667,  1.        ],
       [ 0.20833333,  0.54166667,  1.        ],
       [ 0.375     ,  0.95833333,  0.        ],
       [ 0.875     ,  0.20833333,  1.        ],
       [ 0.04166667,  0.375     ,  0.        ],
       [ 0.54166667,  0.70833333,  0.        ],
       [ 0.625     ,  0.04166667,  1.        ]]), array([[ 0.5,  0.9,  0. ],
       [ 0.1,  0.7,  1. ],
       [ 0.9,  0.3,  0. ],
       [ 0.7,  0.5,  1. ],
       [ 0.3,  0.1,  0. ]])]

It seems to me that the last column here does not give any additional information, since the fidelity is already encoded by the position in the list. Moreover, this can create a modelling problem: when a model is built, convert_xy_lists_to_arrays adds the column identifying each fidelity, so X ends up with four columns while the input dimension is only 2. The first three columns (including the third, spurious one) are then used to model the interactions within a fidelity.
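The column count described above can be reproduced with a small NumPy sketch. The function append_fidelity_index is a hypothetical stand-in for the behaviour attributed to convert_xy_lists_to_arrays in this issue (stack the per-fidelity arrays and append a fidelity index column); it is not the emukit implementation. The sample rows are taken from the output quoted above:

```python
import numpy as np


def append_fidelity_index(x_list):
    """Stack per-fidelity inputs and append a column holding the fidelity index.

    Hypothetical stand-in for the conversion step described in the issue.
    """
    blocks = [
        np.hstack([x, np.full((x.shape[0], 1), i)])
        for i, x in enumerate(x_list)
    ]
    return np.vstack(blocks)


# Samples that already carry a sampled information-source column (0 or 1).
x_low = np.array([[0.79166667, 0.29166667, 1.0],
                  [0.70833333, 0.625, 0.0]])
x_high = np.array([[0.5, 0.9, 0.0],
                   [0.1, 0.7, 1.0]])

# Converting as-is gives four columns for a two-dimensional input.
with_extra = append_fidelity_index([x_low, x_high])
print(with_extra.shape)

# Dropping the sampled source column first gives two inputs plus the index.
clean = append_fidelity_index([x_low[:, :-1], x_high[:, :-1]])
print(clean.shape)
```

Under these assumptions, one possible fix is to sample only the continuous inputs for each fidelity, so that the fidelity index is added exactly once during the conversion.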

marpulli commented 5 years ago

Hi @Hebbalali,

Thanks for this! These are some bugs that crept in when I moved the code over to emukit. I have an updated version locally that I will submit a PR for very soon.

Mark

marpulli commented 5 years ago

This should be resolved by #225. Please feel free to reopen if you have any other questions.