Hey @Yoonho-Na,
I think you might be a bit confused as to how the package works. `mu` and `std` are not statistical variables describing the data, but they're properties of the Gaussian process at a specific point. You should not need to set them manually. What are you trying to do?
@till-m I'm trying to implement Bayesian optimization with a neural process as the surrogate. I haven't implemented BO from scratch before, so I think I'm a little confused by it.
At first I tried to modify the code from your source, but it was a little difficult, so I followed the scheme from this link instead.
```python
from numpy import argmax
from numpy.random import random
from scipy.stats import norm

# probability of improvement acquisition function
def acquisition(X, Xsamples, model):
    # calculate the best surrogate score found so far
    yhat, _ = surrogate(model, X)
    best = max(yhat)
    # calculate mean and stdev via surrogate function
    mu, std = surrogate(model, Xsamples)
    mu = mu[:, 0]
    # calculate the probability of improvement
    probs = norm.cdf((mu - best) / (std + 1E-9))
    return probs

# optimize the acquisition function
def opt_acquisition(X, y, model):
    # random search: generate random candidate samples
    Xsamples = random(100)
    Xsamples = Xsamples.reshape(len(Xsamples), 1)
    # calculate the acquisition function for each sample
    scores = acquisition(X, Xsamples, model)
    # locate the index of the largest score
    ix = argmax(scores)
    return Xsamples[ix, 0]
```
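For completeness, the snippet assumes a `surrogate` helper following that tutorial's scheme; a minimal sketch of it, assuming a scikit-learn `GaussianProcessRegressor` as the model, would look like this:

```python
from warnings import catch_warnings, simplefilter

# surrogate: predict mean and stdev of the fitted model at X
def surrogate(model, X):
    # silence convergence warnings from the GP while predicting
    with catch_warnings():
        simplefilter("ignore")
        return model.predict(X, return_std=True)
```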
I also tried to find this kind of line in your code, but I couldn't, so I assumed you are using some other kind of method to get the X sample data.
> but they're properties of the Gaussian process at a specific point.
What do you mean by "at a specific point"? I think this could be the clue. I'm currently using points generated like this: `np.linspace(pbound[0], pbound[1], 100)`.
Thank you for the reply.
Hey @Yoonho-Na,
the way BO works is somewhat like this:
1. Fit the surrogate (here, a Gaussian process) to all points probed so far.
2. Evaluate the surrogate at candidate points; at each point it gives a `mu` and `std`.
3. Compute an acquisition function from those and suggest the candidate that maximizes it.
4. Probe the actual target function there, register the result, and repeat.
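As a minimal sketch of that loop (not this package's actual code; the names `objective` and `bo_loop` and the use of scikit-learn's `GaussianProcessRegressor` are purely illustrative):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def bo_loop(objective, bounds, n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # start from a few random probes of the target
    X = rng.uniform(bounds[0], bounds[1], size=(n_init, 1))
    y = np.array([objective(x) for x in X.ravel()])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor().fit(X, y)               # fit surrogate to data seen so far
        Xc = rng.uniform(bounds[0], bounds[1], size=(500, 1))   # random candidate points
        mu, std = gp.predict(Xc, return_std=True)               # mu/std of the GP at each candidate
        pi = norm.cdf((mu - y.max()) / (std + 1e-9))            # acquisition: probability of improvement
        x_next = Xc[np.argmax(pi)]                              # suggestion = acquisition maximizer
        y_next = objective(x_next[0])                           # probe the real target
        X, y = np.vstack([X, [x_next]]), np.append(y, y_next)   # register the result and repeat
    return X[np.argmax(y), 0], y.max()

# e.g. bo_loop(np.sin, (0.0, np.pi))
```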
Can you pinpoint what you're struggling with exactly?
@till-m Don't we have to get the target distributions for each input parameter x, covering the full pbound region, when probing?
For example, suppose the task is finding the argmax of sin(x), where pbound is 0 <= x <= pi. Since x is continuous and we cannot get the target distribution for every possible x, I discretized the inputs into 101 bins like this: [0, pi*1/100, pi*2/100, ..., pi]. Using the surrogate model y', we can obtain the target distribution for each bin, and with those we can calculate the acquisition function for each bin using `cdf((mu - best) / (std + 1E-9))`.
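Roughly, what I mean looks like this (a sketch; `surrogate_model` and `y_observed` are placeholder names for a fitted surrogate with a scikit-learn-style `predict(X, return_std=True)` and the target values probed so far):

```python
import numpy as np
from scipy.stats import norm

xs = np.linspace(0.0, np.pi, 101).reshape(-1, 1)        # the 101 discretized bins over pbound
mu, std = surrogate_model.predict(xs, return_std=True)  # target distribution per bin from the surrogate
best = y_observed.max()                                  # best target value found so far
pi = norm.cdf((mu - best) / (std + 1e-9))                # probability of improvement per bin
x_next = xs[np.argmax(pi), 0]                            # next suggestion, restricted to the bins
```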
Is there anything that I'm misunderstanding?
I think my method is wrong because the suggestion for the next x is restricted to the bins I previously defined; if there are many parameters the number of bins grows very large because of all the possible combinations, and the optimization will also vary a lot with the number of bins.
Hey @Yoonho-Na,
> Don't we have to get the target distributions for each input parameter x, covering the full pbound region, when probing? Is there anything that I'm misunderstanding?
No, you don't have to evaluate every single parameter; that's not really feasible, as you noticed. You can do what you're doing and just evaluate on a grid. It might be more prudent to use a random sampling method (uniform, Latin hypercube or similar) so you're not restricted to the grid points, but either way is valid.
What this package does is a combination of random sampling and a quasi-Newton method. See the code here (random sampling) and here (quasi-Newton). The quasi-Newton step is much more expensive, but will usually lead to better results.
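Not the package's actual code (that is what the links above point to), but the idea of combining cheap random sampling with a quasi-Newton polish could be sketched like this, where `acq` is any acquisition function mapping a point to a score:

```python
import numpy as np
from scipy.optimize import minimize

def suggest_next(acq, bounds, n_random=10000, n_restarts=10, seed=None):
    # bounds: array of shape (dim, 2) with lower/upper limits per parameter
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]

    # stage 1: score many random points (cheap)
    Xr = rng.uniform(lo, hi, size=(n_random, len(lo)))
    scores = np.array([acq(x) for x in Xr])
    order = np.argsort(scores)[::-1]
    best_x, best_s = Xr[order[0]], scores[order[0]]

    # stage 2: polish the best candidates with L-BFGS-B, a quasi-Newton method
    # (maximizing acq == minimizing -acq); more expensive, usually better results
    for x0 in Xr[order[:n_restarts]]:
        res = minimize(lambda x: -acq(x), x0, bounds=bounds, method="L-BFGS-B")
        if -res.fun > best_s:
            best_x, best_s = res.x, -res.fun
    return best_x
```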
I hope that helps.
@till-m Oh, now I understand it clearly. Could you answer just one more simple question? I'm not an expert in BO, but my instinct tells me that evenly distributed evaluation points are better for probing correctly. Of course a sufficient number of sampling points may be fine, but I just want to make sure. So I'm thinking of trying the grid-based approach like before, but adding a small amount of Gaussian noise to the evaluation grid points. Do you think this is valid, or is it worthless?
> evenly distributed

can mean many things. If you're talking about sampling the grid points, the problem is that you're restricting the solution space to these points. This becomes more relevant the more steps you have: if you sample multiple times, you will always see the same points. With random sampling this is not the case. If you're worried about covering the space unevenly, you can look into Latin hypercube sampling or orthogonal sampling.
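For example, a Latin hypercube candidate set can be drawn with `scipy.stats.qmc` (a sketch; `lhs_candidates` and `pbounds`, a list of `(lower, upper)` pairs, are illustrative names):

```python
import numpy as np
from scipy.stats import qmc

def lhs_candidates(pbounds, n=100, seed=None):
    # pbounds: list of (lower, upper) per parameter
    lower = np.array([lo for lo, hi in pbounds])
    upper = np.array([hi for lo, hi in pbounds])
    sampler = qmc.LatinHypercube(d=len(pbounds), seed=seed)
    unit = sampler.random(n)              # n points spread evenly over [0, 1)^d
    return qmc.scale(unit, lower, upper)  # rescale to the actual search bounds

# e.g. candidates = lhs_candidates([(0.0, np.pi)], n=100)  # fresh points each call
```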
> evenly distributed
> can mean many things. If you're talking about sampling the grid points, the problem is that you're restricting the solution space to these points.
This is why I said adding Gaussian noise to the grid points, so it stays an almost evenly distributed grid without having the exact same points every iteration.
> So I'm thinking of trying the grid-based approach like before, but adding a small amount of Gaussian noise to the evaluation grid points.
But I guess I'll just try random sampling and quasi-Newton first, as you advised.
Hello, I have a question about setting the number of samples used to get mu and std when calculating the acquisition function. For example, when we calculate the acquisition function with Probability of Improvement, we use a formula like
`cdf((mu - current_best) / std)`
but I'm curious how densely we need to place the x samples in the search space when calculating mu and std. Are there any guidelines for this, or another method?