dgreis opened this issue 8 years ago
Sorry I didn't respond sooner... just started a new job so I've been pretty busy. Plus I don't have a working linux installation at the moment to test on, lol.
I believe Branin should work b/c we have tested with it before. Although it is a very commonly considered case in academic papers, it never seemed particularly similar to our present MOE applications, so we stopped focusing on it.
So this should work. That said, a few things stand out to me...
You aren't passing a hyperparameter_domain to gp_hyper_opt. I'm actually really surprised this doesn't fail. Could you tell me what the result of this line is in your run? https://github.com/Yelp/MOE/blob/master/moe/views/rest/gp_hyper_opt.py#L110
I'm not entirely sure what domain it thinks it's getting. My only guess is that the domain is coming out as [1.0, 1.0] for all hyperparameters (because the C++ accepts the domain in log10, so after exponentiating, 10^0 = 1). But I really don't know... that should have failed validation, sorry :(
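For reference, the field that endpoint expects looks something like the following (this mirrors the payload dgreis adds later in the thread; the bounds shown here are just placeholders):
# Shape of the missing field (placeholder bounds), passed as a kwarg to gp_hyper_opt:
hyperparameter_domain_info = {
    'dim': 3,  # 1 signal variance + 1 length scale per input dimension (2 for Branin)
    'domain_bounds': [
        {'min': 0.01, 'max': 3.0},
        {'min': 0.03, 'max': 15.0},
        {'min': 0.03, 'max': 15.0},
    ],
}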
Also, I didn't build or interact with the easy_interface stuff at all, so please bear with me.
Lastly, do you mean that when doing something like this: https://github.com/Yelp/MOE/blob/master/moe_examples/next_point_via_simple_endpoint.py you were not able to update the Experiment object from easy_interface? That doesn't make any sense b/c that example is one of our integration tests, which are all passing in the most recent revision.
Answering your q's:
1) All the JSON responses in my docker console have messages like "status" : "optimizer_success", so it seems the solver is working...
2) I'm running this from ipython notebook. My kernel always dies when I run:
from moe.views.rest import gp_hyper_opt
So I'm using
from moe.easy_interface.simple_endpoint import gp_hyper_opt
instead.
I still tried adding a print statement to moe/views/rest/gp_hyper_opt.py to get the value of hyperparameter_domain, but the call from simple_endpoint doesn't seem to run through that file. Or maybe I'm doing something wrong...
Nonetheless, I refactored the cycle_moe() and grab_pts_sampled(exp) functions as follows to include the hyperparameter_domain_info:
def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            gp_dim = len(points_sampled[0][0])
            hyper_dim = gp_dim + 1
            covariance_info = gp_hyper_opt(points_sampled, rest_host='192.168.59.103',
                                           **{'hyperparameter_domain_info': {
                                               'dim': hyper_dim,
                                               'domain_bounds': [{'min': 0.1, 'max': 2.0}] * hyper_dim,
                                           }})
            print (str(i) + ', time taken: ' + str(round(time.time() - t0, 3)))
            tdf[str(0)] = [time.time() - t0]
        next_point = gp_next_points(exp, rest_host='192.168.59.103', **{'covariance_info': covariance_info})[0]
        f_next = branin(next_point)  # + npr.normal(scale=std)
        rdf = rdf.append({'x1': next_point[0], 'x2': next_point[1], 'f': f_next,
                          'cov_hyps': covariance_info['hyperparameters']}, ignore_index=True)
        # points = exp.historical_data.to_list_of_sample_points()
        new_point = SamplePoint(next_point, f_next, 0)
        # exp = Experiment(bounds)
        # points.append(new_point)
        # exp.historical_data.append_sample_points(points)
        exp.historical_data.append_sample_points([new_point])
    return rdf[['x1', 'x2', 'f', 'cov_hyps']], tdf, exp, covariance_info
def grab_pts_sampled(exp):
    pts = exp.historical_data.points_sampled
    f_val = exp.historical_data.points_sampled_value
    f_var = exp.historical_data.points_sampled_noise_variance
    return [SamplePoint(list(pts[i, :]), f_val[i], f_var[i]) for i in range(len(pts))]  # Changed this line
As a side note, it seems like I can actually add to experiments from the easy interface, so it doesn't seem like that is actually an issue anymore.
3) When I cycle this 50 times with 5 training points to start (just like I did in my first post), the hyperparameter values stick around [2.0, 2.0, 2.0] in rdf. When I ran this for 100 iterations, some of the values did depart from 2.0, but the overall values are still not converging to any of the global minima.
optimizer_success: True is what you're looking for. That field is the name of a boolean. Both hyper_opt and next_points produce something along those lines.
Your kernel dies...? That seems problematic. Like the linux kernel dies? Any error msgs or logging info or whatnot before death? I've never seen that happen before.
Your code edit needs to happen on the server (like if you're running in docker or something, it needs to be in the code running there; so relaunch after edits or log in to docker). easy_interface is just a wrapper around urllib2 to send POST and GET requests; it doesn't do any compute. But that function is definitely being called. It's the only "view" that handles hyperparameter optimization.
Just to be clear, the hyperparameter values won't be related to the minima of the branin fcn. Unless you have a reference for like, what the ideal hyperparams for branin should be. I'd set the hyperparam domain to be something like [[0.01, 3], [0.03, 15], [0.03, 15]]. max length scale of 2 is pretty small for a domain with length 15.
You're running the C++ right? Not forcing the stuff in python_version to run somehow?
Also, what does the scatter plot of points explored look like?
- optimizer_success: True is what you're looking for. That field is the name of a boolean. Both hyper_opt and next_points produce something along those lines.
I see things like:
'status': {'optimizer_success': {'log_marginal_likelihood_newton_found_update': True}
Your kernel dies...? That seems problematic. Like the linux kernel dies? Any error msgs or logging info or whatnot before death? I've never seen that happen before.
Not linux kernel. iPython Kernel. I don't think this is important because this isn't even where the code is executing; it's executing on Docker, as you pointed out.
Your code edit needs to happen on the server (like if you're running in docker or something, it needs to be in the code running there; so relaunch after edits or log in to docker). easy_interface is just a wrapper around urllib2 to send POST and GET requests; it doesn't do any compute.
I spent hours trying to print the value of hyperparameter_domain in views/rest/gp_hyper_opt.py (print to stdout, pass to json output, write to file), but all of them caused the build to fail tests and resulted in a bad (500 error) response from Docker. I'm out of ideas to debug this. Is knowing this absolutely necessary?
(As a side note, #401 turned out to be a blocker to doing code edits easily. I commented there too.)
Just to be clear, the hyperparameter values won't be related to the minima of the branin fcn. Unless you have a reference for like, what the ideal hyperparams for branin should be. I'd set the hyperparam domain to be something like [[0.01, 3], [0.03, 15], [0.03, 15]]. max length scale of 2 is pretty small for a domain with length 15.
When I said, "the overall values are still not converging to any of the global minima," I meant the overall values of the branin function.
I changed the line in my cycle_moe() from:
'domain_bounds': [{'min': 0.1, 'max': 2.0}] * hyper_dim
to:
'domain_bounds': [{'min': 0.01, 'max': 3.0},{'min': 0.03, 'max': 15},{'min': 0.03, 'max': 15}]
This didn't seem to have any impact.
You're running the C++ right? Not forcing the stuff in python_version to run somehow?
I'm not sure how to verify whether I'm doing this or not, but I don't think I've explicitly or intentionally done so...
what does the scatter plot of points explored look like?
This is 50 iterations. For reference, the global optima of the branin are at: (-pi, 12.275), (pi, 2.275) and (9.42478, 2.475). It doesn't look like MOE is getting close to finding any of them. It's hard to understand the sequence of points explored in a scatter plot, so I tried to color the points so the earlier ones would be more red and the later ones more blue. If you run cycle_moe(), it's much easier to see the progression just by looking at the x1 and x2 columns in rdf.
Scatter plot: :( :( Blah. Could you give me one more piece of debug info? After each result for next_points, can you call gp_ei for the point that MOE suggests as well as the true minima? (e.g., to check if EI optimization even has a hope of finding it.) If it's too much trouble no worries, I'll look into it anyway.
Steps if interested: Copy the gp_mean_var function in easy_interface to like gp_ei. Change GP_MEAN_VAR_ROUTE_NAME to GP_EI_ROUTE_NAME. Change the return ... line to return output.get('expected_improvement'). And points_to_evaluate is a list of lists, e.g., [[0.3, 0.2], [0.5, 0.7], [-3.14, 12.25]]. The schema for gp_ei is here: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_ei.py compared to mean_var: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_mean_var.py
The points_sampled input is just like in hyper_opt. You need to specify covariance_info as a kwarg just like in next_points.
Anyway, sorry I don't have a working install yet but I'll get there. Should have looked into gcc default version before upgrading to ubuntu 15.10. Speaking of which, can you tell me which version of gcc and which distro/version of linux you're on? (edit: although probably irrelevant b/c the docker image should be independent of those things)
Also, does this guy in the moe_examples folder work for you? It basically does the same thing as your script, I think: https://github.com/Yelp/MOE/blob/master/moe_examples/combined_example.py
You may want to disable the noise in function_to_minimize. It looks like the tests for the files in moe_examples don't verify correctness (just that they run).
Scatter plot: :( :( Blah. Could you give me one more piece of debug info? After each result for next_points, can you call gp_ei for the point that MOE suggests as well as the true minima? (e.g., to check if EI optimization even has a hope of finding it.) If it's too much trouble no worries, I'll look into it anyway.
There are multiple minima (all equal to ~0.398) in the branin function (see the description), so this is not entirely straightforward.
Steps if interested: Copy the gp_mean_var function in easy_interface to like gp_ei. Change GP_MEAN_VAR_ROUTE_NAME to GP_EI_ROUTE_NAME. Change the return ... line to return output.get('expected_improvement'). And points_to_evaluate is a list of lists, e.g., [[0.3, 0.2], [0.5, 0.7], [-3.14, 12.25]]. The schema for gp_ei is here: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_ei.py compared to mean_var: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_mean_var.py The points_sampled input is just like in hyper_opt. You need to specify covariance_info as a kwarg just like in next_points.
These are "steps"... to what? I didn't understand what this was for, but I did it anyway. I couldn't add gp_mean_var
to views.rest.gp_ei.py
because of the same iPython-kernel-dying issue as before (let's leave that issue aside), but I was able to get it to work by keeping it in simple_endpoint.py
. I had to make a few more changes to the file, and I also changed the name so it wouldn't clash with the existing gp_mean_var
. Renamed to my_grab_ei
, it looks like this:
from moe.views.schemas.rest.gp_ei import GpEiResponse  # Added
from moe.views.constant import GP_EI_ROUTE_NAME, GP_EI_PRETTY_ROUTE_NAME  # Added


def my_grab_ei(
        points_sampled,
        points_to_evaluate,
        rest_host=DEFAULT_HOST,
        rest_port=DEFAULT_PORT,
        testapp=None,
        **kwargs
):
    """Hit the rest endpoint for calculating the expected improvement of a gaussian process, given points already sampled."""
    endpoint = ALL_REST_ROUTES_ROUTE_NAME_TO_ENDPOINT[GP_EI_ROUTE_NAME]  ##Changed
    raw_payload = kwargs.copy()  # Any options can be set via the kwargs ('covariance_info' etc.)
    raw_payload['points_to_evaluate'] = points_to_evaluate

    # Sanitize input points
    points_sampled_clean = [SamplePoint._make(point) for point in points_sampled]
    historical_data = HistoricalData(
        len(points_to_evaluate[0]),  # The dim of the space
        sample_points=points_sampled_clean,
    )

    if 'gp_historical_info' not in raw_payload:
        raw_payload['gp_historical_info'] = historical_data.json_payload()
    if 'domain_info' not in raw_payload:
        raw_payload['domain_info'] = {'dim': len(points_to_evaluate[0])}

    json_payload = json.dumps(raw_payload)
    json_response = call_endpoint_with_payload(rest_host, rest_port, endpoint, json_payload, testapp)
    output = GpEiResponse().deserialize(json_response)  ##Changed
    return output.get('expected_improvement')  ##Changed
Then I added this to my cycle_moe function, which now looks like this:
def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            gp_dim = len(points_sampled[0][0])
            hyper_dim = gp_dim + 1
            covariance_info = gp_hyper_opt(points_sampled, rest_host='192.168.99.100',
                                           **{'hyperparameter_domain_info': {
                                               'dim': hyper_dim,
                                               'domain_bounds': [{'min': 0.01, 'max': 3.0}, {'min': 0.03, 'max': 15}, {'min': 0.03, 'max': 15}],
                                               # 'domain_bounds': [{'min': 0.01, 'max': 2.0}] * hyper_dim
                                           }})
            print (str(i) + ', time taken: ' + str(round(time.time() - t0, 3)))
            tdf[str(0)] = [time.time() - t0]
        next_point = gp_next_points(exp, rest_host='192.168.99.100', **{'covariance_info': covariance_info})[0]
        f_next = branin(next_point)  # + npr.normal(scale=std)
        points_sampled = grab_pts_sampled(exp)  ##Added
        ei = my_grab_ei(points_sampled, [next_point], rest_host='192.168.99.100', **{'covariance_info': covariance_info})[0]  ##Added
        rdf = rdf.append({'x1': next_point[0], 'x2': next_point[1], 'f': f_next, 'ei': ei,  ##Modified
                          'cov_hyps': covariance_info['hyperparameters']}, ignore_index=True)
        # points = exp.historical_data.to_list_of_sample_points()
        new_point = SamplePoint(next_point, f_next, 0)
        exp = Experiment(bounds)
        # points.append(new_point)
        # exp.historical_data.append_sample_points(points)
        exp.historical_data.append_sample_points([new_point])
    return rdf[['x1', 'x2', 'f', 'ei', 'cov_hyps']], tdf, exp, covariance_info  ##Modified
I don't know how best to summarize this. It will probably be best to run it yourself and look at the contents of rdf, but here is the output from rdf.ei.describe():
count 50.000000
mean 47.673862
std 48.591592
min 2.522790
25% 10.211796
50% 22.794516
75% 85.673810
max 188.925347
Speaking of which, can you tell me which version of gcc and which distro/version of linux you're on? (edit: although probably irrelevant b/c the docker image should be independent of those things)
I'm on OS X Yosemite (10.10.5) and gcc --version returns 4.9.2. But I think you're right re: Docker handling this stuff...
Also, I got moe_examples to work. The output is below. I guess it is converging to [1, 2.6]? But still, I need to make sure it converges in all cases, not just this one.
If I can't turn the corner on this problem soon, I think I'm gonna have to abandon MOE :(
Sampled f([1.88891236067, 4.0]) = 3.024281515239389817E-01
Sampled f([0.0, 2.9618376054]) = -9.838875168897800449E-01
Sampled f([0.0, 3.82570246393]) = -7.749819391510606170E-01
Sampled f([0.786838585499, 2.62777516463]) = -1.579648346558538918E+00
Sampled f([1.20312582365, 2.08982756487]) = -1.451454321864257269E+00
Updated covariance_info with {'hyperparameters': [0.823360900509, 1.01663242597, 1.28659618664], 'covariance_type': u'square_exponential'}
Sampled f([0.758268089974, 2.2331219232]) = -1.411622052413112893E+00
Sampled f([1.19599259406, 2.57708248725]) = -1.593354050232974606E+00
Sampled f([1.03968887999, 2.56816139896]) = -1.617582421292888650E+00
Sampled f([2.0, 2.24190928674]) = -1.018767759754114710E+00
Sampled f([1.05579477819, 2.56319110027]) = -1.616923815055614444E+00
Updated covariance_info with {'hyperparameters': [0.789039406073, 1.23435905217, 1.30371483948], 'covariance_type': u'square_exponential'}
Sampled f([1.0082757694, 2.62978522016]) = -1.616789052180044095E+00
Sampled f([2.0, 0.639342249258]) = -1.468008111448600994E-01
Sampled f([1.07167325368, 2.5252282002]) = -1.614562666452685313E+00
Sampled f([0.863802755471, 2.92560868142]) = -1.540054847282929629E+00
Sampled f([1.03653075905, 2.56537496547]) = -1.617587842392238073E+00
Updated covariance_info with {'hyperparameters': [0.737969968335, 1.24276395575, 1.37894109572], 'covariance_type': u'square_exponential'}
Sampled f([1.01903375152, 2.59415144623]) = -1.617994034119857982E+00
Sampled f([1.03422357833, 2.56392785481]) = -1.617583817670746216E+00
Sampled f([1.03150566466, 2.56489031089]) = -1.617640394284993066E+00
Sampled f([1.02916662686, 2.56574681541]) = -1.617681716433355454E+00
Sampled f([1.02711746293, 2.56651917111]) = -1.617712335919329059E+00
GP mean at (0, 0), (0.1, 0.1), ...: [0.98974570661, 0.963411209612, 0.911656230847, 0.835017255388, 0.735036207222, 0.614126994898, 0.475415337389, 0.322579331855, 0.159711048117, -0.00879206630745]
I can understand having to abandon MOE... I'd be annoyed too if I were stuck for so long :( I was hoping to spend some time on this over the weekend but I ended up having to spend the weekend working on things for my job.
So I have a couple more suggestions for you; I'll try to debug it myself this week as hopefully work has settled down. Anyway...
What I meant is that you can pass multiple points in points_to_evaluate, so before updating historical_data, query EI with coordinate list [[next_point], [optima_1], [optima_2], [optima_3]]. And we could see if EI is better at next_point or at any of the true optima. This isn't critical though, it's more something for me to look at. Sorry about the confusion, but that's what I was thinking.
Quick things to try:
- Try more aggressive gradient descent settings for the EI optimization (pass optimizer_info as a kwarg to the gp_next_points function, just like you pass in covariance_info now; a sketch of how to pass it follows below): {"optimizer_info": {"optimizer_type": "gradient_descent_optimizer", "num_multistarts": 2000, "optimizer_parameters": {"gamma": 0.7, "pre_mult": 1.2, "max_relative_change": 0.5, "tolerance": 1.0e-7, "max_num_steps": 1200}}}. Try "gamma": 0.5 too perhaps. (This may be a lot slower... if it's too slow, try decreasing num_multistarts. The value you're currently using is 400.)
- Or try brute-force random search instead: {"optimizer_info": {"optimizer_type": "null_optimizer", "num_random_samples": 10000000}}. (You may have to put "optimizer_parameters": {} in there too but I don't think so.) 10 million is a bit of a guess; if that's too slow, decrease it. If it's still super fast, try increasing it.
Hopefully I spelled all those fields correctly, lol.
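For concreteness, a sketch of how the first suggestion could be passed from the easy_interface side (the optimizer_info contents are copied from above; exp and covariance_info are assumed to come from the surrounding loop, and this is untested):
# Sketch: pass optimizer_info alongside covariance_info to gp_next_points.
from moe.easy_interface.simple_endpoint import gp_next_points

optimizer_info = {
    'optimizer_type': 'gradient_descent_optimizer',
    'num_multistarts': 2000,
    'optimizer_parameters': {
        'gamma': 0.7,  # try 0.5 as well
        'pre_mult': 1.2,
        'max_relative_change': 0.5,
        'tolerance': 1.0e-7,
        'max_num_steps': 1200,
    },
}
next_point = gp_next_points(
    exp,
    rest_host='192.168.99.100',
    **{'covariance_info': covariance_info, 'optimizer_info': optimizer_info}
)[0]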
Thank you for helping out @suntzu86 . Have you been able to work on this? I am having a similar problem.
Nope, I haven't had a chance to dive deeper into this.
A team at Cornell where some of the original research for MOE was done put a paper up on arxiv recently: http://arxiv.org/abs/1602.05149
They run a branin case successfully, code here: http://pastebin.com/2syieQKU
It's not going through the REST API though, but you can hopefully see how the python objects map into the api's json fields (like TensorProductDomain -> domain).
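As a rough illustration of that mapping (an assumption on my part, not taken from the pastebin; the JSON side follows the 'dim'/'domain_bounds' shape used elsewhere in this thread):
# A python-side domain over the Branin box, e.g.
#   TensorProductDomain([ClosedInterval(-5.0, 10.0), ClosedInterval(0.0, 15.0)])
# corresponds roughly to a REST-side domain field of the form:
domain_info = {
    'dim': 2,
    'domain_bounds': [{'min': -5.0, 'max': 10.0}, {'min': 0.0, 'max': 15.0}],
}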
I have been using MOE to tune parameters of a black box function. One thing I haven't been able to understand is why -- holding constant the number of historical points in an experiment -- consecutive calls to gp_next_points() will return points that seem to vary randomly within the X domain.
After reading the EGO paper, this doesn't seem to make sense. As I understand it, the idea behind EI is that the suggested points at any given stage are supposed to be deterministic. That is, EI returns the X point where the GP thinks there will be the greatest improvement in the underlying objective function.
To get to the bottom of this, I've tried running MOE on a known function (the Branin function), removing all the usual sources of variation. Yet, I still see the randomness mentioned above. What's more, benchmarked against GPyOpt, MOE doesn't seem close to finding the answer even after 50 iterations.
Here's what I did. First, the imports outside of MOE:
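(The original import block isn't preserved in this copy of the issue; based on the names used in the code above, it was presumably something like the following.)
# Presumed non-MOE imports, reconstructed from the code shown earlier:
import time

import numpy.random as npr
import pandas as pd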
Now, define the Branin Function:
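(The original definition isn't preserved here; the standard form of the function, which the coordinates quoted elsewhere in the thread match, is:)
import math

def branin(x):
    """Branin function on [-5, 10] x [0, 15]; global minimum ~0.398 at
    (-pi, 12.275), (pi, 2.275) and (9.42478, 2.475)."""
    x1, x2 = x[0], x[1]
    b = 5.1 / (4.0 * math.pi ** 2)
    c = 5.0 / math.pi
    t = 1.0 / (8.0 * math.pi)
    return (x2 - b * x1 ** 2 + c * x1 - 6.0) ** 2 + 10.0 * (1.0 - t) * math.cos(x1) + 10.0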
Random points are then generated in the domain described in the Branin description:
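(The original snippet isn't preserved; a minimal sketch of what it presumably did, using the 5 starting points mentioned later in the thread:)
# Presumed reconstruction: 5 random starting points drawn uniformly from the
# Branin domain [-5, 10] x [0, 15].
import numpy.random as npr

branin_bounds = [[-5.0, 10.0], [0.0, 15.0]]
start_points = [[npr.uniform(lo, hi) for (lo, hi) in branin_bounds] for _ in range(5)]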
Next, I add these points to a moe.easy_interface.experiment.Experiment. I included no variance at these points, as I wanted to see whether I could reproduce the problematic randomness by removing all input sources of variance. (Variance can be added to these points with the same end result.) A sketch of this setup appears below.
Next, I cycle MOE for 50 iterations using my cycle_moe() function (shown above in its later, refactored form). I included a covariance re-tuning every 5 iterations (as suggested in the MOE FAQ). This function returns -- among other things -- a dataframe rdf that summarizes the suggested point at each iteration, as well as the function value at each of those points (with no variance, again in an effort to isolate the problematic randomness). Additionally, the function returns the experiment itself and the latest covariance_info, along with some metadata on the time taken for each covariance tuning (tdf).
Side note: There is some weird bug where I wasn't able to add new points to an experiment after I sent it to the REST server, so as a workaround, I would just add the points to a new experiment at each iteration.
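(The original setup code isn't preserved; a sketch under the assumptions above could look like this. The import paths follow MOE's README, and branin, branin_bounds and start_points come from the sketches above.)
# Sketch: add the starting points to an Experiment with zero noise variance, then cycle.
from moe.easy_interface.experiment import Experiment
from moe.optimal_learning.python.data_containers import SamplePoint

exp = Experiment(branin_bounds)
exp.historical_data.append_sample_points(
    [SamplePoint(p, branin(p), 0.0) for p in start_points]
)

rdf, tdf, exp, covariance_info = cycle_moe(exp, R=50)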
This function has two helper functions, get_bounds(exp) and grab_pts_sampled(exp); grab_pts_sampled is shown earlier above, and a stand-in for get_bounds is sketched below.
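(get_bounds itself isn't shown anywhere in this thread; the following is my assumption of what it does, namely return the experiment's domain bounds, which for Branin are fixed anyway.)
# Stand-in for the missing get_bounds helper (hypothetical, for illustration only).
def get_bounds(exp):
    return [[-5.0, 10.0], [0.0, 15.0]]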
After running exp through cycle_moe() for 50 iterations, I make consecutive calls with the following line (see the sketch below), and I still see random variation in the next points suggested, which I don't understand. Additionally, when examining the results of each iteration in rdf, it doesn't look like MOE is getting close to finding one of the global optima in the Branin description.
I ran the same number of iterations using GPyOpt on the same function, using the same method ('EI'), and it converged to one of the global optima.
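(The exact line referenced above isn't preserved in this copy of the issue; it was presumably the same call used inside cycle_moe, i.e. roughly:)
# Presumed repeated call, following the pattern used inside cycle_moe:
next_point = gp_next_points(exp, rest_host='192.168.59.103', **{'covariance_info': covariance_info})[0]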
Hopefully, with all my code and each step outlined above, somebody can help me figure out exactly what I'm doing wrong here. But up to now, this has been a serious head scratcher...