Yelp / MOE

A global, black box optimization engine for real world metric optimization.

Unknown Source of Variation in `gp_next_points()` #452

Open dgreis opened 8 years ago

dgreis commented 8 years ago

I have been using MOE to tune parameters of a black box function. One thing I haven't been able to understand is why -- holding constant the number of historical points in an experiment -- consecutive calls to gp_next_points() will return points that seem to vary randomly within the X domain.

After reading the EGO paper, this doesn't seem to make sense. As I understand it, the idea behind EI is that the suggested points at any given stage are supposed to be deterministic. That is, EI returns the X point where the GP thinks there will be the greatest improvement in the underlying objective function.

To get to the bottom of this, I've tried running MOE on a known function (the Branin function), removing all the usual sources of variation. Yet, I still see the randomness mentioned above. What's more, benchmarked against GPyOpt, MOE doesn't seem close to finding the answer even after 50 iterations.

Here's what I did. First, the imports outside of MOE:

import numpy as np
import numpy.random as npr
import math
import pandas as pd
import time
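
The MOE-side imports used below are the standard easy_interface ones (as shown in the MOE README):

from moe.easy_interface.experiment import Experiment
from moe.easy_interface.simple_endpoint import gp_next_points, gp_hyper_opt
from moe.optimal_learning.python.data_containers import SamplePoint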

Now, define the Branin Function:

def branin(x):
    x1 = x[0]
    x2 = x[1]
    a = 1.
    b = 5.1 / (4.*np.pi**2)
    c = 5. / np.pi
    r = 6.
    s = 10.
    t = 1. / (8.*np.pi)
    ret  = a*(x2-b*x1**2+c*x1-r)**2+s*(1-t)*np.cos(x1)+s
    #print ret
    return ret
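
As a quick sanity check, this implementation reproduces the known global minimum value of roughly 0.3979, e.g. at (-pi, 12.275):

print(branin([-np.pi, 12.275]))  # ~0.3979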

Random starting points are generated in the domain given in the Branin function's description ([-5, 10] x [0, 15]):

x1 = 15*npr.random_sample(5) - 5
x2 = 15*npr.random_sample(5)
data = np.vstack([x1,x2]).T
fval = [branin(x)  for x in data]

Next, I add these points to a moe.easy_interface.experiment.Experiment. I included no variance at these points, as I wanted to see whether I could reproduce the problematic randomness by removing all input sources of variance. Variance can be added to these points with the same end result.

exp = Experiment([[-5,10],[0,15]])
exp.historical_data.append_sample_points([
    SamplePoint(data[0], fval[0], 0)])
exp.historical_data.append_sample_points([
    SamplePoint(data[1], fval[1], 0)])
exp.historical_data.append_sample_points([
    SamplePoint(data[2], fval[2],0)])
exp.historical_data.append_sample_points([
    SamplePoint(data[3], fval[3], 0)])
exp.historical_data.append_sample_points([
    SamplePoint(data[4], fval[4], 0)])
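
(append_sample_points accepts a list, so the five calls above could equivalently be collapsed into one, still with zero noise variance:)

exp.historical_data.append_sample_points(
    [SamplePoint(pt, val, 0) for pt, val in zip(data, fval)])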

Next, I cycle MOE for 50 iterations using the function defined below, re-tuning the covariance every 5 iterations (as suggested in the MOE FAQ). The function returns a dataframe rdf that summarizes the suggested point at each iteration and the function value at that point (again with no variance, to isolate the problematic randomness). It also returns the experiment itself, the latest covariance_info, and some metadata on the time taken for each covariance tuning (tdf).

Side note: there is some weird bug where I wasn't able to add new points to an experiment after sending it to the REST server, so as a workaround I just add all the points to a new experiment at each iteration.

def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        # Re-tune the covariance hyperparameters every 5 iterations (per the MOE FAQ)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            covariance_info = gp_hyper_opt(points_sampled, rest_host='192.168.59.103')
            print (str(i) + ', time taken: ' + str(round(time.time() - t0, 3)))
            tdf[str(i)] = [time.time() - t0]  # record how long this re-tuning took
        next_point = gp_next_points(exp, rest_host='192.168.59.103', **{'covariance_info': covariance_info})[0]
        f_next = branin(next_point)  # +  npr.normal(scale=std)
        rdf = rdf.append({'x1': next_point[0], 'x2': next_point[1], 'f': f_next}, ignore_index=True)

        # Workaround from the side note above: rebuild the experiment with all prior points plus the new one
        points = exp.historical_data.to_list_of_sample_points()
        new_point = SamplePoint(next_point, f_next, 0)
        exp = Experiment(bounds)
        points.append(new_point)
        exp.historical_data.append_sample_points(points)
    return rdf, tdf, exp, covariance_info

This function uses two helper functions:

def get_bounds(exp):
    num_dims = exp.domain.dim
    x_intls = [exp.domain.get_bounding_box()[i] for i in range(num_dims)]
    intl_list = list()
    for intl in x_intls:
        intl_list.append([intl.min, intl.max])
    return intl_list

and

def grab_pts_sampled(exp):
    pts = exp.historical_data.points_sampled
    f_val = exp.historical_data.points_sampled_value
    f_var = exp.historical_data.points_sampled_noise_variance
    return [[list(pts[i,:]),f_val[i],f_var[i]] for i in range(len(pts))]

After running exp through cycle_moe() for 50 iterations, when I make consecutive calls with the following line:

gp_next_points(exp,rest_host='192.168.59.103',**{'covariance_info':covariance_info})[0]

I still see random variation in the suggested next points, which I don't understand. Additionally, when examining the results of each iteration in rdf, it doesn't look like MOE is getting close to any of the global optima listed in the Branin description.

I ran the same number of iterations using GPyOpt on the same function, using the same method ('EI'), and it converged to one of the global optima.
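
(Roughly, a GPyOpt run of that kind looks like the following sketch; this is not the exact code used, just GPyOpt's BayesianOptimization with acquisition_type='EI' over the same domain:)

import GPyOpt

# GPyOpt expects f to take a 2-D array of points and return a column of values
def branin_gpyopt(X):
    return np.array([[branin(x)] for x in X])

gpyopt_domain = [{'name': 'x1', 'type': 'continuous', 'domain': (-5, 10)},
                 {'name': 'x2', 'type': 'continuous', 'domain': (0, 15)}]
problem = GPyOpt.methods.BayesianOptimization(f=branin_gpyopt,
                                              domain=gpyopt_domain,
                                              acquisition_type='EI')
problem.run_optimization(max_iter=50)
print(problem.x_opt, problem.fx_opt)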

Hopefully, with all my code and each step outlined above, somebody can help me figure out exactly what I'm doing wrong here. But up to now, this has been a serious head scratcher...

suntzu86 commented 8 years ago

Sorry I didn't respond sooner... just started a new job so I've been pretty busy. Plus I don't have a working linux installation at the moment to test on, lol.

I believe Branin should work b/c we have tested with it before. Although it is a very commonly considered case in academic papers, it never seemed particularly similar to our present MOE applications, so we stopped focusing on it.

So this should work. That said, a few things stand out to me...

  1. All the http endpoints return status messages (like indicating whether the solver converged/found an update). Are these coming back as true? (If not, we just return a random value or best guess but it is not trustworthy.)
  2. You don't pass a hyperparameter_domain to gp_hyper_opt. I'm actually really surprised this doesn't fail. Could you tell me what the result of this line is in your run? https://github.com/Yelp/MOE/blob/master/moe/views/rest/gp_hyper_opt.py#L110

I'm not entirely sure what domain it thinks it's getting. My only guess could be that the domain is coming out as [1.0, 1.0] for all hyperparameters. (Because the C++ accepts the domain in log10, so after exponentiating, 10^0 = 1.) But I really don't know... that should have failed validation, sorry :(

  3. Following 2, can you print out the covariance hyperparams you're getting? Are they all 1.0?

Also, I didn't build or really interact with the easy_interface stuff at all, so please bear with me.

Lastly, do you mean that doing something like this: https://github.com/Yelp/MOE/blob/master/moe_examples/next_point_via_simple_endpoint.py you were not able to update the Experiment object from easy_interface? That doesn't make sense b/c that example is one of our integration tests, which are all passing in the most recent revision.

dgreis commented 8 years ago

Answering your q's:

1) All the JSON responses in my docker console have messages like "status" : "optimizer_success", so it seems the solver is working...

2) I'm running this from an IPython notebook. My kernel always dies when I run `from moe.views.rest import gp_hyper_opt`, so I'm using `from moe.easy_interface.simple_endpoint import gp_hyper_opt` instead.

I still tried adding a print statement to moe/views/rest/gp_hyper_opt.py to get the value of hyperparameter_domain, but the call from simple_endpoint doesn't seem to run through that file. Or maybe I'm doing something wrong...

Nonetheless, I refactored the cycle_moe() and grab_pts_sampled(exp) functions as follows, to include the hyperparameter_domain_info:

def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            gp_dim = len(points_sampled[0][0])
            hyper_dim = gp_dim + 1
            covariance_info = gp_hyper_opt(points_sampled,rest_host='192.168.59.103',**{'hyperparameter_domain_info' : { \
                'dim': hyper_dim, \
                'domain_bounds': [{'min': 0.1, 'max': 2.0}] * hyper_dim  \
                }})
            print (str(i) +', time taken: '+str(round(time.time() - t0,3)))
            tdf[str(i)] = [time.time() - t0]
        next_point = gp_next_points(exp,rest_host='192.168.59.103',**{'covariance_info':covariance_info})[0]
        f_next = branin(next_point) #+  npr.normal(scale=std)
        rdf = rdf.append({ 'x1': next_point[0], 'x2':next_point[1], 'f':f_next,
                         'cov_hyps': covariance_info['hyperparameters']},ignore_index=True)

        #points = exp.historical_data.to_list_of_sample_points()
        new_point = SamplePoint(next_point, f_next, 0)
        #exp = Experiment(bounds)
        #points.append(new_point)
        #exp.historical_data.append_sample_points(points)
        exp.historical_data.append_sample_points([new_point])
    return rdf[['x1','x2','f','cov_hyps']],tdf,exp,covariance_info

def grab_pts_sampled(exp):
    pts = exp.historical_data.points_sampled
    f_val = exp.historical_data.points_sampled_value
    f_var = exp.historical_data.points_sampled_noise_variance
    return [SamplePoint(list(pts[i,:]),f_val[i],f_var[i]) for i in range(len(pts))] #Changed this line

As a side note, it seems like I can actually add to experiments from the easy interface, so it doesn't seem like that is actually an issue anymore.

3) When I cycle this 50 times with 5 training points to start (just like I did in my first post), the hyperparameter values stick around [2.0, 2.0, 2.0] in rdf. When I ran this for 100 iterations, some of the values did depart from 2.0, but the overall values are still not converging to any of the global minima.

suntzu86 commented 8 years ago
  1. optimizer_success: True is what you're looking for. That field is the name of a boolean. Both hyper_opt and next_points produce something along those lines.
  2. Your kernel dies...? That seems problematic. Like the linux kernel dies? Any error msgs or logging info or whatnot before death? I've never seen that happen before.

Your code edit needs to happen on the server (like if you're running in docker or something, it needs to be in the code running there; so relaunch after edits or log in to docker). easy_interface is just a wrapper around urllib2 to send POST and GET requests; it doesn't do any compute.

But that function is definitely being called. It's the only "view" that handles hyperparameter optimization.

  3. Just to be clear, the hyperparameter values won't be related to the minima of the branin fcn. Unless you have a reference for like, what the ideal hyperparams for branin should be. I'd set the hyperparam domain to be something like [[0.01, 3], [0.03, 15], [0.03, 15]]. A max length scale of 2 is pretty small for a domain with length 15.
  4. You're running the C++ right? Not forcing the stuff in python_version to run somehow?
  5. What does the scatter plot of points explored look like?
dgreis commented 8 years ago
> 1. optimizer_success: True is what you're looking for. That field is the name of a boolean. Both hyper_opt and next_points produce something along those lines.

I see things like:

'status': {'optimizer_success': {'log_marginal_likelihood_newton_found_update': True}
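
(If the raw JSON is captured, e.g. by POSTing to the endpoint directly instead of going through easy_interface, a quick check is possible; response here is just a name for the parsed JSON dict:)

success_flags = response['status']['optimizer_success']  # e.g. {'log_marginal_likelihood_newton_found_update': True}
assert all(success_flags.values()), 'optimizer did not report success: %r' % success_flags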

> Your kernel dies...? That seems problematic. Like the linux kernel dies? Any error msgs or logging info or whatnot before death? I've never seen that happen before.

Not the Linux kernel, the IPython kernel. I don't think this is important, because this isn't even where the code is executing; it's executing in Docker, as you pointed out.

> Your code edit needs to happen on the server (like if you're running in docker or something, it needs to be in the code running there; so relaunch after edits or log in to docker). easy_interface is just a wrapper around urllib2 to send POST and GET requests; it doesn't do any compute.

I spent hours trying to print the value of hyperparameter_domain in views/rest/gp_hyper_opt.py (print to stdout, pass it to the json output, write it to a file), but all of them caused the build to fail tests and resulted in a bad request (500 error) response from Docker. I'm out of ideas to debug this. Is knowing this absolutely necessary?

(As a side note, #401 turned out to be a blocker for making code edits easily. I commented there too.)

> Just to be clear, the hyperparameter values won't be related to the minima of the branin fcn. Unless you have a reference for like, what the ideal hyperparams for branin should be. I'd set the hyperparam domain to be something like [[0.01, 3], [0.03, 15], [0.03, 15]]. A max length scale of 2 is pretty small for a domain with length 15.

When I said, "the overall values are still not converging to any of the global minima," I meant the overall values of the branin function.

I changed the line in my cycle_moe() from

'domain_bounds': [{'min': 0.1, 'max': 2.0}] * hyper_dim

to:

'domain_bounds': [{'min': 0.01, 'max': 3.0},{'min': 0.03, 'max': 15},{'min': 0.03, 'max': 15}]

This didn't seem to have any impact.

> You're running the C++ right? Not forcing the stuff in python_version to run somehow?

I'm not sure how to verify whether I'm doing this or not, but I haven't explicitly or intentionally done this I don't think...
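
(One rough way to check from inside the container running MOE is whether the compiled C++ extension imports; the module path below is an assumption based on the MOE source tree, so double-check it:)

import moe.build.GPP as C_GP  # should only import if the C++ extension was built
print(C_GP.__file__)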

> What does the scatter plot of points explored look like?

[image: moe_points_explored_branin (scatter plot of the points MOE explored)]

This is 50 iterations. For reference, the global optima of the Branin function are at (-pi, 12.275), (pi, 2.275), and (9.42478, 2.475). It doesn't look like MOE is getting close to finding any of them. It's hard to follow the sequence of points explored in a scatter plot, so I colored the points so that earlier ones are more red and later ones more blue. If you run cycle_moe(), it's much easier to see the progression just by looking at the x1 and x2 columns in rdf.
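
(For reference, a sketch of that colored scatter, assuming matplotlib is available and rdf is the dataframe returned by cycle_moe(); earlier iterations plot red, later ones blue, with the three optima starred:)

import matplotlib.pyplot as plt

plt.scatter(rdf['x1'], rdf['x2'], c=np.arange(len(rdf)), cmap='coolwarm_r')
optima = [(-np.pi, 12.275), (np.pi, 2.275), (9.42478, 2.475)]
plt.scatter(*zip(*optima), marker='*', s=200, label='Branin global optima')
plt.xlabel('x1'); plt.ylabel('x2'); plt.legend(); plt.show()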

suntzu86 commented 8 years ago

Scatter plot: :( :( Blah. Could you give me one more piece of debug info? After each result for next_points, can you call gp_ei for the point that MOE suggests as well as the true minima? (e.g., to check if EI optimization even has a hope of finding it.) If it's too much trouble no worries, I'll look into it anyway.

Steps if interested: Copy the gp_mean_var function in easy_interface to like gp_ei. Change GP_MEAN_VAR_ROUTE_NAME to GP_EI_ROUTE_NAME. Change the return ... line to return output.get('expected_improvement'). And points_to_evaluate is a list of lists, e.g., [[0.3, 0.2], [0.5, 0.7], [-3.14, 12.25]]. The schema for gp_ei is here: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_ei.py compared to mean_var: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_mean_var.py The points_sampled input is just like in hyper_opt. You need to specify covariance_info as a kwarg just like in next_points.

Anyway, sorry I don't have a working install yet but I'll get there. Should have looked into gcc default version before upgrading to ubuntu 15.10. Speaking of which, can you tell me which version of gcc and which distro/version of linux you're on? (edit: although probably irrelevant b/c the docker image should be independent of those things)

suntzu86 commented 8 years ago

Also does this guy in moe_examples folder work for you? Basically does the same thing as your script, I think...

https://github.com/Yelp/MOE/blob/master/moe_examples/combined_example.py

You may want to disable the noise in function_to_minimize. It looks like the tests for the files in moe_examples don't verify correctness (just that they run).

dgreis commented 8 years ago

> Scatter plot: :( :( Blah. Could you give me one more piece of debug info? After each result for next_points, can you call gp_ei for the point that MOE suggests as well as the true minima? (e.g., to check if EI optimization even has a hope of finding it.) If it's too much trouble no worries, I'll look into it anyway.

There are multiple minima (all equal to 0.39) in the Branin function (see its description), so this is not entirely straightforward.

> Steps if interested: Copy the gp_mean_var function in easy_interface to like gp_ei. Change GP_MEAN_VAR_ROUTE_NAME to GP_EI_ROUTE_NAME. Change the return ... line to return output.get('expected_improvement'). And points_to_evaluate is a list of lists, e.g., [[0.3, 0.2], [0.5, 0.7], [-3.14, 12.25]]. The schema for gp_ei is here: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_ei.py compared to mean_var: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_mean_var.py The points_sampled input is just like in hyper_opt. You need to specify covariance_info as a kwarg just like in next_points.

These are "steps"... to what? I didn't understand what this was for, but I did it anyway. I couldn't add gp_mean_var to views.rest.gp_ei.py because of the same iPython-kernel-dying issue as before (let's leave that issue aside), but I was able to get it to work by keeping it in simple_endpoint.py. I had to make a few more changes to the file, and I also changed the name so it wouldn't clash with the existing gp_mean_var. Renamed to my_grab_ei, it looks like this:

from moe.views.schemas.rest.gp_ei import GpEiResponse #Added
from moe.views.constant import GP_EI_ROUTE_NAME, GP_EI_PRETTY_ROUTE_NAME #Added

def my_grab_ei(
        points_sampled,
        points_to_evaluate,
        rest_host=DEFAULT_HOST,
        rest_port=DEFAULT_PORT,
        testapp=None,
        **kwargs
):
    """Hit the rest endpoint for calculating the posterior mean and variance of a gaussian process, given points already sampled."""
    endpoint = ALL_REST_ROUTES_ROUTE_NAME_TO_ENDPOINT[GP_EI_ROUTE_NAME] ##Changed
    raw_payload = kwargs.copy()  # Any options can be set via the kwargs ('covariance_info' etc.)

    raw_payload['points_to_evaluate'] = points_to_evaluate

    # Sanitize input points
    points_sampled_clean = [SamplePoint._make(point) for point in points_sampled]
    historical_data = HistoricalData(
            len(points_to_evaluate[0]),  # The dim of the space
            sample_points=points_sampled_clean,
            )

    if 'gp_historical_info' not in raw_payload:
        raw_payload['gp_historical_info'] = historical_data.json_payload()

    if 'domain_info' not in raw_payload:
        raw_payload['domain_info'] = {'dim': len(points_to_evaluate[0])}

    json_payload = json.dumps(raw_payload)

    json_response = call_endpoint_with_payload(rest_host, rest_port, endpoint, json_payload, testapp)

    output = GpEiResponse().deserialize(json_response) ##Changed

    return output.get('expected_improvement') ##Changed
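
(To also get EI at the true minima, as requested above, something like the following could be added inside the loop, using the optima listed earlier in the thread:)

# Hypothetical extra check: EI at the known Branin optima, using the same helper
branin_optima = [[-np.pi, 12.275], [np.pi, 2.275], [9.42478, 2.475]]
ei_at_optima = my_grab_ei(points_sampled, branin_optima,
                          rest_host='192.168.99.100',
                          **{'covariance_info': covariance_info})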

Then I added this to my cycle_moe function, which now looks like this:

def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            gp_dim = len(points_sampled[0][0])
            hyper_dim = gp_dim + 1
            covariance_info = gp_hyper_opt(points_sampled,rest_host='192.168.99.100'
                ,**{'hyperparameter_domain_info' : { \
                'dim': hyper_dim, \
                'domain_bounds': [{'min': 0.01, 'max': 3.0},{'min': 0.03, 'max': 15},{'min': 0.03, 'max': 15}]  \
                #'domain_bounds': [{'min': 0.01, 'max': 2.0}] * hyper_dim                                                                                                      
                }}
                                          )
            print (str(i) +', time taken: '+str(round(time.time() - t0,3)))
            tdf[str(i)] = [time.time() - t0]
        next_point = gp_next_points(exp,rest_host='192.168.99.100',**{'covariance_info':covariance_info})[0]
        f_next = branin(next_point) # +  npr.normal(scale=std)
        points_sampled = grab_pts_sampled(exp)  ##Added
        ei = my_grab_ei(points_sampled, [next_point],rest_host='192.168.99.100',**{'covariance_info':covariance_info})[0] ##Added
        rdf = rdf.append({ 'x1': next_point[0], 'x2':next_point[1], 'f':f_next, 'ei': ei,   ##Modified
                         'cov_hyps': covariance_info['hyperparameters']},ignore_index=True)
        #points = exp.historical_data.to_list_of_sample_points()
        new_point = SamplePoint(next_point, f_next, 0)
        exp = Experiment(bounds)  # note: this creates a fresh experiment, so only new_point below is carried into the next iteration
        #points.append(new_point)
        #exp.historical_data.append_sample_points(points)
        exp.historical_data.append_sample_points([new_point])
    return rdf[['x1','x2','f','ei','cov_hyps']],tdf,exp,covariance_info ##Modified

I don't know how best to summarize this. It will probably be best to run it yourself and look at the contents of rdf, but here is the output from rdf.ei.describe()

count     50.000000
mean      47.673862
std       48.591592
min        2.522790
25%       10.211796
50%       22.794516
75%       85.673810
max      188.925347

> Speaking of which, can you tell me which version of gcc and which distro/version of linux you're on? (edit: although probably irrelevant b/c the docker image should be independent of those things)

I'm on OS X Yosemite (10.10.5) and gcc --version returns 4.9.2. But I think you're right re: Docker handling this stuff...

Also, I got moe_examples to work. The output is below. I guess it is converging to [1,2.6]? But still, I need to make sure it converges in all cases, not just this one.

If I can't turn the corner on this problem soon, I think I'm gonna have to abandon MOE :(

Sampled f([1.88891236067, 4.0]) = 3.024281515239389817E-01
Sampled f([0.0, 2.9618376054]) = -9.838875168897800449E-01
Sampled f([0.0, 3.82570246393]) = -7.749819391510606170E-01
Sampled f([0.786838585499, 2.62777516463]) = -1.579648346558538918E+00
Sampled f([1.20312582365, 2.08982756487]) = -1.451454321864257269E+00
Updated covariance_info with {'hyperparameters': [0.823360900509, 1.01663242597, 1.28659618664], 'covariance_type': u'square_exponential'}
Sampled f([0.758268089974, 2.2331219232]) = -1.411622052413112893E+00
Sampled f([1.19599259406, 2.57708248725]) = -1.593354050232974606E+00
Sampled f([1.03968887999, 2.56816139896]) = -1.617582421292888650E+00
Sampled f([2.0, 2.24190928674]) = -1.018767759754114710E+00
Sampled f([1.05579477819, 2.56319110027]) = -1.616923815055614444E+00
Updated covariance_info with {'hyperparameters': [0.789039406073, 1.23435905217, 1.30371483948], 'covariance_type': u'square_exponential'}
Sampled f([1.0082757694, 2.62978522016]) = -1.616789052180044095E+00
Sampled f([2.0, 0.639342249258]) = -1.468008111448600994E-01
Sampled f([1.07167325368, 2.5252282002]) = -1.614562666452685313E+00
Sampled f([0.863802755471, 2.92560868142]) = -1.540054847282929629E+00
Sampled f([1.03653075905, 2.56537496547]) = -1.617587842392238073E+00
Updated covariance_info with {'hyperparameters': [0.737969968335, 1.24276395575, 1.37894109572], 'covariance_type': u'square_exponential'}
Sampled f([1.01903375152, 2.59415144623]) = -1.617994034119857982E+00
Sampled f([1.03422357833, 2.56392785481]) = -1.617583817670746216E+00
Sampled f([1.03150566466, 2.56489031089]) = -1.617640394284993066E+00
Sampled f([1.02916662686, 2.56574681541]) = -1.617681716433355454E+00
Sampled f([1.02711746293, 2.56651917111]) = -1.617712335919329059E+00
GP mean at (0, 0), (0.1, 0.1), ...: [0.98974570661, 0.963411209612, 0.911656230847, 0.835017255388, 0.735036207222, 0.614126994898, 0.475415337389, 0.322579331855, 0.159711048117, -0.00879206630745]
suntzu86 commented 8 years ago

I can understand having to abandon MOE... I'd be annoyed too if I were stuck for so long :( I was hoping to spend some time on this over the weekend but I ended up having to spend the weekend working on things for my job.

So I have a couple more suggestions for you; I'll try to debug it myself this week as hopefully work has settled down. Anyway...

Quick things to try:

Hopefully I spelled all those fields correctly, lol.

Edit:

johnroach commented 8 years ago

Thank you for helping out @suntzu86. Have you been able to work on this? I am having a similar problem.

suntzu86 commented 8 years ago

Nope, I haven't had a chance to dive deeper into this.

A team at Cornell where some of the original research for MOE was done put a paper up on arxiv recently: http://arxiv.org/abs/1602.05149

They run a branin case successfully, code here: http://pastebin.com/2syieQKU

It's not going through the REST API though, but you can hopefully see how the python objects map into the api's json fields (like TensorProductDomain -> domain).
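
(A rough illustration of that mapping; the import paths below are assumptions based on the MOE source tree and worth double-checking:)

from moe.optimal_learning.python.geometry_utils import ClosedInterval
from moe.optimal_learning.python.python_version.domain import TensorProductDomain

# python-side domain object, as used in the pastebin-style code...
py_domain = TensorProductDomain([ClosedInterval(-5.0, 10.0), ClosedInterval(0.0, 15.0)])

# ...corresponds roughly to the REST API's domain_info JSON field:
domain_info = {
    'dim': 2,
    'domain_bounds': [{'min': -5.0, 'max': 10.0}, {'min': 0.0, 'max': 15.0}],
}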