Yelp / MOE

A global, black box optimization engine for real world metric optimization.

next points hangs on precisely defined dimension #444

Open orasis opened 8 years ago

orasis commented 8 years ago

I'm attempting to constrain one of the dimensions to a single value, in this case 3752, as evidenced by the min and max of 3752. This query never seems to return.

Why would one want to do this? I don't want the GP to search those dimensions that are invariant.

The workaround is to set the max to min+1. This is, however, not ideal because my domain_bounds change dynamically and I don't want to have to keep this edge case in mind.

INFO [moe.views.pretty_view][worker 1] {'content': {u'domain_info': {u'dim': 2, u'domain_bounds': [{u'max': 3752, u'min': 3752}, {u'max': 3600, u'min': 60}]}, u'gp_historical_info': {u'points_sampled': [{u'value_var': 0.01, u'value': 0.25, u'point': [3669, 1000]}, {u'value_var': 0.01, u'value': 0.25, u'point': [3669, 1000]}, {u'value_var': 0.01, u'value': 0.16666666666666666, u'point': [3752, 1000]}]}, u'num_to_sample': 1}, 'endpoint': 'gp_next_points_epi', 'type': 'request'}

jialeiwang commented 8 years ago

If you only search along one of the dimensions while the other is fixed, you should consider using a 1-dimensional GP instead of a 2-d one, because the fixed dimension is useless to the search problem while adding a lot of "distraction" to it.
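One way to see the "distraction": if a coordinate is identical across all sampled points, it contributes zero distance to a square-exponential kernel, so the covariance collapses to 1-d anyway while the length scale for that dimension stays unidentifiable. A minimal sketch, assuming that kernel:

```python
import math

def sq_exp_2d(p, q, length_scales):
    """Square-exponential kernel over per-dimension scaled distances."""
    d2 = sum(((a - b) / l) ** 2 for a, b, l in zip(p, q, length_scales))
    return math.exp(-0.5 * d2)

# The first coordinate is 3752 in every point, so its length scale never
# affects the covariance: any value yields the same kernel matrix.
print(sq_exp_2d([3752, 1000], [3752, 2000], [1.0, 500.0]))   # ~0.1353
print(sq_exp_2d([3752, 1000], [3752, 2000], [99.0, 500.0]))  # identical
```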

orasis commented 8 years ago

Is that true? Say we had 2 dimensions, day of week and hour of day, and on most days 8am is optimal, but on Saturday and Sunday 10am is optimal. What is the best way to search for this? I think you'd want the extra information from the day-of-week correlations even if you are just searching for the optimal time on Sunday. Maybe I misunderstood how the GP works internally.

Thanks!


jialeiwang commented 8 years ago

Interesting, you are right. In this case you should use a 2-d GP for better modeling. The current implementation of MOE assumes the search dimension is the same as the problem dimension, and that the goal of optimization is to find the maxima over this whole space. In your example, the current implementation of MOE finds which day AND which time are optimal, while your goal is to find which time on Sunday is optimal. You need two tweaks to fit your problem:

  1. The best value sampled so far, which is used to compute ExpectedImprovement, should in your case be the best value sampled on Sunday, because your optimization objective is the value on Sunday;
  2. Modify the optimization module to search along the time dimension while the day is fixed to Sunday.

I am not sure when this new feature will be added, although it seems an important one to pursue. If you'd like to play with the code instead of waiting, I can point out starting places for you. Note that the python interface gives you greater flexibility to work on your problem; I therefore assume you will use the python interface, in particular python_version instead of cpp_wrappers.

For point 1, modify the way ExpectedImprovement._best_so_far is computed in moe/optimal_learning/python/python_version/expected_improvement.py.
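For illustration only, a sketch of the idea (not MOE's actual API beyond the _best_so_far name above; the day encoding is hypothetical, and it assumes the minimization convention MOE's EI code uses internally):

```python
import numpy

# Hypothetical encoding for this sketch: dimension 0 is day-of-week, 6 = Sunday.
DAY_DIM = 0
SUNDAY = 6

def best_so_far_in_context(points_sampled, values_sampled, context_value):
    """Best (lowest) value among points whose fixed dimension matches context_value.

    points_sampled: (n, dim) array of historical points
    values_sampled: (n,) array of objective values
    """
    mask = points_sampled[:, DAY_DIM] == context_value
    if not mask.any():
        # No data in this context yet; fall back to the global best.
        return numpy.min(values_sampled)
    return numpy.min(values_sampled[mask])

# Inside ExpectedImprovement, the idea would be to replace the global minimum with:
# self._best_so_far = best_so_far_in_context(points, values, SUNDAY)
```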

Point 2 is trickier; GradientDescentOptimizer in moe/optimal_learning/python/python_version/optimization.py is a good starting point if you want to use gradient descent, or LBFGSOptimizer if you want to use BFGS. If you sample one point at a time, I recommend gradient descent, and BFGS if you sample multiple points at a time.
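As a rough illustration of what "search along time with the day fixed" means, here is a sketch that sidesteps MOE's optimizer classes entirely and uses scipy to maximize EI over the free dimension only; ei_func stands in for whatever EI evaluator you end up with:

```python
import numpy
from scipy.optimize import minimize_scalar

def optimize_time_for_fixed_day(ei_func, day, time_bounds):
    """Maximize EI over the time dimension while the day coordinate stays fixed."""
    def neg_ei(t):
        return -ei_func(numpy.array([day, t]))
    result = minimize_scalar(neg_ei, bounds=time_bounds, method='bounded')
    return result.x  # the suggested time to sample next for this day

# Toy usage with a made-up EI surface that peaks at t = 10 on any day:
toy_ei = lambda p: numpy.exp(-0.5 * ((p[1] - 10.0) / 2.0) ** 2)
print(optimize_time_for_fixed_day(toy_ei, day=6, time_bounds=(0.0, 24.0)))  # ~10.0
```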

If you have difficulty building the source on your local machine due to the C++ component, use this branch: https://github.com/Yelp/MOE/tree/jwang_445_install_pure_python_modules_only. You need to export the env variable MOE_NO_BUILD_CPP=True; with this build you cannot use the web interface for now. Hope it helps!

orasis commented 8 years ago

Unfortunately my Python is weak, so it may be some time before I can be motivated.

I think this raises two separate issues. The original bug is still a bug: a single query with min == max hangs the server, which makes it a trivial denial-of-service attack on the MOE REST interface.

The second issue is this contextual search optimization. The concrete target I would give for contextual search is Q-learning using a Gaussian process, where the GP maps (state, action) pairs to the reward value (Q). Typically in this case, the action space is being searched while the state space is fixed. So your points would look like (stateX, stateY, stateZ, actionX, actionY) -> Q.
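To make the shape of that concrete, a hypothetical sketch of how such samples might be laid out in the same format as the request above (all values illustrative):

```python
# Each historical sample: full (state, action) point plus observed reward Q.
points_sampled = [
    {'point': [0.2, 1.5, 0.0, 3.0, 7.0], 'value': 0.42, 'value_var': 0.01},
    {'point': [0.2, 1.5, 0.0, 4.5, 2.0], 'value': 0.18, 'value_var': 0.01},
]

# At decision time the state is fixed and only the action coordinates are
# searched, so a next-point query would pin the first three dimensions:
current_state = [0.9, 0.1, 2.2]
candidate_action = [4.0, 6.5]
query_point = current_state + candidate_action  # GP predicts Q at this point
```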

suntzu86 commented 8 years ago

Sorry I've been gone for so long! See: #448 for details.

Whoa whoa. First, @jialeiwang, I don't think any of those modifications are necessary right now. This is definitely a bug. Keep reading for more details, a workaround, and a fix.

This is the notion of an "untunable parameter." MOE wasn't built with these in mind, but the basic premise is that if I'm searching over x,y, I may want to search in a subspace where we already know y=2.5. For now, MOE would only support this by setting a 0-length domain boundary as @orasis did (keep reading for how). In the future, it'd be preferable to carry the notion of untunable-ness into the internals so we do not attempt to optimize untunables (not high priority b/c the cost is low).

Time is a great example of an untunable parameter. Material and environment properties are often untunable as well (like an airplane cannot pick the temperature at which it flies; I may want to design a structure with a new type of steel given some existing designs; etc).

Note that you still need to tune hyperparameters for your untunable dimensions. Additionally, say I have some untunables in [8, 12] and MOE estimates a length scale of 2.0. If I then set an untunable parameter with value 30, the minimum scaled distance is (30 - 12) / 2.0 = 9! This is huge and will cause MOE to revert to the prior (b/c the covariance will be nearly 0). So be careful when estimating very far away from previously sampled values; you might be getting random noise back.
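To see how fast that kills the covariance, a quick check assuming a square-exponential kernel:

```python
import math

# Square-exponential covariance as a function of scaled distance
# d = |x - x'| / length_scale:
def sq_exp(d):
    return math.exp(-0.5 * d ** 2)

print(sq_exp(1.0))  # ~0.61: nearby points are informative
print(sq_exp(9.0))  # ~2.6e-18: effectively zero; the GP reverts to its prior
```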

@orasis, you have two options. Easy: Similar to what you tried, set your domain to [a, b] where b = a*(1.0 + 1.0e-14), so the domain width is TINY (see the sketch after the caveat below). This will do a lot better than setting b = a + 1: for all intents and purposes, you're forcing MOE to always use a for that dimension.

Small caveat: you need to ensure that the hyperparameter length scale for this dimension is, say, > a*1.0e-11 (1000x the edge length). Then MOE knows there is no variation over the (tiny) domain edge width.
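Applied to the degenerate query above, the easy workaround would look like this (a sketch in the same domain_info format; the second dimension's bounds are taken from the original request):

```python
a = 3752
domain_info = {
    'dim': 2,
    'domain_bounds': [
        # Untunable dimension: a tiny but nonzero width instead of min == max.
        {'min': a, 'max': a * (1.0 + 1.0e-14)},
        {'min': 60, 'max': 3600},
    ],
}
```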

Harder: (I'll hopefully be doing this soon.) The reason this happens is that MOE's internal optimizer multistarts on a Latin hypercube. This boils down to drawing random points on [min, max) for each pair of domain bounds, so your query would draw from [a, a), a degenerate range. B/c we didn't start with C++11, this uses boost::uniform_real (in boost::random), which in turn says [a, a) is undefined behavior, and it loops forever :(

C++11's equivalent object, std::uniform_real_distribution (in <random>), does not do this: a random value drawn from [a, a) returns a.

This happens in moe/optimal_learning/cpp/gpp_random.cpp. So:

- add `#include <random>` at the top of that file (right above `#include <vector>`),
- change line 184 to `std::shuffle(index_array.begin(), index_array.end(), uniform_generator->engine);`, and
- change line 186 to `std::uniform_real_distribution<double> uniform_double_distribution(0.0, subcube_edge_length);`.

This will cause some unit tests to fail b/c testing with randomness is hard, and the stdlib version and the boost version don't generate identical output (due to how uniform_real_distribution and uniform_real are implemented).

orasis commented 8 years ago

Excellent suggestions, Eric. Thank you.
