Yelp / MOE

A global, black box optimization engine for real world metric optimization.

Time varying objective functions #414

Open filthysocks opened 9 years ago

filthysocks commented 9 years ago

Hello,

I stumbled upon your project because I basically wrote the same thing (but much simpler) at my company. I am tempted to use your project instead, but there is one thing I couldn't figure out.

How do you deal with time-varying objective functions? We use our framework to optimize CTRs, but they are highly time dependent (e.g. much higher during weekends or at night). We solved this by adding time as a regressor to the GP model and then marginalizing over it during the EI calculation. Is there a way to do this with MOE?
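For reference, a minimal sketch of what we do (scikit-learn as a stand-in for our in-house GP; all data and length scales are toy values):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy data: x is the tunable parameter, t is e.g. hour-of-week (0..168).
X = rng.uniform(0.0, 1.0, size=(50, 1))
T = rng.uniform(0.0, 168.0, size=(50, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * np.cos(T[:, 0] / 24.0) + 0.05 * rng.standard_normal(50)

# Fit the GP on the joint (parameter, time) input space.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=[0.3, 24.0]), alpha=1e-4)
gp.fit(np.hstack([X, T]), y)

def marginal_ei(x, best, t_samples):
    """EI at parameter x, Monte Carlo-averaged over draws from the time prior
    (minimization convention)."""
    pts = np.column_stack([np.full(len(t_samples), x), t_samples])
    mu, sd = gp.predict(pts, return_std=True)
    z = (best - mu) / sd
    return np.mean((best - mu) * norm.cdf(z) + sd * norm.pdf(z))

t_draws = rng.uniform(0.0, 168.0, size=64)  # stand-in for the time prior
print(marginal_ei(0.5, best=y.min(), t_samples=t_draws))
```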

sc932 commented 9 years ago

Hey @filthysocks,

We currently deal with time-varying objective functions by modeling the time dependence explicitly in the objective (a time decay parameter that takes it back to the prior). The solution you propose is the right way to do this and is on our roadmap for MOE.
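Roughly, the idea is something like this (an illustrative sketch only; the decay rate and the noise-inflation mechanism here are not actual MOE parameters):

```python
import numpy as np

def decayed_noise(base_var, ages_in_days, decay_rate=0.1):
    """Inflate each observation's noise variance with its age, so that stale
    points carry less weight and the posterior relaxes back toward the GP
    prior. decay_rate is a hypothetical tuning knob for illustration."""
    return base_var * np.exp(decay_rate * np.asarray(ages_in_days))

# Example: a week-old observation is trusted far less than a fresh one.
print(decayed_noise(0.01, ages_in_days=[0, 1, 7]))
```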

Depending on how far along your solution is, do you think it would be possible to merge it into MOE? This is something we are really interested in adding to MOE and we would be happy to accept the PR if you would like to add it.

filthysocks commented 9 years ago

I'll have a look at how complicated that would be.

filthysocks commented 9 years ago

Even though our frameworks are not much alike (we assumed all variables are categorical, which makes our lives much, much easier), I don't think it's too complicated to implement.

The most important difference is that a prior over the time-varying parameter needs to be specified. There are two options:

  1. Estimating it from the data (a kernel density estimate or a categorical distribution, depending on the data type)
  2. Explicitly supplying a prior (offering a couple of options, e.g. uniform, log-uniform, normal, log-normal)

We used the first approach, though while using it I realized that in every case I could also have supplied a prior explicitly. So now I think option 2 is the more useful one. Both options are sketched below.
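A sketch of both options with scipy stand-ins (the hour-of-week prior and all parameter values are just illustrations):

```python
import numpy as np
from scipy import stats

# Hypothetical time-varying parameter t: hour-of-week in [0, 168).

# Option 1: estimate the prior from observed timestamps (KDE for continuous t).
observed_t = np.random.default_rng(1).uniform(0, 168, size=500)
kde_prior = stats.gaussian_kde(observed_t)
t_from_data = kde_prior.resample(64).ravel()

# Option 2: supply the prior explicitly.
explicit_priors = {
    "uniform": stats.uniform(loc=0, scale=168),
    "log-uniform": stats.loguniform(1, 168),
    "normal": stats.norm(loc=84, scale=24),
    "log-normal": stats.lognorm(s=0.5, scale=84),
}
t_from_prior = explicit_priors["uniform"].rvs(size=64)
```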

On the implementation side, the easiest solution is to use the prior to draw samples of the parameter and perform MC sampling, averaging the time-varying parameter out of the EI. If option 2 is chosen, an analytic solution is possible, depending on the supplied prior.
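In the categorical case (which is all we support), the expectation over the prior is a finite weighted sum, so the marginal EI is exact rather than Monte Carlo. A sketch, with `predict` standing in for a hypothetical GP posterior:

```python
import numpy as np
from scipy.stats import norm

def ei(mu, sd, best):
    """Closed-form EI (minimization convention) for a Gaussian posterior."""
    z = (best - mu) / sd
    return (best - mu) * norm.cdf(z) + sd * norm.pdf(z)

def marginal_ei_categorical(x, best, levels, weights, predict):
    """Exact marginal EI under a categorical time prior: a probability-
    weighted sum over the finite time levels. `predict(x, t) -> (mu, sd)`
    is a hypothetical callable wrapping the GP posterior."""
    return sum(w * ei(*predict(x, t), best) for t, w in zip(levels, weights))
```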

We definitely want to implement it, but we are trying to get a master's student to work on it, so it might take a while.

KukumavMozolo commented 8 years ago

So I tried out a few things, and what I ended up with was integrating over the Gaussian process in order to find a stationary solution. In more detail: I added time as an additional feature, then integrated the predictive mean and variance functions. EI is then computed on the basis of this "integrated" Gaussian process. I used Python and the code is optimized for numpy. Are you interested?
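Schematically (a simplified numpy sketch of the idea, not the actual code in the branch):

```python
import numpy as np

def integrated_gp(gp, x, t_lo, t_hi, n_grid=64):
    """Time-integrated predictive mean/variance at parameter x, approximated
    on a quadrature grid in t (uniform p(t) assumed). `gp` is a GP fit on
    [x, t] inputs, e.g. the scikit-learn regressor from the sketch further
    up this thread. Averaging the pointwise variance follows the description
    above; the exact variance of the integral would instead average the full
    posterior covariance over t."""
    t = np.linspace(t_lo, t_hi, n_grid)
    pts = np.column_stack([np.full(n_grid, x), t])
    mu, sd = gp.predict(pts, return_std=True)
    return mu.mean(), np.mean(sd ** 2)
```

EI is then evaluated with this integrated mean and variance instead of the raw posterior at a single time point.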

suntzu86 commented 8 years ago

That's an interesting approach. I'd be curious to take a look, @KukumavMozolo

KukumavMozolo commented 8 years ago

Nice. There is a branch called MOE_SERVER that exposes a REST endpoint for the integrated GP under /gp/next_points/iepi/pretty. I also added /gp/optimal_point/pretty, an endpoint that returns the minimum of the integrated GP, since you can no longer just use the best-sampled-so-far point as the best point.

KukumavMozolo commented 8 years ago

{"domain_info": {"dim": 3, "domain_bounds": [{"max": 1.0, "min": 0.0}, {"max": 2.0, "min": -2.0}, {"max": 2.0, "min": -2.0}]}, "gp_historical_info": {"points_sampled": [{"value_var": 0.01, "value": 0.1, "point": [0.0, 1.0, 0.5]}, {"value_var": 0.01, "value": 0.2, "point": [1.0, 1.0, 0.2]}, {"value_var": 0.01, "value": 0.2, "point": [0.391299679181, 0.5, 1]}]}, "gp_integral_info": {"lower_bound": [-2, -2], "marginal_idx": [1, 2], "upper_bound": [2, 2]}, "num_to_sample": 1, "covariance_info": {"hyperparameters": ["0.8474810600969764", "2.1667688343515916", "2.5984916973124617", "2.5984916973124617"], "covariance_type": "square_exponential"}}

Here "gp_integral_info": {"lower_bound": [-2, -2], "marginal_idx": [1, 2], "upper_bound": [2, 2]} means: integrate over dimensions 1 and 2, from -2 to 2 each. I did not change the dimension of the returned point, so in this example, if the algorithm returned [0.0, 1.0, 0.5], the 1.0 and 0.5 would be random numbers. All my performance analysis so far covered integrating over just one dimension; I don't yet know how the algorithm performs when integrating over multiple dimensions as in the example. Everything is implemented in integrated_gaussian_process.py.

Also, I found that there are conditions where one might want a different selection process for the current best point in the EI calculation. Instead of taking the global best point, one can take the best point at time t (where t corresponds to the time point we want to sample next) or the average over the best points. These strategies are implemented for the case where sampling times are fixed, but they can't be accessed through the REST endpoint atm. @suntzu86
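For reference, the request above can be sent like this (the host and port are whatever your local MOE_SERVER instance uses; localhost:6543 here is just an assumption):

```python
import requests  # assumes a running MOE_SERVER branch instance

payload = {
    "domain_info": {"dim": 3, "domain_bounds": [
        {"max": 1.0, "min": 0.0}, {"max": 2.0, "min": -2.0}, {"max": 2.0, "min": -2.0}]},
    "gp_historical_info": {"points_sampled": [
        {"value_var": 0.01, "value": 0.1, "point": [0.0, 1.0, 0.5]},
        {"value_var": 0.01, "value": 0.2, "point": [1.0, 1.0, 0.2]},
        {"value_var": 0.01, "value": 0.2, "point": [0.391299679181, 0.5, 1]}]},
    "gp_integral_info": {"lower_bound": [-2, -2], "marginal_idx": [1, 2],
                         "upper_bound": [2, 2]},
    "num_to_sample": 1,
    "covariance_info": {"hyperparameters": ["0.8474810600969764", "2.1667688343515916",
                                            "2.5984916973124617", "2.5984916973124617"],
                        "covariance_type": "square_exponential"},
}
resp = requests.post("http://localhost:6543/gp/next_points/iepi/pretty", json=payload)
print(resp.json())
```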