Yelp / MOE

A global, black box optimization engine for real world metric optimization.
Other
1.3k stars 139 forks source link

UCB1 and UCB1-tuned algorithm have sample_arm.total and number_sampled parameters reversed #432

Closed serchaos closed 9 years ago

serchaos commented 9 years ago

In the UCB1 and UCB1-tuned algorithm implementations, the calculation of the upper confidence bound should include the term ( sqrt( 2.0 * ( ln (bandit_trials ) / arm_trials ) ).

The implementation here flips (bandit_trials) and (arm_trials) in the calculation of this term. The risk here is that arms without trials (or arms with very few pulls) aren't reevaluated as often, and the bandit has a higher risk of selecting a non-optimal arm early on.

https://github.com/Yelp/MOE/blob/master/moe/bandit/ucb/ucb1.py#L64-L65 https://github.com/Yelp/MOE/blob/master/moe/bandit/ucb/ucb1_tuned.py#L78-L79 (sorry, not sure how to insert a gist into this issue)

Reference: http://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf, pages 3 & 11.

The algorithm is also stated correctly in the docstrings under the method, just implemented wrong. Is there a specific reason for this?

I have a pull request + test ready to go if this is a genuine issue.

sc932 commented 9 years ago

I think this is a bug, good catch. I'd be happy to pull in your tested branch, thanks!