In the UCB1 and UCB1-tuned algorithm implementations, the calculation of the upper confidence bound should include the term sqrt( 2.0 * ln(bandit_trials) / arm_trials ).
The implementation here flips (bandit_trials) and (arm_trials) in the calculation of this term. The risk is that arms with no trials (or very few pulls) don't receive the large exploration bonus they should, so they aren't reevaluated as often, and the bandit has a higher risk of locking onto a non-optimal arm early on.
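For reference, here is a minimal sketch of the correct exploration term (this is not MOE's actual code; the function name and the handling of unpulled arms are purely illustrative):

```python
import math

def ucb1_exploration_term(bandit_trials, arm_trials):
    """Exploration bonus for a single arm under UCB1.

    Correct form: sqrt(2 * ln(total pulls across all arms) / pulls of this arm),
    so an arm with few pulls receives a large bonus and gets revisited.
    """
    if arm_trials == 0:
        return float('inf')  # untried arms should always be sampled first
    return math.sqrt(2.0 * math.log(bandit_trials) / arm_trials)

# The flipped version, sqrt(2 * ln(arm_trials) / bandit_trials), shrinks as the
# bandit accumulates trials regardless of how rarely a given arm has been pulled,
# so under-sampled arms stop being revisited.
```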
https://github.com/Yelp/MOE/blob/master/moe/bandit/ucb/ucb1.py#L64-L65
https://github.com/Yelp/MOE/blob/master/moe/bandit/ucb/ucb1_tuned.py#L78-L79
(sorry, not sure how to insert a gist into this issue)
Reference: http://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf, pages 3 & 11.
The algorithm is also stated correctly in the docstrings of these methods; it's just implemented incorrectly. Is there a specific reason for this?
I have a pull request + test ready to go if this is a genuine issue.