[C++] implement Polyak-Ruppert averaging for gradient descent

C++'s gradient descent code currently does not support any kind of averaging.

We should implement Polyak-Ruppert averaging. This is already done in moe.optimal_learning.python.python_version.optimization.GradientDescentOptimizer.optimize, so porting it should be straightforward.

The lack of averaging hasn't proven to be much of a hindrance: the results obtained in Python with and without averaging have been comparable (i.e., the final gradient hasn't been much better either way). Still, we should be consistent between the two implementations, and this averaging is generally a good idea.
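
For reference, here is a minimal sketch of what the averaging step could look like in C++. It is not MOE's actual C++ API: the function name `GradientDescentWithAveraging` and the parameters `pre_mult`, `gamma`, and `num_steps_averaged` are assumptions made for illustration, and the step size schedule `pre_mult * step^(-gamma)` is likewise an assumed choice. The sketch minimizes; an optimizer that maximizes would flip the sign of the update. It returns the average of the last `num_steps_averaged` iterates instead of the final iterate.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper for illustration only; not part of MOE's C++ code.
// Runs plain gradient descent (minimization) and returns the Polyak-Ruppert
// average of the last `num_steps_averaged` iterates. If `num_steps_averaged`
// is 0, the final iterate is returned unchanged.
std::vector<double> GradientDescentWithAveraging(
    const std::function<std::vector<double>(const std::vector<double>&)>& gradient,
    std::vector<double> x,
    int max_num_steps,
    int num_steps_averaged,
    double pre_mult,
    double gamma) {
  const std::size_t dim = x.size();
  std::vector<double> average(dim, 0.0);
  int num_averaged = 0;
  // First step (1-indexed) whose iterate contributes to the average.
  const int first_averaged_step = max_num_steps - num_steps_averaged + 1;

  for (int step = 1; step <= max_num_steps; ++step) {
    // Assumed step size schedule: alpha_t = pre_mult * t^(-gamma).
    const double alpha = pre_mult * std::pow(static_cast<double>(step), -gamma);
    const std::vector<double> grad = gradient(x);
    for (std::size_t i = 0; i < dim; ++i) {
      x[i] -= alpha * grad[i];
    }

    if (step >= first_averaged_step) {
      // Running mean, so the averaged iterates never need to be stored.
      ++num_averaged;
      for (std::size_t i = 0; i < dim; ++i) {
        average[i] += (x[i] - average[i]) / num_averaged;
      }
    }
  }
  return num_averaged > 0 ? average : x;
}
```

Keeping a running mean avoids storing the iterates, so the extra cost is one vector of the problem dimension. Setting `num_steps_averaged` to 0 recovers the current non-averaged behavior, which would make it easy to compare the two.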

Yelp / MOE

[C++] implement Polyak-Ruppert averaging for gradient descent #390