airoldilab / sgd

An R package for large scale estimation with stochastic gradient descent

R-hat for SGD #74

Closed by dustinvtran 9 years ago

dustinvtran commented 9 years ago

Run multiple "chains", i.e., independent SGD runs. Compare the within-sequence variance and the between-sequence variance to compute an R-hat for SGD. If R-hat is not < 1.1, then SGD hasn't converged, e.g., there is something problematic with the choice of learning rate hyperparameters.
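For concreteness, here is a minimal sketch of that check using the classic potential scale reduction formula from MCMC diagnostics, applied naively to SGD iterates. It is not code from this package. The function name `r_hat`, the restriction to a single scalar parameter, and the assumption that each chain holds the (post-warm-up) iterates of one independent SGD run are all illustrative choices.

```python
from statistics import mean, variance


def r_hat(chains):
    """Potential scale reduction factor for one scalar parameter.

    `chains` is a list of m equal-length sequences, each holding the
    n iterates of one independent run (hypothetical input layout).
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least 2 chains")
    n = len(chains[0])
    if n < 2:
        raise ValueError("need at least 2 iterates per chain")
    if any(len(c) != n for c in chains):
        raise ValueError("all chains must have the same length")

    # W: average of the within-sequence sample variances.
    w = mean(variance(c) for c in chains)
    # B: between-sequence variance, i.e. n times the sample variance
    # of the per-chain means.
    b = n * variance([mean(c) for c in chains])

    if w == 0:
        # Degenerate case: every chain is constant.
        return 1.0 if b == 0 else float("inf")

    # Pooled estimate of the marginal variance, then its ratio to W.
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


# Two runs that settle near very different values: the between-sequence
# variance dominates, so the 1.1 threshold flags non-convergence.
runs = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
print(r_hat(runs) < 1.1)
```

With several parameters, one option is to compute this per coordinate and report the largest value, though that choice is also an assumption rather than something settled in this thread.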

dustinvtran commented 9 years ago

Just a note that there's active interest in this from the Stan people. Andrew is interested in an "R-hat" for approximate inference techniques in general, e.g., variational inference and expectation propagation. From some discussion with him, there are complications because the original technique does not immediately apply. One needs to take into account not only the variance between and within the iterations of SGD, but also the variance of the SGD estimator itself. A more obvious problem is that different chains may converge to different local optima.

dustinvtran commented 9 years ago

Marking as closed, since this is covered more generally in #18.