Closed dustinvtran closed 9 years ago
Just a note that there's active interest in this from the Stan people. Andrew is interested in an "R-hat" for approximate inference techniques in general, e.g., variational inference and expectation propagation. From some discussion with him, there are complications because the original technique does not immediately apply: one needs to take into account not only the variance between and within the iterations of SGD but also the variance of the SGD estimator itself. A more obvious problem is convergence to different local optima.
Marking as closed, since this is covered more generally in #18
Run multiple "chains". Compute the within-sequence variance and the between-sequence variance to get an R-hat for SGD. If it is not < 1.1, then SGD hasn't converged, e.g., there is something problematic with the choice of learning rate hyperparameters.
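
For concreteness, here is a minimal sketch of the classic (non-split) Gelman-Rubin R-hat applied to one scalar quantity tracked across several independent SGD runs. The function name `r_hat`, the `(m, n)` array layout, and the synthetic data are assumptions for illustration, not part of any library. It is the original technique applied naively: it does not correct for the variance of the SGD estimator itself, which is the complication raised in the thread.

```python
import numpy as np

def r_hat(chains):
    """Classic Gelman-Rubin R-hat for one scalar quantity.

    chains: array of shape (m, n) holding m independent runs
    ("chains") with n retained iterates each (hypothetical layout).
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if m < 2 or n < 2:
        raise ValueError("need at least 2 chains and 2 iterates per chain")
    # W: within-sequence variance, averaged over chains.
    W = chains.var(axis=1, ddof=1).mean()
    # B: between-sequence variance, from the spread of the chain means.
    B = n * chains.mean(axis=1).var(ddof=1)
    # Pooled estimate of the marginal variance.
    var_plus = (n - 1) / n * W + B / n
    return float(np.sqrt(var_plus / W))

# Synthetic stand-ins for SGD iterates (assumption: real iterates would
# come from separate runs with different seeds or initializations).
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))              # runs that agree
stuck = mixed + np.arange(4)[:, None] * 5.0     # runs at different optima

print("agreeing runs pass the < 1.1 check:", r_hat(mixed) < 1.1)
print("separated runs pass the < 1.1 check:", r_hat(stuck) < 1.1)
```

Runs that settle at different local optima inflate the between-sequence term, so the statistic flags them, but it cannot tell that case apart from a poorly chosen learning rate.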