Initialize the parameters randomly around 0: draw each one from a normal distribution with mean 0 and a small standard deviation, say, eps=1e-5. See #58.
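A minimal sketch of this initialization, using only the Python standard library. The function name `init_params` and its `n_params` and `seed` arguments are hypothetical and chosen for illustration; only `eps` comes from the description above.

```python
import random

def init_params(n_params, eps=1e-5, seed=None):
    """Return n_params values drawn from a normal distribution N(0, eps^2).

    Hypothetical helper: each parameter starts at 0 plus small Gaussian noise
    with standard deviation eps.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, eps) for _ in range(n_params)]

# Example: four parameters, all very close to (but not exactly) zero.
params = init_params(4, eps=1e-5, seed=0)
```

Passing a fixed `seed` makes the starting point reproducible, which may help when comparing runs.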
To tune hyperparameters, warm-start the grid search: run SGD to get the best estimates for one choice of hyperparameters, then use those estimates as the starting point for running SGD at the next choice of hyperparameters. Continue to perturb each starting point with normally distributed noise of standard deviation eps.
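The warm-started grid search could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: the model is a toy ridge regression standing in for the real objective, the single hyperparameter is the penalty `lam`, and the names `perturb`, `sgd_ridge`, and `warm_start_grid_search` are hypothetical. Scoring each hyperparameter choice on held-out data is omitted.

```python
import random

def perturb(params, eps, rng):
    """Add N(0, eps^2) noise to every parameter."""
    return [p + rng.gauss(0.0, eps) for p in params]

def sgd_ridge(data, start, lam, lr=0.05, epochs=50, rng=None):
    """Plain SGD on squared error + lam * ||w||^2 (toy stand-in model)."""
    rng = rng or random.Random(0)
    w = list(start)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient of (w.x - y)^2 + lam * ||w||^2 with respect to w.
            w = [wi - lr * (2 * err * xi + 2 * lam * wi) for wi, xi in zip(w, x)]
    return w

def warm_start_grid_search(data, grid, n_params, eps=1e-5, seed=0):
    """Run SGD for each hyperparameter, starting from the previous estimate."""
    rng = random.Random(seed)
    estimate = [0.0] * n_params  # first run starts at 0 (plus noise below)
    results = {}
    for lam in grid:
        start = perturb(estimate, eps, rng)  # always re-perturb the start
        estimate = sgd_ridge(data, start, lam, rng=rng)
        results[lam] = estimate
    return results

# Toy data generated from y = 2*x0 - 1*x1, with no observation noise.
data_rng = random.Random(1)
data = []
for _ in range(50):
    x = (data_rng.uniform(-1, 1), data_rng.uniform(-1, 1))
    data.append((x, 2.0 * x[0] - 1.0 * x[1]))

results = warm_start_grid_search(data, grid=[0.0, 0.01, 0.1], n_params=2)
```

Ordering the grid so that neighbouring hyperparameter values are visited consecutively should keep each warm start close to the next optimum, which is where the approach is likely to save the most SGD iterations.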