btwardow / FactorizationMachines.jl

Factorization Machines for Julia

SGD Improvements: Minibatch and Scheduling #3

Open Hydrotoast opened 8 years ago

Hydrotoast commented 8 years ago

SGD can use two simple improvements:

  1. Minibatch sampling
  2. Scheduling

Minibatch sampling

Minibatching can help control the tradeoff between convergence behavior and training time (I may be wrong about the convergence part of that claim).

Not entirely certain of an optimized implementation for this; however, here is a sketch of a naive implementation given a matrix of feature vectors X and a vector of labels y.

number_of_samples = 10000
minibatch_size = 100 # or a minibatch_fraction of the dataset instead?

# Draw a random subset of column indices and slice out the minibatch.
sample_indices = sample_without_replacement(number_of_samples, minibatch_size)
X_minibatch = X[:, sample_indices]
y_minibatch = y[sample_indices]

# One SGD step computed on the minibatch only.
yhat = fmPredict(fm, X_minibatch, fSum, fSumSqr)
mult = fmLoss(yhat, y_minibatch)
fmSGDUpdate!(fm, alpha, sample_indices, X_minibatch, mult, fSum)
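
The sample_without_replacement call above is a placeholder. A minimal way to realize it (assuming a StatsBase dependency is acceptable; the helper name itself is hypothetical) could be:

using StatsBase: sample

# Hypothetical helper: draw `k` distinct indices from 1:n.
sample_without_replacement(n, k) = sample(1:n, k; replace = false)

An alternative without the extra dependency is randperm(n)[1:k] from the Random standard library.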

Some changes to consider:

  1. Fixed minibatch size vs. minibatch fraction of the dataset?
  2. The minibatch sampling routine sample_without_replacement: could this be implemented efficiently? Perhaps it would be cheaper to slide a window over the dataset, e.g.
X_minibatch = X[:, i:i + minibatch_size - 1]
y_minibatch = y[i:i + minibatch_size - 1]

However, the minibatches will not vary across epochs in this scenario; one way to combine cheap contiguous slicing with per-epoch variation is sketched below.
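
A compromise between random sampling and cheap slicing (a sketch under assumed variable names, not existing package code) is to shuffle the index order once per epoch and then take contiguous slices of the permutation:

using Random: randperm

# Shuffle the column order once per epoch, then walk it in contiguous chunks,
# so each minibatch is cheap to extract but its composition changes per epoch.
perm = randperm(number_of_samples)
for start in 1:minibatch_size:number_of_samples
    stop = min(start + minibatch_size - 1, number_of_samples)
    batch_indices = perm[start:stop]
    X_minibatch = X[:, batch_indices]
    y_minibatch = y[batch_indices]
    # ... fmPredict / fmLoss / fmSGDUpdate! as in the sketch above ...
end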

Scheduling

Set the learning rate alpha to decay in proportion to the inverse square root of the current iteration, i.e. divide by sqrt(iteration). Since iteration > 0, there is no division by zero.

alpha = alpha0 / sqrt(iteration)
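
As a minimal sketch of how the schedule could sit inside the training loop (the names alpha0 and num_iterations are assumptions, not existing package code):

alpha0 = 0.01          # initial learning rate
num_iterations = 1000  # total number of SGD iterations (example value)

for iteration in 1:num_iterations
    alpha = alpha0 / sqrt(iteration)  # iteration >= 1, so the denominator is never zero
    # ... perform the (minibatch) SGD update for this iteration using alpha ...
end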