Hydrotoast closed this issue 8 years ago.
The second approach is fine for me. It is somewhat similar to what the fastFM folks did.
About the task objective classes (`FMRegression` or `FMClassification`) - my intuition says those objects should provide the cost and the gradient. The SGD should stay in one place, and the same holds if a new optimization schedule (e.g. AdaGrad) needs to be implemented. But since there is only SGD right now, and the method is only prepared for this optimizer - your solution sounds perfect 😄
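To make this concrete, a minimal Julia sketch of the split (names and losses are illustrative only, not the package's actual code):

```julia
using LinearAlgebra: dot

# Illustrative only: each task type provides cost and gradient, and a
# single SGD step is written once against that interface.
abstract type FMTask end
struct FMRegression <: FMTask end
struct FMClassification <: FMTask end

# Squared loss for regression.
cost(::FMRegression, p, y) = 0.5 * (p - y)^2
dcost(::FMRegression, p, y) = p - y

# Logistic loss for classification, with y in {-1, +1}.
cost(::FMClassification, p, y) = log1p(exp(-y * p))
dcost(::FMClassification, p, y) = -y / (1 + exp(y * p))

# One SGD step on the linear weights only, for brevity; swapping in
# AdaGrad later would only touch this function, not the task types.
function sgd_step!(task::FMTask, w::Vector{Float64},
                   x::Vector{Float64}, y::Float64, alpha::Float64)
    g = dcost(task, dot(w, x), y)
    @. w -= alpha * g * x
    return w
end
```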
@btwardow agreed. I'm still not entirely certain what the Julia community has settled on as a standard ML API (if one exists), but I do like the sklearn-like API that fastFM provides.
I've created a PR as promised. The SGD implementation has not been changed much, so it should be easy to implement AdaGrad as well. That could likely be the next PR.
@Hydrotoast great! Is there a standard optimization package for Julia that have different learning rate schedules heuristic (adagrad, rmsprop, nmoment) for SGD implemented? However, from my experience - adagrad should be enough.
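For reference, the AdaGrad step itself is tiny; a hedged sketch (the names `adagrad_step!`, `G`, and `eps` are mine, not any package's API):

```julia
# Minimal AdaGrad step, assuming a plain dense gradient g.
# G accumulates squared gradients; each coordinate then gets its own
# effective learning rate alpha / (sqrt(G[i]) + eps).
function adagrad_step!(w::Vector{Float64}, g::Vector{Float64},
                       G::Vector{Float64}, alpha::Float64; eps::Float64 = 1e-8)
    @. G += g^2
    @. w -= alpha * g / (sqrt(G) + eps)
    return w
end
```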
@btwardow SGDOptim is a package I found with a Google search. The core algorithm appears to be in this file: https://github.com/lindahua/SGDOptim.jl/blob/master/src/sgdstd.jl. If I am reading it correctly, the `axpy` method used for gradient updates only performs dense updates.
I think you can gain much more performance by leveraging the sparsity of parameter updates for FMs. For example, suppose I have 1e6 people and 1e10 items to recommend, and I train in minibatches of, say, 10 (for the sake of simplicity), where each example consists of 1 person and 1 item. Then we may touch only 20 vectors in `V`! This allows us to perform sparse updates to `V` in an efficient manner (and sometimes in parallel). A more concrete illustration of this idea can be seen in AdRoll's blog post.
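A minimal sketch of such a sparse update (hypothetical helper, not SGDOptim's API), where only the factor rows of active features are touched:

```julia
# Illustrative sparse update: grads maps an active feature index to the
# gradient of its factor row; untouched rows of V are never read or written,
# so a minibatch of 10 (person, item) pairs updates at most 20 rows.
function sparse_update!(V::Matrix{Float64},
                        grads::Dict{Int,Vector{Float64}}, alpha::Float64)
    for (j, g) in grads
        @views V[j, :] .-= alpha .* g   # in-place update of one row
    end
    return V
end
```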
Ok, I will look at SGDOptim. But you are right that we probably won't be able to use all of the tricks that make FMs faster.
Closed by #5
For comparisons with other implementations:

- For prediction (during training):
- For training loss function:

Both implementations are structurally the same and use enums to distinguish between classification and regression tasks:

- `task = 0` implies regression
- `task = 1` implies classification

A brief roadmap:

Alternative API ideas:

1. `fm_train_sgd_regression(...)` and `fm_train_sgd_classification(...)`
2. `FMRegression` and `FMClassification` types that are constructed with their own training parameters and support the interface `fm_train_sgd` to produce a single `FMModel`. The `FMModel` will reference the training task that produced it.

Personally, I am in favor of the (2) approach since it leads to fewer branches in the code; a rough sketch of it follows below.
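A rough Julia sketch of what approach (2) could look like (the fields, defaults, and initialization are illustrative, not a committed design):

```julia
# Illustrative only: task types carry their own training parameters and
# share one fm_train_sgd entry point; the resulting FMModel remembers
# which task produced it, so no task enums are needed downstream.
abstract type FMTrainer end

struct FMRegression <: FMTrainer
    iterations::Int
    alpha::Float64
end

struct FMClassification <: FMTrainer
    iterations::Int
    alpha::Float64
end

struct FMModel
    w0::Float64               # global bias
    w::Vector{Float64}        # linear weights
    V::Matrix{Float64}        # pairwise factor matrix
    task::FMTrainer           # the task that produced this model
end

function fm_train_sgd(trainer::FMTrainer, X::Matrix{Float64},
                      y::Vector{Float64}; k::Int = 8)
    n = size(X, 2)
    w0, w, V = 0.0, zeros(n), 0.01 .* randn(n, k)
    # ... SGD loop here, dispatching on `trainer` for cost/gradient ...
    return FMModel(w0, w, V, trainer)
end
```

With this shape, the single SGD loop only ever dispatches on the trainer type, which is where the "fewer branches" benefit comes from.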