btwardow / FactorizationMachines.jl

Factorization Machines for Julia

Add Classification via Logistic Loss #2

Closed Hydrotoast closed 8 years ago

Hydrotoast commented 8 years ago

For comparisons with other implementations:

For prediction (during training):

For the training loss function:

Both implementations are structurally the same and use enums to distinguish between classification and regression tasks.

A brief roadmap:

  1. Decide how the API should change for training; the prediction API should remain the same. Simple proposal: add an enum parameter that selects whether regression or classification is used (see the sketch after this list).
  2. Implement the corresponding branches for the training.
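
A minimal sketch of what the enum-parameter proposal could look like; `TaskType`, the keyword arguments, and the `fm_train_sgd` signature shown here are illustrative, not the package's current API:

```julia
# Illustrative sketch of the enum-parameter idea; names are hypothetical.

@enum TaskType REGRESSION CLASSIFICATION

function fm_train_sgd(X, y; task::TaskType = REGRESSION,
                      alpha = 0.01, num_epochs = 10)
    if task == CLASSIFICATION
        # branch: logistic loss
    else
        # branch: squared loss
    end
    # ... shared SGD loop ...
end
```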

Alternative API ideas:

  1. Separate training tasks: fm_train_sgd_regression(...) and fm_train_sgd_classification(...)
  2. OOP: make the training tasks objects: FMRegression and FMClassification, each constructed with its own training parameters and supporting the fm_train_sgd interface to produce a single FMModel. The FMModel will reference the training task that produced it.

Personally, I am in favor of approach (2), since it leads to fewer branches in the code.
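
For contrast, here is a sketch of approach (2), with task objects carrying their own training parameters. All names (`FMTask`, `FMRegression`, `FMClassification`, `FMModel`) are illustrative placeholders, not the package's actual types:

```julia
# Sketch of approach (2); all names are illustrative placeholders.

abstract type FMTask end

struct FMRegression <: FMTask
    num_epochs::Int
    alpha::Float64      # learning rate
end

struct FMClassification <: FMTask
    num_epochs::Int
    alpha::Float64
end

struct FMModel
    w0::Float64
    w::Vector{Float64}
    V::Matrix{Float64}
    task::FMTask        # the task object that produced this model
end

# Single training entry point; the loss is selected by dispatching on `task`.
function fm_train_sgd(task::FMTask, X::AbstractMatrix, y::AbstractVector)
    p = size(X, 2)
    # ... SGD loop using the task's loss/gradient ...
    return FMModel(0.0, zeros(p), zeros(2, p), task)
end
```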

btwardow commented 8 years ago

The second approach is fine with me. It is somewhat similar to what the fastFM folks did.

About the task objective classes (FMRegression or FMClassification): my intuition is that those objects should provide the cost and the gradient. The SGD loop should stay in one place, so that if a new optimization schedule needs to be implemented (e.g. AdaGrad), the same structure applies. But since there is only SGD, and the method is currently prepared only for that optimizer, your solution sounds perfect 😄
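
A small sketch of that separation of concerns, using bare singleton task types as placeholders (the real ones would carry hyperparameters as in the proposal above); the optimizer only ever calls `loss`/`dloss`, so a new schedule such as AdaGrad would not touch the task definitions:

```julia
# Sketch of the "task provides cost and gradient" idea; names are illustrative.

struct FMRegression end
struct FMClassification end

# squared loss and its derivative w.r.t. the prediction yhat
loss(::FMRegression, yhat, y)  = 0.5 * (yhat - y)^2
dloss(::FMRegression, yhat, y) = yhat - y

# logistic loss for targets y in {-1, +1}
loss(::FMClassification, yhat, y)  = log1p(exp(-y * yhat))
dloss(::FMClassification, yhat, y) = -y / (1 + exp(y * yhat))

# The SGD loop only ever calls dloss(task, yhat, y), so swapping in another
# schedule (e.g. AdaGrad) would not require changing the task definitions.
```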

Hydrotoast commented 8 years ago

@btwardow agreed. I'm still not entirely certain what the Julia community has decided on for a standard ML API (if it exists), but I do like the sklearn-like API that fastFM provides.

I've created a PR as promised. The SGD implementation has not been changed much, so it should be easy to implement AdaGrad as well. That could likely be the next PR.

btwardow commented 8 years ago

@Hydrotoast great! Is there a standard optimization package for Julia that has the different learning rate schedule heuristics (AdaGrad, RMSProp, Nesterov momentum) implemented for SGD? From my experience, though, AdaGrad should be enough.
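
For reference, a bare-bones AdaGrad step, independent of any particular package, looks roughly like this (function and argument names are just for illustration):

```julia
# Minimal AdaGrad update for a dense parameter vector, to show what the
# schedule adds on top of plain SGD.

function adagrad_update!(w::Vector{Float64}, g::Vector{Float64},
                         cache::Vector{Float64}; alpha = 0.1, eps = 1e-8)
    @inbounds for i in eachindex(w)
        cache[i] += g[i]^2                                 # accumulated squared gradients
        w[i]     -= alpha * g[i] / (sqrt(cache[i]) + eps)  # per-coordinate step size
    end
    return w
end
```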

Hydrotoast commented 8 years ago

@btwardow SGDOptim is a package I found with a Google search. The core algorithm appears to be in this file: https://github.com/lindahua/SGDOptim.jl/blob/master/src/sgdstd.jl. If I am correct, the axpy method for performing gradient updates only performs dense updates.

I think you can gain much more performance by leveraging the sparsity of parameter updates for FMs. For example, suppose I have 1e6 people and 1e10 items to recommend, and I train in minibatches of, say, 10 (for the sake of simplicity), where each example consists of 1 person and 1 item. Then we touch only 20 vectors in V! This allows us to perform sparse updates to V efficiently (and sometimes in parallel). A more concrete illustration of this idea can be seen in AdRoll's blog post.
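
A rough sketch of such a sparse update, assuming V is stored as num_factors × num_features and using a hypothetical helper `col_grad(j, xj)` for the per-feature gradient:

```julia
using SparseArrays

# Rough sketch of a sparse SGD step for the factor matrix V: only the columns
# of V corresponding to nonzero features of the current example are touched.
# `col_grad(j, xj)` is a hypothetical helper returning the gradient for column j.

function sparse_factor_update!(V::Matrix{Float64}, x::SparseVector{Float64},
                               col_grad; alpha = 0.01)
    idx, vals = findnz(x)                       # nonzero feature indices and values
    for (j, xj) in zip(idx, vals)
        V[:, j] .-= alpha .* col_grad(j, xj)    # update only the touched columns
    end
    return V
end
```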

btwardow commented 8 years ago

Ok, I will look at SGDOptim. But you are right that we probably won't be able to use all the tricks that make FMs faster.

Hydrotoast commented 8 years ago

Closed by #5