SMF.scala fails on provided SMF.learner method due to ADAGrad options not being initialized correctly

DanielTakeshi commented 7 years ago

Hello,

Here is a script which is taken almost exactly from the testsmf script: https://github.com/BIDData/BIDMach/blob/master/scripts/testsmf.ssc

I removed the prediction code to simplify, and explicitly put in the directories for the Netflix data. SMF.scala currently provides four learner methods:

learner, taking in (data matrix, dimension), uses Grad updater
learnerX, taking in (data matrix, dimension), uses no updater
learner, taking in (data matrix, user matrix, dimension), uses Grad updater
learnerX, taking in (data matrix, user matrix, dimension), uses no updater

The learnerX method with 3 inputs and no updater (which is provided in testsmf.ssc in the repository here) works (but doesn't succeed in reducing RMSE since there's no updater!).

The learner method with 3 inputs but with Grad updater (as shown in this minimal working example script) fails due to some ADAGrad values not being initialized.

:silent
import BIDMach.models.SMF

val a = loadSMat("/data/netflix/newtrain.smat.lz4")
val ta = loadSMat("/data/netflix/newtest.smat.lz4")
val d = 512 
val fact = zeros(d, a.ncols);
//val (nn,opts) = SMF.learnerX(a, fact, d) // this line works!
val (nn,opts) = SMF.learner(a, fact, d) // this line doesn't work

opts.batchSize = 1000
opts.uiter = 2 
// for acceptance = 1
opts.urate = 0.01f
opts.lrate = 0.01f  
// for computed acceptance
opts.urate = 0.1f
opts.lrate = 0.1  
opts.npasses = 5 

//opts.texp = null
//opts.aopts = null
val lambda = 4f
opts.lambdau = lambda;
opts.regumean = lambda;
opts.lambdam = lambda / 500000 * 20; 
opts.regmmean = opts.lambdam
opts.evalStep = 31
opts.doUsers = false
opts.lsgd = 0.010f
//opts.weightByUser = true
//opts.traceConverge = true
//opts.autoReset = false
//opts.useDouble = true

nn.train

Error message:

Loading /home/daniel/BIDMach/lib/bidmach_init.scala...
import BIDMat.{CMat, CSMat, DMat, Dict, FMat, FND, GMat, GDMat, GIMat, GLMat, GSMat, GSDMat, GND, HMat, IDict, Image, IMat, LMat, Mat, SMat, SBMat, SDMat, TMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{Click, FM, GLM, KMeans, KMeansw, LDA, LDAgibbs, Model, NMF, SFA, RandomForest, SVD}
import BIDMach.networks.Net
import BIDMach.datasources.{DataSource, MatSource, FileSource, SFileSource}
import BIDMach.datasinks.{DataSink, MatSink}
import BIDMach.mixins.{CosineSim, Perplexity, Top, L1Regularizer, L2Regularizer}
import BIDMach.updaters.{ADAGrad, Batch, BatchNorm, Grad, IncMult, IncNorm, Telescoping}
import BIDMach.causal.IPTW
1 CUDA device found, CUDA version 8.0

Loading mwe_smf.ssc...
Switched off result printing.
pass= 0
java.lang.NullPointerException
  at BIDMach.models.SMF.mupdate(SMF.scala:191)
  at BIDMach.models.FactorModel.dobatch(FactorModel.scala:56)
  at BIDMach.models.Model.dobatchg(Model.scala:201)
  at BIDMach.Learner.nextPass(Learner.scala:145)
  at BIDMach.Learner.firstPass(Learner.scala:114)
  at BIDMach.Learner.retrain(Learner.scala:88)
  at BIDMach.Learner.train(Learner.scala:75)
  ... 54 elided

Line 191 here refers to:

uscale.set((lrate.dv * math.pow(step, - texp.dv)).toFloat);

Both lrate and texp are null here, even though they are generic options (they appear in opts.what). I'm not entirely sure why: the SMF code seems to assign them internally from whatever Grad.options specifies, and those defaults should be non-null.
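To make the failure mode concrete, here is a minimal standalone sketch of the same step-size expression, lrate * step^(-texp), with the two options modelled as `Option[Double]` so that an unset value is reported instead of raising a NullPointerException. The object `StepScaleSketch` and the helper `stepScale` are hypothetical names used only for illustration; they are not part of BIDMach, and this is not how SMF.scala itself handles the options.

```scala
object StepScaleSketch {
  // Hypothetical helper, not part of BIDMach. It evaluates
  // lrate * step^(-texp), the same expression as the failing line,
  // but returns a message when an option is unset instead of throwing.
  def stepScale(lrate: Option[Double], texp: Option[Double], step: Long): Either[String, Float] =
    (lrate, texp) match {
      case (Some(lr), Some(te)) => Right((lr * math.pow(step.toDouble, -te)).toFloat)
      case (None, _)            => Left("lrate is not set")
      case _                    => Left("texp is not set")
    }
}
```

For example, `StepScaleSketch.stepScale(Some(0.1), None, 1L)` returns `Left("texp is not set")`, which points directly at the missing option rather than at a line number inside the model.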

More generally, it might also be useful to update the SMF scripts to provide examples on how to use them with the current version of the code. I will continue investigating and looking at how SMF processes these values.

DanielTakeshi commented 7 years ago

Quick update: I got rid of the null pointer error by using

opts.aopts = opts

I am not sure this is right, but it is what other scripts do. I will try this.

DanielTakeshi commented 7 years ago

There are still some problems I am facing. I've been looking at the code for SMF.scala and SFA.scala and tracing their executions. One thing I noticed is that the predictors will call the evalfun methods. For SFA.scala, the predictor calls one evalfun method which stores the result in the predictions matrix, i.e.:

preds.contents <-- xpreds.contents;

(Line 245 of SFA.scala)

However, SMF.scala has no such method, so in general it is impossible to run a prediction with it and then save the output.

This means we need to be able to extract the second matrix factor (e.g. for netflix it's (d x 480k)-dimensional) ourselves to explicitly do the test. This seems to be what testsmf.ssc is doing.

However, it looks like we can't extract that matrix unless we provide it as input to the SMF learner in the first place (not the predictor, the model). The model matrices are different from the second matrix factor.

Thus, I recommend removing the two of the four learner methods in SMF.scala that do not take a user matrix as input.
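For reference, the explicit test described above amounts to taking dot products of factor columns at the observed test entries and computing the RMSE. The sketch below does this with plain Scala arrays rather than BIDMach matrices. The names `RmseSketch`, `Rating`, `predict`, and `rmse` are hypothetical, and the layout (a d x items factor and a d x users factor, indexed as `factors(k)(column)`) is an assumption made for illustration.

```scala
object RmseSketch {
  // One observed test rating: item index, user index, and the rating value.
  final case class Rating(item: Int, user: Int, value: Double)

  // Predicted rating: dot product of the item's and the user's factor columns.
  // itemFactors(k)(i) and userFactors(k)(j) stand in for d x nItems and
  // d x nUsers matrices (assumed layout, not BIDMach's API).
  def predict(itemFactors: Array[Array[Double]],
              userFactors: Array[Array[Double]],
              item: Int, user: Int): Double =
    itemFactors.indices.map(k => itemFactors(k)(item) * userFactors(k)(user)).sum

  // Root-mean-square error over the observed test ratings only.
  def rmse(itemFactors: Array[Array[Double]],
           userFactors: Array[Array[Double]],
           ratings: Seq[Rating]): Double = {
    require(ratings.nonEmpty, "need at least one test rating")
    val squaredErrors = ratings.map { r =>
      val err = predict(itemFactors, userFactors, r.item, r.user) - r.value
      err * err
    }
    math.sqrt(squaredErrors.sum / ratings.size)
  }
}
```

Only the observed entries of the test matrix enter the sum, which is why the user factor has to be available outside the learner for this kind of check.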

DanielTakeshi commented 7 years ago

UPDATE!

I think I have figured out how to get predictions working here. I will write a detailed pull request with the changes. The main idea is that we should add in an extra learner and predictor which can correctly update the user matrix internally in SMF.scala, following SFA.scala. Then in SMF.scala, we also need to add in another evalfun method which will store in the predictions matrix.

Basically, I'm going to make SMF.scala more like SFA.scala.

I think I have ADAGrad running on SMF.scala but the RMSE on netflix is roughly 0.90, whereas with SMF.scala I can get 0.85-ish. Let's talk later to see if this is a problem.

DanielTakeshi commented 7 years ago

Attempted solution in this pull request https://github.com/BIDData/BIDMach/pull/151

I guess this issue should be closed.

SMF.scala fails on provided SMF.learner method due to ADAGrad options not being initialized correctly #149