Closed DanielTakeshi closed 7 years ago
Quick update: I got rid of the null pointer error by setting opts.aopts = opts. I am not sure that this is right, but it is what other scripts use, so I will try it.
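To make the workaround concrete, here is a minimal sketch of why aliasing the nested options to the outer ones can remove a null pointer error. The classes UpdaterOpts and LearnerOpts and the helper stepSize are hypothetical stand-ins written for this example; they are not BIDMach's classes, and the real option hierarchy may differ.

```scala
// Toy stand-ins for illustration only; these are NOT BIDMach's classes.
class UpdaterOpts {
  var lrate: Array[Float] = Array(1.0f) // non-null default
  var texp: Array[Float] = Array(0.5f)  // non-null default
}

class LearnerOpts extends UpdaterOpts {
  // Nested updater options; left unset (null) unless the caller assigns them.
  var aopts: UpdaterOpts = null
}

object AoptsDemo {
  // Reads the step size through the nested options, as a model might do.
  def stepSize(opts: LearnerOpts): Float = opts.aopts.lrate(0)

  def main(args: Array[String]): Unit = {
    val opts = new LearnerOpts
    // Calling stepSize(opts) at this point would throw a NullPointerException.
    opts.aopts = opts // alias the nested options to the outer options
    println(stepSize(opts))
  }
}
```

In this toy model the alias works only because the outer options object already carries the updater fields, which is presumably why other scripts use the same assignment.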
There are still some problems I am facing. I've been looking at the code for SMF.scala and SFA.scala and tracing their executions. One thing I noticed is that the predictors call the evalfun methods. For SFA.scala, the predictor calls one evalfun method which stores the result in the predictions matrix, i.e.:
preds.contents <-- xpreds.contents;
(Line 245 of SFA.scala)
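The pattern described above, where the evaluation step copies its predictions into a matrix that the caller holds, can be sketched as follows. Buf and EvalSketch are hypothetical names made up for this example and only model a flat contents buffer; this is not the SFA.scala code.

```scala
// Hypothetical minimal matrix holder; only a flat `contents` buffer is modelled.
final class Buf(val nrows: Int, val ncols: Int) {
  val contents: Array[Float] = new Array[Float](nrows * ncols)
}

object EvalSketch {
  // Copies freshly computed predictions (xpreds) into the caller-visible
  // output matrix (preds), so they remain available after the predictor runs.
  def evalfun(xpreds: Buf, preds: Buf): Unit = {
    require(xpreds.contents.length == preds.contents.length, "shape mismatch")
    Array.copy(xpreds.contents, 0, preds.contents, 0, xpreds.contents.length)
  }
}
```

Without a copy step of this kind, the predictions exist only inside the evaluation call and cannot be read back afterwards.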
However, SMF.scala has no such method, so in general it is impossible to run a prediction with it and then save the output.
This means we need to extract the second matrix factor (e.g. for Netflix it's (d x 480k)-dimensional) ourselves to run the test explicitly. This seems to be what testsmf.ssc is doing.
However, it looks like we can't extract that matrix unless we provide it as input to the SMF learner in the first place (not the predictor, the model). The model matrices are different from the second matrix factor.
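For reference, testing explicitly with both factors in hand amounts to predicting each held-out rating as a dot product of an item vector and a user vector, then computing the RMSE over those entries only. The sketch below uses plain Scala arrays; FactorRmse, Rating, itemF and userF are hypothetical names for this example and it does not use BIDMach's matrix types.

```scala
object FactorRmse {
  // One held-out rating: item index, user index, observed value.
  final case class Rating(item: Int, user: Int, value: Double)

  // Dot product of two factor vectors of equal length d.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor dimensions differ")
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // RMSE over the observed test entries only (missing entries are ignored).
  def rmse(itemF: Array[Array[Double]],
           userF: Array[Array[Double]],
           test: Seq[Rating]): Double = {
    require(test.nonEmpty, "no test ratings")
    val sse = test.map { r =>
      val err = dot(itemF(r.item), userF(r.user)) - r.value
      err * err
    }.sum
    math.sqrt(sse / test.size)
  }
}
```

The point of the sketch is that userF must be available after training; if the learner never exposes it, this computation cannot be done outside the model.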
Thus, I recommend removing the two of the four learner methods in SMF.scala which do not take a user matrix as input.
UPDATE!
I think I have figured out how to get predictions working here. I will write a detailed pull request with the changes. The main idea is that we should add an extra learner and predictor which can correctly update the user matrix internally in SMF.scala, following SFA.scala. Then in SMF.scala, we also need to add another evalfun method which will store its output in the predictions matrix.
Basically, I'm going to make SMF.scala more like SFA.scala.
I think I have ADAGrad running on SMF.scala, but the RMSE on Netflix is roughly 0.90, whereas with SMF.scala I can get 0.85-ish. Let's talk later to see if this is a problem.
Attempted solution in this pull request https://github.com/BIDData/BIDMach/pull/151
I guess this issue should be closed.
Hello,
Here is a script which is taken almost exactly from the testsmf script: https://github.com/BIDData/BIDMach/blob/master/scripts/testsmf.ssc
I removed the prediction code to simplify, and explicitly put in the directories for the Netflix data. SMF.scala currently provides four learner methods:
The learnerX method with 3 inputs and no updater (which is provided in testsmf.ssc in the repository here) works (but doesn't succeed in reducing RMSE since there's no updater!).
The learner method with 3 inputs but with Grad updater (as shown in this minimal working example script) fails due to some ADAGrad values not being initialized.
Error message:
Line 191 here refers to:
Both lrate and texp are null even though they are generic options (shown in opts.what), though I'm not entirely sure why, because the SMF code internally seems to assign those values to whatever Grad.options would set them to, and those would be non-null by default.
More generally, it might also be useful to update the SMF scripts to provide examples of how to use them with the current version of the code. I will continue investigating and looking at how SMF processes these values.
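To show where values like lrate and texp would be consumed, here is a minimal sketch of a textbook ADAGrad step with an added time decay. It assumes that lrate is the base step size and that texp is the exponent of a 1/t^texp decay; that reading of the two options, and the names AdaGradSketch, State and update, are assumptions for this example rather than BIDMach's implementation.

```scala
object AdaGradSketch {
  // Optimizer state: weights, running sum of squared gradients, step counter.
  final class State(n: Int) {
    val w = new Array[Double](n)
    val sumSq = new Array[Double](n)
    var step = 0
  }

  // One ADAGrad update. lrate and texp are explicit arguments, so a missing
  // value is caught at the call site instead of deep inside the update.
  def update(s: State, grad: Array[Double], lrate: Double, texp: Double,
             eps: Double = 1e-8): Unit = {
    require(grad.length == s.w.length, "gradient size mismatch")
    s.step += 1
    // Assumed time decay of the step size: lrate / t^texp.
    val decayed = lrate / math.pow(s.step.toDouble, texp)
    var i = 0
    while (i < grad.length) {
      s.sumSq(i) += grad(i) * grad(i)
      // Per-coordinate scaling by the root of the accumulated squared gradient.
      s.w(i) -= decayed * grad(i) / (math.sqrt(s.sumSq(i)) + eps)
      i += 1
    }
  }
}
```

If either option were unset when the updater first ran, the failure would surface at the line that computes the decayed step size, which matches the kind of error described above.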