ExaScience / smurff

Bayesian Factorization with Side Information in C++ with Python wrapper
MIT License

Results are not the same when using save-restore #91

Closed motoharu-yano closed 6 years ago

motoharu-yano commented 6 years ago

A full run (for example, 4 sampling iterations) and a split run using the save-restore functionality (2 sampling iterations, save, restore, then 2 more) do not produce the same results. I am not sure what the reason is. Maybe it is the RNG? Or maybe there is some state that we need to save and restore but currently do not?

Here are the commands I used:

run: --burnin=0 --nsamples=800 --num-latent=96 --init-model=zero --seed=0 --save-freq=1 --precision=5.0 --lambda-beta=10.0 --tol=1e-6 --train="E:\smurff_data\chembl_58\sample1\cluster1\train.mtx" --test="E:\smurff_data\chembl_58\sample1\cluster1\test.mtx" --prior=normal normal --side-info=none none --aux-data=none none --save-prefix="E:\smurff_build\lib\smurff-cpp\cmake\build\Debug\demo\demo" --save-extension=.ddm

continue: --root=E:\smurff_build\lib\smurff-cpp\cmake\build\Debug\demo\demo-root.ini
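
The RNG hypothesis above can be checked in isolation. The following is a minimal sketch in plain Python, not SMURFF code: `full_run` and `split_run` are hypothetical helpers, and the standard-library `random.Random` generator stands in for whatever generator the sampler uses. It compares one uninterrupted run of 4 draws against a 2 + 2 split, once with the generator state carried across the checkpoint and once with the generator simply re-seeded.

```python
import random


def full_run(seed, n_samples):
    """Draw n_samples Gaussian values in one uninterrupted run."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def split_run(seed, first, second, restore_rng_state):
    """Draw `first` values, checkpoint, then draw `second` more values."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(first)]

    # Checkpoint: getstate() captures the generator position, including
    # the value that gauss() may cache between calls.
    saved_state = rng.getstate()

    # "Restart": a fresh process would construct a new generator.
    rng = random.Random(seed)
    if restore_rng_state:
        rng.setstate(saved_state)

    draws += [rng.gauss(0.0, 1.0) for _ in range(second)]
    return draws


reference = full_run(seed=0, n_samples=4)
with_state = split_run(seed=0, first=2, second=2, restore_rng_state=True)
reseeded = split_run(seed=0, first=2, second=2, restore_rng_state=False)

print(with_state == reference)  # generator state carried across the checkpoint
print(reseeded == reference)    # generator re-seeded after the checkpoint
```

If the generator state is not part of the checkpoint, the continued run replays the first draws instead of continuing the sequence, so a split run can only match a full run bit for bit when that state is saved and restored as well.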

tvandera commented 6 years ago

Maybe ::update_prior is not executed after restore, as in:

https://github.com/ExaScience/smurff/blob/f8dfed1469443188567c89735a183940e81221de/lib/smurff-cpp/SmurffCpp/Priors/NormalPrior.cpp#L54-L57

This seems to be the case since this is printed after restore:

-- Restoring model, predictions,... from 'save-sample-2-step.ini'.
Continue from Sample   2/  5: RMSE: 0.2658 (1samp: 0.3547)  U:[1.66e+01, 7.02e+01] [took: 0.0s]
  RMSE train: 0.3547
  Priors:
     NormalPrior: mu = 0
     NormalPrior: mu = 0

while this is printed without restore:

Sample   2/  5: RMSE: 0.2658 (1samp: 0.3547)  U:[1.66e+01, 7.02e+01] [took: 2.8s]
  RMSE train: 0.3547
  Priors:
     NormalPrior: mu = 1.21128
     NormalPrior: mu = 0.677792
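
The suspected failure mode can be illustrated with a toy example. This is a sketch under stated assumptions, not the SMURFF implementation: `ToyNormalPrior`, its JSON checkpoint, and the `refresh` flag are hypothetical, and the rule "mu is the mean of the latent values" is a simplified stand-in for the real sampling step. The point is only that a derived quantity which is not written to the checkpoint stays at its initial value after restore unless it is recomputed.

```python
import json


class ToyNormalPrior:
    """Hypothetical stand-in for a prior whose mean is derived from U."""

    def __init__(self):
        self.U = []      # latent vectors, one list of floats per row
        self.mu = 0.0    # derived hyperparameter, not part of the checkpoint

    def update_prior(self):
        # Toy rule: mu is the mean of all latent values. A real sampler
        # would draw mu from its conditional distribution instead.
        values = [x for row in self.U for x in row]
        self.mu = sum(values) / len(values) if values else 0.0

    def save(self):
        # Only the latent matrix is written out.
        return json.dumps({"U": self.U})

    def restore(self, blob, refresh=True):
        self.U = json.loads(blob)["U"]
        if refresh:
            # Recompute derived state so it matches the restored U.
            self.update_prior()


prior = ToyNormalPrior()
prior.U = [[1.0, 2.0], [3.0, 4.0]]
prior.update_prior()
checkpoint = prior.save()

stale = ToyNormalPrior()
stale.restore(checkpoint, refresh=False)  # mu keeps its initial value

fresh = ToyNormalPrior()
fresh.restore(checkpoint)                 # mu recomputed from restored U

print(prior.mu, stale.mu, fresh.mu)
```

In this toy, the restore without a refresh reports `mu = 0` even though the latent values were loaded correctly, which mirrors the symptom in the log above.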

tvandera commented 6 years ago

Fixed in 1c32a1c

tvandera commented 6 years ago

With restore:

-- Restoring model, predictions,... from 'save-sample-2-step.ini'.
Continue from Sample   2/  5: RMSE: 0.2658 (1samp: 0.3547)  U:[1.66e+01, 7.02e+01] [took: 0.0s]
  RMSE train: 0.3547
  Priors:
     NormalPrior: mu = 1.23447
     NormalPrior: mu = 0.678952

Without restore:

Sample   2/  5: RMSE: 0.2658 (1samp: 0.3547)  U:[1.66e+01, 7.02e+01] [took: 2.8s]
  RMSE train: 0.3547
  Priors:
     NormalPrior: mu = 1.21128
     NormalPrior: mu = 0.677792

The remaining differences are probably due to random sampling.

tvandera commented 6 years ago

Also test with

tvandera commented 6 years ago

Remove call to ::init_Usum from ::restore, since it is called in ::init

tvandera commented 6 years ago

Since dd89701dad8f324254350c57ce43d26d7877a684: all priors are OK, except SpikeAndSlab.

tvandera commented 6 years ago

Since 9c5f5de0b8456ddcc772a6ea20e443df82d610dc: