ExaScience / smurff

Bayesian Factorization with Side Information in C++ with Python wrapper
MIT License

performance issue with saved model #139

Closed nbosc closed 3 years ago

nbosc commented 3 years ago

Running smurff 0.16.1 on macOS

Is it normal that predicting from a saved model takes more than 3 minutes on my example, when getting the same predictions as part of the training session takes about 3 seconds?

%%time
# Matrix factorisation
# Assumes `import smurff` ran in an earlier cell, and that sm_train, sm_test,
# fps (side information) and threshold are defined there (not shown).
session = smurff.TrainSession(
    priors     = ['macau', 'normal'],
    num_latent = 32,
    burnin     = 40,
    nsamples   = 100,
    verbose    = 1,
    save_freq  = 5,
    save_name  = "mf_test.hdf5",
    threshold  = threshold,
    seed       = 4321)

session.addTrainAndTest(sm_train, sm_test, smurff.ProbitNoise(threshold))
session.addSideInfo(0, fps, direct=True)
predictions = session.run()

CPU times: user 28.3 s, sys: 1.46 s, total: 29.7 s
Wall time: 2.74 s

%%time
predictor = smurff.PredictSession("mf_test.hdf5")
predictions = predictor.predict_some(sm_test)

CPU times: user 3min 20s, sys: 13.5 s, total: 3min 34s
Wall time: 3min 34s

Isn't PredictSession supposed to predict only the elements present in the sparse matrix?
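For context on why predicting only the stored cells should be cheap: in a plain factorization, each requested cell is one dot product between a row factor and a column factor, so the cost scales with the number of requested cells rather than with the full matrix size. The following is a generic NumPy sketch of that idea, not SMURFF's implementation. It ignores side information, the probit link, and averaging over posterior samples, and the names `predict_at`, `U`, `V`, `rows`, and `cols` are hypothetical.

```python
import numpy as np

def predict_at(U, V, rows, cols):
    """Predict only the listed (row, col) cells of U @ V.T.

    U: (n_rows, k) row latent factors (hypothetical name).
    V: (n_cols, k) column latent factors (hypothetical name).
    rows, cols: equal-length integer index arrays (COO-style coordinates).
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    # Gather the factor vectors for each requested cell,
    # then take one dot product per cell.
    return np.einsum("ij,ij->i", U[rows], V[cols])

# Random factors stand in for a trained model (illustration only).
rng = np.random.default_rng(0)
U = rng.normal(size=(1000, 32))
V = rng.normal(size=(500, 32))
rows = np.array([0, 10, 999])
cols = np.array([3, 3, 499])

sparse_pred = predict_at(U, V, rows, cols)

# Cross-check against the dense product, which computes every cell.
dense_pred = (U @ V.T)[rows, cols]
assert np.allclose(sparse_pred, dense_pred)
```

Under these assumptions the sparse path touches 3 cells, while the dense cross-check computes all 500,000.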

tvandera commented 3 years ago

Hi,

Indeed, prediction is still slow. Can you compare with predicting using predict_all?

It's on my to-do list to have a look at this.

Tom

nbosc commented 3 years ago

Super fast

%%time
predictor = smurff.PredictSession("mf_test.hdf5")
predictions = predictor.predict_all()

CPU times: user 84.9 ms, sys: 8.92 ms, total: 93.8 ms
Wall time: 93.1 ms

tvandera commented 3 years ago

I have released smurff 0.17.0, with improved prediction speed for sparse matrices.

nbosc commented 3 years ago

Much better indeed!

%%time
predictor = smurff.PredictSession("mf_test.hdf5")
predictions = predictor.predict_sparse(sm_test)

CPU times: user 69.7 ms, sys: 6.82 ms, total: 76.6 ms
Wall time: 75.1 ms
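For anyone who wants to repeat this kind of comparison outside a notebook, where `%%time` is not available, here is a minimal sketch using only the standard library. The helper name `timed` is hypothetical, and the `sum` call is a stand-in for the prediction call being measured. `time.process_time` counts CPU time across all threads of the process, so a CPU time well above the wall time suggests the work ran multithreaded.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, cpu_seconds, wall_seconds)."""
    cpu_start = time.process_time()    # user + sys CPU time of this process
    wall_start = time.perf_counter()   # monotonic wall-clock timer
    result = fn(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, cpu, wall

# Stand-in workload; replace with the call to be measured.
result, cpu, wall = timed(sum, range(1_000_000))
print(f"CPU time: {cpu:.3f} s, wall time: {wall:.3f} s")
```

A single run is noisy, so repeating the measurement a few times gives a more reliable comparison.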