ExaScience / smurff

Bayesian Factorization with Side Information in C++ with Python wrapper
MIT License

Macau with binary matrix #124

Closed: nbosc closed this issue 5 years ago

nbosc commented 5 years ago

Is it possible to use Macau on binary data with side information?

I tried to combine the Macau example with the SMURFF example for binary matrices, but it fails: Macau does not have a TrainSession method.

tvandera commented 5 years ago

Try this:

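import smurff

# ic50_train / ic50_test: sparse compound x protein pIC50 matrices,
# ecfp: compound fingerprints (one row per row of ic50_train).
# These are assumed to be loaded already, as in the smurff ChEMBL examples.

ic50_threshold = 6.
session = smurff.TrainSession(
    priors=['macau', 'normal'],   # Macau prior (with side info) on rows, normal prior on columns
    num_latent=32,
    burnin=100,
    nsamples=100,
    # Using threshold of 6. to calculate AUC on test data
    threshold=ic50_threshold)

# Using activity threshold pIC50 > 6. to binarize train data
session.addTrainAndTest(ic50_train, ic50_test, smurff.ProbitNoise(ic50_threshold))
# Attach the ECFP side information to mode 0 (the rows, i.e. the compounds)
session.addSideinfo(0, ecfp)
predictions = session.run()
print("RMSE = %.2f" % smurff.calc_rmse(predictions))
print("AUC = %.2f" % smurff.calc_auc(predictions, ic50_threshold))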
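To make the role of `ic50_threshold` more concrete, here is a minimal, self-contained sketch of the idea behind the two uses above: pIC50 values above the threshold are treated as "active" (1) and the rest as "inactive" (0), and the model's predictions are then scored with ROC AUC against those binary labels. This does not call smurff and is not its implementation; `binarize`, `roc_auc` and the numbers below are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration only: mimics thresholding pIC50 values and scoring
# predictions with ROC AUC. It does not use or reproduce smurff internals.


def binarize(values, threshold):
    """Label a value as active (1) when it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive scores higher than a
    random negative (ties count as half), via pairwise comparison."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    ic50_threshold = 6.0
    # Made-up observed test pIC50 values and made-up model predictions.
    observed = [5.1, 6.4, 7.2, 4.8, 6.9, 5.9]
    predicted = [5.5, 6.1, 6.8, 5.0, 5.8, 6.2]
    labels = binarize(observed, ic50_threshold)
    print(labels)                                      # [0, 1, 1, 0, 1, 0]
    print("AUC = %.2f" % roc_auc(labels, predicted))   # AUC = 0.78
```

The pairwise form is quadratic in the number of test points; it is used here only because it states the AUC definition directly. For real data a rank-based or library implementation would be the usual choice.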

Full Python API documentation is available here: https://smurff.readthedocs.io/en/latest/python_api.html

nbosc commented 5 years ago

Thanks, it does what I was looking for!