ExaScience / smurff

Bayesian Factorization with Side Information in C++ with Python wrapper
MIT License

Bug in out-of-matrix prediction #120

Closed: tvandera closed this issue 5 years ago

tvandera commented 5 years ago

As currently implemented in predict.py (https://github.com/ExaScience/smurff/blob/56487ec54a3e5b6db39403ed95caa4421a7e94b9/python/smurff/smurff/predict.py#L106), out-of-matrix prediction computes the latent representation of a row/column outside the train matrix as the mean of all latent vectors (Umean) plus the contribution of the side info (c.dot(self.betas[m].transpose())).

But Umean already contains the side-info contribution, so that contribution is counted twice: https://github.com/ExaScience/smurff/blob/56487ec54a3e5b6db39403ed95caa4421a7e94b9/lib/smurff-cpp/SmurffCpp/Priors/MacauPrior.cpp#L142

We need to save mu, the sample from the HyperPrior distribution, and use that instead of mu + Uhat.
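
To make the double counting concrete, here is a minimal NumPy sketch. It is not SMURFF code: the names `mu`, `beta`, `F`, `U`, and `f_new` are illustrative, and it assumes the usual Macau-style model in which each latent vector is `mu + beta @ f_i` plus noise. Under that assumption, the mean of the latent vectors already includes the average side-info contribution, so adding the new row's side-info term on top of it shifts the result by roughly `beta @ F.mean(axis=0)`.

```python
import numpy as np

rng = np.random.default_rng(0)
num_latent, num_feat, num_rows = 4, 6, 5000

# Illustrative stand-ins (assumptions, not SMURFF's actual variables).
mu = rng.normal(size=num_latent)                     # sample from the hyperprior
beta = rng.normal(size=(num_latent, num_feat))       # link matrix for the side info
F = rng.normal(loc=1.0, size=(num_rows, num_feat))   # side info of the training rows

# Assumed model: U_i = mu + beta @ f_i + small noise.
U = mu + F @ beta.T + 0.1 * rng.normal(size=(num_rows, num_latent))
Umean = U.mean(axis=0)  # already contains the average side-info contribution

# Side info for a row that is not in the train matrix.
f_new = rng.normal(loc=1.0, size=num_feat)

buggy = Umean + beta @ f_new   # side info counted twice
fixed = mu + beta @ f_new      # uses the saved hyperprior sample instead

# The bias of the buggy version is (up to noise) the mean side-info contribution.
bias = buggy - fixed
mean_sideinfo_contribution = beta @ F.mean(axis=0)
print("bias:                      ", bias)
print("mean side-info contribution:", mean_sideinfo_contribution)
```

In this sketch the bias only vanishes when the side-info features happen to average to zero over the training rows, which is one way such a bug can go unnoticed.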

tvandera commented 5 years ago

The original macau code is correct:

https://github.com/jaak-s/macau/blob/8300eb5869ef520f51ae37116a3c7389eef7aab3/lib/macau-cpp/latentprior.cpp#L308

tvandera commented 5 years ago

Fixed since b45778ffca0f375ca0dd38d54897f2b5f25e64f4