NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems
http://surpriselib.com
BSD 3-Clause "New" or "Revised" License

Potential numerical instability in pearson sim #216

Open NicolasHug opened 6 years ago

NicolasHug commented 6 years ago

Following #214, there might be some numerical instability in the Pearson similarity computation, probably caused by the sqrt function receiving negative values, NaN, or infinity.

I couldn't reproduce the issue myself.
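
For illustration only, here is a minimal sketch of how a sqrt argument could go bad, assuming the similarity is computed from running sums (a one-pass formula). This is an assumption about the mechanism, not a confirmed diagnosis. `pearson_one_pass` is a hypothetical helper and not Surprise's actual code. In exact arithmetic each `n * sum(x^2) - sum(x)^2` term is non-negative, but floating-point rounding can push it slightly below zero, and NaN or infinite inputs propagate straight into the sqrt.

```python
import math


def pearson_one_pass(x, y):
    """Pearson correlation from running sums, with guards around sqrt.

    Hypothetical helper for illustration; not Surprise's implementation.
    x and y are equal-length sequences of ratings on the common items.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))

    # Each term equals n^2 times a variance: non-negative in exact
    # arithmetic, but rounding can make it slightly negative.
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy

    # Guard against NaN/inf and against non-positive terms before sqrt.
    if not (math.isfinite(var_x) and math.isfinite(var_y)):
        return 0.0
    if var_x <= 0 or var_y <= 0:
        return 0.0

    r = (n * sxy - sx * sy) / math.sqrt(var_x * var_y)
    # Clamp to absorb rounding that lands just outside [-1, 1].
    return max(-1.0, min(1.0, r))
```

Returning 0 for degenerate cases is one possible design choice; whether that is the right fallback here is a separate question.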

NicolasHug commented 6 years ago

Also from #224, there seem to be some pretty big differences between the current computation and the one using the classical method.

NicolasHug commented 5 years ago

> Also from #224, there seem to be some pretty big differences between the current computation and the one using the classical method.

Update: actually no, all is fine on this side. I had made an error in pearson_std_formula: the means should have been computed on the common items only, which is what we're doing in Surprise. There's actually a comment in Aggarwal's textbook about whether or not this should be done. The version in Surprise is the strictly correct one, even though it's not clear whether it makes a big difference.
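
To make the distinction concrete, here is a small sketch comparing the two choices of mean. `pearson_sketch` and the toy ratings are hypothetical and written for this comment only. It is neither the `pearson_std_formula` from #224 nor Surprise's code. In both variants the sums run over the common items; only the set of ratings used for each user's mean changes.

```python
import math


def pearson_sketch(ratings_u, ratings_v, common_only=True):
    """Pearson similarity between two users given dicts item -> rating.

    Hypothetical helper for illustration.
    common_only=True : means computed on the common items only.
    common_only=False: means computed on all of each user's ratings.
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0

    if common_only:
        mu_u = sum(ratings_u[i] for i in common) / len(common)
        mu_v = sum(ratings_v[i] for i in common) / len(common)
    else:
        mu_u = sum(ratings_u.values()) / len(ratings_u)
        mu_v = sum(ratings_v.values()) / len(ratings_v)

    # Sums always run over the common items; only the means differ.
    num = sum((ratings_u[i] - mu_u) * (ratings_v[i] - mu_v) for i in common)
    den_u = sum((ratings_u[i] - mu_u) ** 2 for i in common)
    den_v = sum((ratings_v[i] - mu_v) ** 2 for i in common)
    den = math.sqrt(den_u * den_v)
    return num / den if den > 0 else 0.0


# Toy data: the users agree perfectly (up to scale) on items a, b, c.
u = {"a": 1, "b": 2, "c": 3, "d": 5}
v = {"a": 2, "b": 4, "c": 6, "e": 1}

sim_common = pearson_sketch(u, v, common_only=True)
sim_all = pearson_sketch(u, v, common_only=False)
```

On this toy data the common-items variant gives a similarity of 1, while the all-items variant gives a noticeably lower value, because each user's extra rating shifts the mean. How large the gap is on real datasets is not something this sketch can say.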