Add shrinkage option to pearson similarity

NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems

http://surpriselib.com

BSD 3-Clause "New" or "Revised" License

6.31k stars, 1k forks

Add shrinkage option to pearson similarity #137

Open ODemidenko opened 6 years ago

ODemidenko commented 6 years ago

Shrinkage, as implemented in the pearson_baseline similarity, may be relevant to other similarity measures as well, since it is orthogonal both to the baseline concept and to the choice of similarity measure. For example, it might be perfectly suitable for KNNWithMeans with cosine similarity.

I propose decoupling it from the pearson_baseline similarity and making it available for any kind of item-item algorithm.

NicolasHug commented 6 years ago

I have no problem with this but please pay attention to the following points:

Shrinkage can already be used with any algorithm. It is currently only available for the pearson_baseline similarity, but this similarity can be used with any algorithm.
Shrinkage is only relevant when the original similarity can be considered to be drawn from a normal distribution with zero mean. Hence, it should only be implemented for similarities that are zero-centered. For now, only pearson and pearson_baseline are zero-centered.
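
To illustrate the zero-centering point, here is a minimal sketch of the shrinkage factor documented for pearson_baseline, `(n_common - 1) / (n_common - 1 + shrinkage)`, written as a standalone helper. The function name `shrink` and the guard for `n_common <= 1` are assumptions made for illustration, not Surprise's actual code.

```python
def shrink(sim, n_common, shrinkage=100):
    """Pull a similarity toward zero when it rests on few common ratings.

    Hypothetical helper: `sim` is any precomputed similarity, `n_common`
    the number of common ratings it was computed from.
    """
    if n_common <= 1:
        # Assumption: with at most one common rating there is no evidence.
        return 0.0
    return (n_common - 1) / (n_common - 1 + shrinkage) * sim


# A Pearson value of 0.9 backed by 2 common ratings is shrunk almost to 0,
# while the same value backed by 101 common ratings is halved.
weak = shrink(0.9, n_common=2)
strong = shrink(0.9, n_common=101)
```

Because the factor multiplies the similarity, the result always moves toward zero. For a zero-centered measure, zero means "no evidence of a relationship". For plain cosine on positive ratings, zero would instead mean "maximally dissimilar", which is why shrinking it is questionable.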

ODemidenko commented 6 years ago

> Shrinkage is only relevant when the original similarity can be considered to be drawn from a normal distribution with zero mean.

I agree. We shouldn't introduce shrinkage for every similarity measure, as it only makes sense for zero-centered measures (and I believe we should only provide options that make sense). We need another similarity measure that is centered around zero: "adjusted cosine", as proposed in #135.
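
For reference, a rough sketch of adjusted cosine in its usual item-item form: each rating is centered on its user's mean before the cosine is taken over the users who rated both items. The function name, the nested-dict data layout, and the zero fallback are assumptions for illustration; the definition proposed in #135 may differ in details.

```python
from math import sqrt


def adjusted_cosine(ratings, item_i, item_j):
    """Cosine between two items after subtracting each user's mean rating.

    `ratings` maps user -> {item: rating} (hypothetical layout).
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if item_i in user_ratings and item_j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[item_i] - mean
            dj = user_ratings[item_j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        # Assumption: no common users, or no variation, means no evidence.
        return 0.0
    return num / sqrt(den_i * den_j)


ratings = {
    "u1": {"a": 5, "b": 1, "c": 3},
    "u2": {"a": 4, "b": 2, "c": 3},
}
sim_ab = adjusted_cosine(ratings, "a", "b")
```

In this toy data both users rate "a" above their mean and "b" below it, so the adjusted cosine is negative, whereas the plain cosine of the raw rating vectors (5, 4) and (1, 2) would be positive. Values spread around zero in this way are what make shrinkage meaningful.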