NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems
http://surpriselib.com
BSD 3-Clause "New" or "Revised" License

Full-length vector norm option for cosine similarity calculation #385

Open mmlynarik opened 3 years ago

mmlynarik commented 3 years ago

Hi,

I am aware that this library focuses on explicit ratings, but I've come across a use case where it would be very useful to also have one option related to implicit-rating calculations. For example, this paper gives, at the bottom of page 6, the formula for item-item similarities in the baseline neighborhood model. There the similarity is computed using full-length vector norms, rather than norms taken only over the intersection of the two vectors, as is the case in this library:

[Image: item-item similarity formula from the paper]

Since your library is optimized with C-level code and performs very well when computing item similarities on large datasets, could you please add an option to compute cosine similarity with full vector norms? That is, the sums in the denominator would run over u \in U_i and u \in U_j rather than over u \in U_ij, and analogously for user-based similarity. That would be a nice extension covering both options: similarity computed only on the intersection of the vectors, and similarity computed on the full vectors.
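To make the difference concrete, here is a minimal NumPy sketch of the two variants for item-item similarity. It is only an illustration, not Surprise's implementation: the function names `cosine_common` and `cosine_full` are hypothetical, and it assumes a small dense user-by-item matrix in which 0 means "not rated".

```python
import numpy as np


def cosine_common(R):
    """Item-item cosine with norms over co-rating users only (u in U_ij).

    R is a dense (n_users, n_items) array; 0 is assumed to mean "not rated".
    """
    rated = (R != 0).astype(float)
    dot = R.T @ R                 # only users who rated both items contribute
    sq = R ** 2
    sq_i = sq.T @ rated           # [i, j] = sum of r_ui^2 over users who rated j
    sq_j = rated.T @ sq           # [i, j] = sum of r_uj^2 over users who rated i
    denom = np.sqrt(sq_i * sq_j)
    # Pairs with no common users get similarity 0 instead of a division by zero.
    return np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)


def cosine_full(R):
    """Item-item cosine with full-length norms (u in U_i and u in U_j)."""
    dot = R.T @ R
    norms = np.sqrt((R ** 2).sum(axis=0))   # one norm per item, over all its raters
    denom = np.outer(norms, norms)
    return np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)


# Toy data: 3 users (rows) x 3 items (columns), 0 = missing rating.
R = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 2.0, 1.0],
])

sim_common = cosine_common(R)
sim_full = cosine_full(R)
```

In this toy matrix, items 0 and 1 share a single user, so the intersection-based variant gives them a similarity of exactly 1, while the full-norm variant also accounts for the ratings each item received from other users and gives a smaller value. That penalty for low overlap is the behaviour I am asking for.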

Thanks!