giuliowaitforitdavide / recsyslearn

GNU General Public License v3.0

[bug] Sign in novelty #2

Closed: mmosc closed this issue 2 years ago

mmosc commented 2 years ago

Hi! Thanks for the great library :)

I think there is a bug in the novelty formula. Currently, it is top_n = top_n.groupby('user')['rank'].apply(lambda x: - np.log2(1 / x)), which is equivalent to top_n = top_n.groupby('user')['rank'].apply(lambda x: np.log2(x)), where x is a measure of the item's popularity (i.e., a higher x means a more popular item). This means that if x is always high (i.e., if popular items are recommended), novelty will be high, whereas the opposite should be the case.

Solution: top_n = top_n.groupby('user')['rank'].apply(lambda x: - np.log2(x)) or equivalently top_n = top_n.groupby('user')['rank'].apply(lambda x: np.log2(1 / x))
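A minimal sketch of the sign issue on two toy items, assuming (as described above) that x is a popularity score in (0, 1]. The popularity values and the helper names `buggy_novelty` / `fixed_novelty` are hypothetical and only illustrate the per-item term, not recsyslearn's actual code:

```python
import math

# Hypothetical popularity scores in (0, 1]; higher means more popular.
popular_item = 0.9
niche_item = 0.01


def buggy_novelty(p):
    # Current per-item term: -log2(1 / p), which simplifies to log2(p).
    return -math.log2(1 / p)


def fixed_novelty(p):
    # Proposed per-item term: -log2(p), equivalently log2(1 / p).
    return -math.log2(p)


print(buggy_novelty(popular_item) > buggy_novelty(niche_item))  # True: popular item scores as more novel
print(fixed_novelty(popular_item) < fixed_novelty(niche_item))  # True: niche item scores as more novel
```

Since log2 is increasing, the current term grows with popularity, while the proposed term shrinks with it.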

Cheers! Marta

giuliowaitforitdavide commented 2 years ago

A new branch has been opened to fix the issue. We can contribute to it directly and then review the pull request. Thanks for the contribution :smile:

giuliowaitforitdavide commented 2 years ago

@mmosc, according to https://doi.org/10.1007/s13735-018-0154-2, the novelty formula seems to be:

top_n = top_n.groupby('user')['rank'].apply(lambda x: np.sum(-np.log2(x)))

In brief, for each user we sum the logarithm of the reciprocal of the popularity score over that user's top_n items, and then we compute the mean across users. Is that correct?
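A dependency-free sketch of the sum-then-mean aggregation described above, assuming each user's top-k list is given as popularity scores in (0, 1]. The `top_n` dictionary and the `novelty_sum` helper are hypothetical toy stand-ins for the pandas `groupby('user')` version, not recsyslearn's API:

```python
import math
from statistics import mean

# Hypothetical top-k lists: user -> popularity scores of the recommended items.
top_n = {
    "u1": [0.5, 0.25],
    "u2": [0.125, 0.125],
}


def novelty_sum(recs):
    # Per user: sum of -log2(popularity) over the top-k items,
    # mirroring groupby('user')['rank'].apply(lambda x: np.sum(-np.log2(x))).
    per_user = [sum(-math.log2(p) for p in scores) for scores in recs.values()]
    # Then average the per-user sums across users.
    return mean(per_user)


print(novelty_sum(top_n))  # 4.5  (u1: 1 + 2 = 3, u2: 3 + 3 = 6)
```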

mmosc commented 2 years ago

Hi @giuliowaitforitdavide! Since, according to the definition you quoted, we first need to average over the top k elements and then over the users, I think it should be:

top_n = top_n.groupby('user')['rank'].apply(lambda x: np.mean(-np.log2(x)))
return top_n.mean() 

Do you agree?

Cheers! Marta
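For comparison, a sketch of the mean-then-mean aggregation proposed in the last comment, using the same hypothetical toy data and helper names as an illustration only. When every user receives exactly k items, the per-user mean is the per-user sum divided by k, so the two variants differ by the constant factor k:

```python
import math
from statistics import mean

# Hypothetical top-k lists (k = 2 for every user): user -> popularity scores.
top_n = {"u1": [0.5, 0.25], "u2": [0.125, 0.125]}
k = 2


def novelty_mean(recs):
    # Per user: mean of -log2(popularity) over the top-k items,
    # mirroring ...apply(lambda x: np.mean(-np.log2(x))), then top_n.mean().
    per_user = [mean(-math.log2(p) for p in scores) for scores in recs.values()]
    return mean(per_user)


def novelty_sum(recs):
    # Per user: sum over the top-k items, then average across users.
    per_user = [sum(-math.log2(p) for p in scores) for scores in recs.values()]
    return mean(per_user)


print(novelty_mean(top_n))     # 2.25  (u1: 1.5, u2: 3.0)
print(novelty_sum(top_n) / k)  # 2.25
```

Averaging within each user keeps the score on the same scale regardless of k, which makes values comparable across different list lengths.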