gamboviol / bpr

Bayesian Personalized Ranking

Does input need to be deduplicated? #14

Open tinahbu opened 5 years ago

tinahbu commented 5 years ago

Hi Mark, thank you for the repository!

I was wondering: if a user interacted with an item more than once, do you keep only one record of it in the training data? How does BPR distinguish between an item that a user consumed frequently and an item that the user consumed only once or twice? It seems that if the input is not deduplicated, a frequently consumed item has a higher chance of being chosen at each sampling step, which gives it more importance. That makes sense: with no explicit feedback, the more often an item is consumed, the more confident we are that the user likes it. But then the input matrix is no longer an "all-ones" matrix; each entry would instead be the number of times user u consumed item i.
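To make the question concrete, here is a minimal sketch of the effect I mean. It is not the repository's sampler: it assumes, purely for illustration, that the positive (user, item) pair is drawn uniformly at random from the rows of an interaction log. The `interactions` data and the `positive_sampling_probs` helper are hypothetical names made up for this example.

```python
from collections import Counter

# Hypothetical interaction log: one (user, item) row per consumption event.
interactions = [
    ("u1", "a"), ("u1", "a"), ("u1", "a"),  # u1 consumed item a three times
    ("u1", "b"),                            # u1 consumed item b once
    ("u2", "b"),
    ("u2", "c"),
]

def positive_sampling_probs(pairs):
    """Return the probability that each (user, item) pair is drawn as the
    positive example when one row of `pairs` is sampled uniformly at random
    (an assumed sampling scheme, not necessarily what the repository does)."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

# Raw log: repeated consumption makes a pair more likely to be sampled.
raw_probs = positive_sampling_probs(interactions)

# Deduplicated log: every observed pair is equally likely.
dedup_probs = positive_sampling_probs(sorted(set(interactions)))

print(raw_probs[("u1", "a")])    # 0.5  (3 of 6 rows)
print(dedup_probs[("u1", "a")])  # 0.25 (1 of 4 unique pairs)
```

Under that assumption, leaving duplicates in acts as an implicit per-pair weight proportional to the consumption count, while deduplicating gives the "all-ones" setting.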

Can you help me understand this a little better? Thank you.