benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License
3.55k stars 611 forks

Wrong sampling procedure in BPR? #255

Closed artli closed 5 years ago

artli commented 5 years ago

It seems to me that the (u, i, j) triplets are not being sampled uniformly. This line selects a disliked item j by uniformly drawing a non-zero entry of the user-item matrix and getting its column. This should lead to popular items being oversampled as negative examples. It may actually turn out to be beneficial empirically but is not in accordance with the paper.

(u, i) are being drawn uniformly though, so I guess the issue can be fixed by selecting j uniformly from the range (0, items - 1). With verify_neg enabled, this should result in true uniform sampling, as far as I understand.
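A minimal sketch of the sampling scheme proposed above (toy data; names and structure are hypothetical, not implicit's actual code): (u, i) is drawn uniformly from the interactions, j is drawn uniformly from all items, and a verify_neg-style check rejects candidates the user has actually interacted with.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: (user, item) interaction pairs; item 2 is popular.
interactions = [(0, 2), (1, 2), (2, 2), (3, 2), (0, 1), (1, 0)]
n_items = 4

# Set of items each user has interacted with, for the verify_neg check.
liked = {}
for u, i in interactions:
    liked.setdefault(u, set()).add(i)

def sample_triplet():
    """Draw (u, i, j): (u, i) uniform over interactions, j uniform over
    the full item range, rejecting items the user already liked."""
    u, i = interactions[rng.integers(len(interactions))]
    while True:
        j = int(rng.integers(n_items))  # uniform over [0, n_items)
        if j not in liked[u]:           # verify_neg-style rejection
            return u, i, j
```

Under this scheme every item the user has not interacted with is equally likely to appear as the negative j, which is the sampling assumed in the BPR paper.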

ita9naiwa commented 5 years ago

You're right. The BPR implementation in implicit uses weighted sampling: negative items are drawn in proportion to their popularity rather than uniformly.

One practical benefit of this implementation: although I can't find a reference for it, giving more weight to popular items when sampling negatives tends to give better performance.

ita9naiwa commented 5 years ago

There are other slight differences between the original paper's description and this implementation.

artli commented 5 years ago

If it's empirically verified that this sampling scheme is beneficial then great! Perhaps it could be mentioned in the docs or at least in a comment, just in case?

benfred commented 5 years ago

This does work quite a bit better than uniformly sampling the negative items - and is almost certainly the correct thing to do. The problem with uniformly sampling the negative items is that we aren't uniformly sampling the positive items: popular items will come up more often than unpopular items (since we are drawing from interactions). The sampling we are doing means that the likelihood of an item being chosen as a negative sample is the same as its likelihood of being chosen as a positive one.
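The point about matching distributions can be illustrated with a small sketch (toy data, hypothetical names - not the actual implicit code): drawing a random interaction and taking its item column makes the negative-item distribution proportional to item popularity, the same distribution the positive items come from.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Hypothetical toy data: item 2 appears in 4 of the 6 interactions.
interactions = [(0, 2), (1, 2), (2, 2), (3, 2), (0, 1), (1, 0)]

def sample_negative_weighted():
    # Draw a random interaction and take its item column: items are
    # chosen in proportion to their interaction counts, which is the
    # same distribution the positive items are drawn from.
    _, j = interactions[rng.integers(len(interactions))]
    return j

neg_counts = Counter(sample_negative_weighted() for _ in range(60000))
pos_counts = Counter(i for _, i in interactions)

# Item 2 gets roughly 4/6 of the negative samples, matching its
# 4/6 share of the positive interactions.
```

With uniform negative sampling, item 2 would instead receive only 1/4 of the negative draws while still supplying 4/6 of the positives.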

artli commented 5 years ago

Well, by the same token, if you sample as specified in the paper, the really unpopular items would be chosen as negatives somewhat more often than as positives. But yeah, oversampling the popular items does make sense, as they should in some sense be more important to the model. From what I've seen it indeed leads to better results.