artli closed this issue 5 years ago.
You're right. The BPR implementation in implicit uses weighted sampling for the negative items.
This implementation has two benefits.
I can't find references for it, but giving more weight to popular items when sampling gives better performance.
There are also other slight differences between the descriptions in the original paper and the implementation.
If it's empirically verified that this sampling scheme is beneficial, then great! Perhaps it could be mentioned in the docs, or at least in a comment, just in case?
This does work quite a bit better than uniformly sampling the negative items, and is almost certainly the correct thing to do. The problem with uniformly sampling the negative items is that we aren't uniformly sampling the positive items: popular items come up more often than unpopular ones, because we are drawing from the interactions. The sampling we are doing means that the likelihood of an item being chosen as a negative sample is the same as the likelihood of it being chosen as a positive one.
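To make the argument above concrete, here is a small sketch that computes the exact probabilities instead of sampling. It assumes the interactions are stored as a flat `item_ids` list with one entry per non-zero of the user-item matrix, and that both the positive i and the negative j come from independent, uniformly drawn interactions, ignoring any `verify_neg` rejection. `sampling_marginals` is a hypothetical helper, not part of implicit.

```python
from collections import Counter
from fractions import Fraction


def sampling_marginals(item_ids):
    """Exact marginals of the positive item i and the negative item j when
    each is taken from an independent, uniformly drawn interaction
    (before any verify_neg-style rejection)."""
    nnz = len(item_ids)
    pos_counts, neg_counts = Counter(), Counter()
    # Enumerate all nnz * nnz equally likely (first draw, second draw) pairs:
    # the first draw supplies i, the second supplies j.
    for a in range(nnz):
        for b in range(nnz):
            pos_counts[item_ids[a]] += 1
            neg_counts[item_ids[b]] += 1
    total = nnz * nnz
    pos = {item: Fraction(c, total) for item, c in pos_counts.items()}
    neg = {item: Fraction(c, total) for item, c in neg_counts.items()}
    return pos, neg


# Toy data with 4 items: item 0 is popular, item 2 is rare,
# item 3 has no interactions at all.
item_ids = [0, 1, 0, 2, 0, 1]
n_items = 4

pos, neg = sampling_marginals(item_ids)
uniform = Fraction(1, n_items)  # what uniform negative sampling would give

print(pos == neg)               # True
print(neg[0], uniform)          # 1/2 1/4
print(neg[2], uniform)          # 1/6 1/4
print(neg.get(3, 0), uniform)   # 0 1/4
```

The negative distribution matches the positive one exactly, whereas uniform negatives give the rare item 2 and the never-seen item 3 more weight than they ever get as positives.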
Well, by the same token, if you sample as specified in the paper, the really unpopular items will be chosen as negatives somewhat more often. But yeah, oversampling the popular items does make sense, since they should in some sense be more important to the model. From what I've seen, it does indeed lead to better results.
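In symbols (notation introduced here for illustration): let n_j be the number of interactions with item j, N the total number of interactions, and |I| the number of items. Before any verify_neg rejection, uniform sampling and the weighted scheme draw the negative j with these probabilities:

```latex
P_{\text{uniform}}(j) = \frac{1}{|I|},
\qquad
P_{\text{weighted}}(j) = \frac{n_j}{N} = P_{\text{pos}}(j),
\qquad
P_{\text{uniform}}(j) > P_{\text{weighted}}(j) \iff n_j < \frac{N}{|I|}.
```

So every item with fewer interactions than the average item (N / |I|) is drawn as a negative more often under uniform sampling, which is the effect described above.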
It seems to me that the (u, i, j) triplets are not being sampled uniformly. This line selects a disliked item j by uniformly drawing a non-zero entry of the user-item matrix and taking its column. This should lead to popular items being oversampled as negative examples. It may actually turn out to be beneficial empirically, but it is not in accordance with the paper.
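The selection described above could look roughly like this minimal pure-Python sketch. It is not implicit's actual Cython code, and it makes several assumptions: interactions are stored as parallel `user_ids`/`item_ids` lists (COO layout), `liked` maps each user to the set of items they interacted with, and `sample_weighted_triplet` is a hypothetical name. On a collision, the sketch redraws j. That is a choice made here and is not necessarily how implicit handles it.

```python
import random


def sample_weighted_triplet(user_ids, item_ids, liked, rng, verify_neg=True):
    """Draw one (u, i, j) triplet with popularity-weighted negatives.

    user_ids, item_ids: parallel lists, one entry per non-zero entry of
        the user-item matrix (COO layout).
    liked: dict mapping each user to the set of items they interacted with.
    Assumes the sampled user has at least one non-liked item that occurs in
    item_ids; otherwise the rejection loop below never terminates.
    """
    nnz = len(item_ids)
    # (u, i): a uniformly drawn non-zero entry.
    k = rng.randrange(nnz)
    u, i = user_ids[k], item_ids[k]
    while True:
        # j: the column of another uniformly drawn non-zero entry, so
        # P(j) is proportional to the number of interactions with item j.
        j = item_ids[rng.randrange(nnz)]
        # verify_neg-style check (assumed behaviour for this sketch):
        # redraw j if u has already interacted with it.
        if not verify_neg or j not in liked[u]:
            return u, i, j


# Toy data: 3 users, 4 items; item 3 has no interactions.
user_ids = [0, 0, 1, 1, 2, 2]
item_ids = [0, 1, 0, 2, 0, 1]
liked = {}
for u, i in zip(user_ids, item_ids):
    liked.setdefault(u, set()).add(i)

rng = random.Random(0)
triplets = [sample_weighted_triplet(user_ids, item_ids, liked, rng) for _ in range(1000)]
# Item 3 can never be drawn as a negative, because nobody interacted with it.
print(any(j == 3 for _, _, j in triplets))  # False
```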
The (u, i) pairs are being drawn uniformly, though, so I guess the issue could be fixed by selecting j uniformly from the range [0, items - 1]. With verify_neg enabled, this should result in truly uniform sampling, as far as I understand.
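The proposed fix can be sketched by changing only how j is drawn. This sketch uses the same assumptions as before: parallel `user_ids`/`item_ids` lists, a `liked` dict of per-user item sets, and a hypothetical `sample_uniform_triplet` name. It treats verify_neg as rejecting items the user has already interacted with. Because only j is redrawn, (u, i) stays uniform over the interactions, and j ends up uniform over the items u has not interacted with.

```python
import random


def sample_uniform_triplet(user_ids, item_ids, liked, n_items, rng, verify_neg=True):
    """Draw one (u, i, j) triplet with j drawn uniformly from all items.

    Items are assumed to be numbered 0 .. n_items - 1.
    liked: dict mapping each user to the set of items they interacted with.
    """
    nnz = len(item_ids)
    # (u, i): a uniformly drawn non-zero entry, as in the weighted scheme.
    k = rng.randrange(nnz)
    u, i = user_ids[k], item_ids[k]
    if verify_neg and len(liked[u]) >= n_items:
        raise ValueError(f"user {u} has interacted with every item")
    while True:
        # j: uniform over 0 .. n_items - 1, independent of item popularity.
        j = rng.randrange(n_items)
        # Rejecting items u already interacted with leaves j uniform
        # over the remaining items.
        if not verify_neg or j not in liked[u]:
            return u, i, j


# Same toy data: item 3 has no interactions.
user_ids = [0, 0, 1, 1, 2, 2]
item_ids = [0, 1, 0, 2, 0, 1]
liked = {}
for u, i in zip(user_ids, item_ids):
    liked.setdefault(u, set()).add(i)

rng = random.Random(0)
triplets = [sample_uniform_triplet(user_ids, item_ids, liked, 4, rng) for _ in range(1000)]
negatives = {j for _, _, j in triplets}  # unlike the weighted scheme, this includes item 3
```

The design difference is a single line: `rng.randrange(n_items)` instead of reading the column of a random interaction. That single change is what moves the negative distribution from popularity-weighted to uniform.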