I was just familiarizing myself with the code for this library after reading the BPR paper, and I am concerned I have found a bug. According to Line 5 in Figure 4 of the paper, the model parameters should receive gradient updates according to
$$\Theta \leftarrow \Theta + \alpha \left( \frac{e^{-\hat{x}_{uij}}}{1+e^{-\hat{x}_{uij}}} \cdot \frac{\partial \hat{x}_{uij}}{\partial \Theta} + \lambda_{\Theta} \cdot \Theta \right)$$
However, in this line, `z` is computed as `z = 1.0 / (1.0 + exp(score))`, which looks like the sigmoid function applied directly, without taking the derivative (and possibly missing a negative as well?). I compared results against my own implementation of BPR on a problem I am working on (I cannot share it until it is made public in a few months), and on my dataset my implementation performed better with the gradient update from the paper. I would appreciate any feedback if there is some nuance I am missing!
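For reference, here is a minimal sketch of how I read that update for a plain matrix-factorization model with $\hat{x}_{uij} = \langle w_u, h_i \rangle - \langle w_u, h_j \rangle$. The names are purely illustrative (not this library's API), and I subtract the regularization term in the usual way for a penalized objective:

```python
import numpy as np

def bpr_sgd_step(user_factors, item_factors, u, i, j, lr=0.05, reg=0.01):
    """One SGD step following my reading of Line 5, Figure 4 of the BPR paper.

    Assumes a matrix-factorization model where x̂_uij = <w_u, h_i> - <w_u, h_j>.
    Illustrative sketch only -- not this library's implementation.
    """
    # copy the current values so all three gradients use pre-update parameters
    w_u = user_factors[u].copy()
    h_i = item_factors[i].copy()
    h_j = item_factors[j].copy()

    # x̂_uij for the sampled (user, positive item, negative item) triple
    x_uij = w_u @ (h_i - h_j)

    # multiplier from the paper: e^{-x̂_uij} / (1 + e^{-x̂_uij})
    z = np.exp(-x_uij) / (1.0 + np.exp(-x_uij))

    # Θ ← Θ + α (z · ∂x̂_uij/∂Θ − λ_Θ Θ), regularizer subtracted since it
    # penalizes ||Θ||² in the BPR-OPT objective
    user_factors[u] += lr * (z * (h_i - h_j) - reg * w_u)
    item_factors[i] += lr * (z * w_u - reg * h_i)
    item_factors[j] += lr * (z * (-w_u) - reg * h_j)

# toy usage: 4 users, 6 items, rank-8 factors
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, 8))
V = rng.normal(scale=0.1, size=(6, 8))
bpr_sgd_step(U, V, u=0, i=2, j=5)
```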