**Open** · ghost opened this issue 5 years ago
In the calculation of `l`:

Here above, it seems `self.sampler` is set to an instance of `NegativeSampling`.
Here above, in the `for` loop, `loss` is assigned the return value of calling `NegativeSampling` on each iteration. So does it discard the previous value each time, even though the formula in the paper says the terms should be summed? Or does some property of `self.sampler` make them accumulate?
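To make my question concrete: in plain Python (no Chainer, with a made-up stand-in for the per-target sampler call), a bare assignment inside a loop discards the previous value, so summing the terms would have to be explicit somewhere:

```python
def per_target_loss(t):
    # Hypothetical stand-in for one self.sampler(...) call.
    return float(t)

targets = [1, 2, 3]

# Plain assignment: only the last term survives the loop.
loss_overwrite = 0.0
for t in targets:
    loss_overwrite = per_target_loss(t)

# Explicit accumulation: terms are summed, as the paper's formula suggests.
loss_sum = 0.0
for t in targets:
    loss_sum += per_target_loss(t)

print(loss_overwrite)  # 3.0
print(loss_sum)        # 6.0
```

So my question is whether `self.sampler` behaves like the first loop or the second.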
I’m trying to adapt this example to my own tweet data, but I got really confused by the loss-function calculation, which is not identical to the formula in the paper. The two pictures above show the loss formula from the paper, but the code only contains the first part of the sum, `L^d`. There is a dubious calculation of `l`, underlined in the following picture, which I think might be the second part, but it is not added to `loss` before `loss.backward()`.

I also had to multiply the original `prior` code by `-clambda` (right under the red line in the picture above), because the value returned by `model.prior()` is actually the negative of formula (5) without the lambda factor.

I don’t know Chainer, so it’s hard to tell whether `l` is computed correctly as the second part of the sum and can simply be added to `loss` before `backward()`. It’s also hard to tell what the `fraction` variable is for and whether it should multiply the `l` part of the sum as well. I actually tried adding `l` to `loss` without multiplying it by `fraction`, but didn’t get the expected result.
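If it helps frame the question, here is what I would expect from the paper, written as pure Python with made-up values standing in for the real Chainer variables (I'm assuming `fraction` is a minibatch-to-corpus scaling factor, but that's exactly what I'm unsure about):

```python
# Hypothetical values standing in for the real variables in the code.
loss = 2.5          # L^d: the negative-sampling part already computed in the code
l = 0.8             # the underlined quantity, possibly the second part of the sum
clambda = 200.0     # the lambda factor from formula (5)
fraction = 0.01     # assumed: minibatch size / corpus size

# Paper formula as I read it: total = L^d + lambda * (prior term),
# with the open question of whether `fraction` should scale `l` too.
total_without_fraction = loss + clambda * l
total_with_fraction = loss + clambda * l * fraction

print(total_without_fraction)  # 162.5
print(total_with_fraction)     # 4.1
```

Which of these two totals (if either) is the quantity that should reach `backward()`?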