Stonesjtu / Pytorch-NCE

The Noise Contrastive Estimation for softmax output written in Pytorch
MIT License

Target Sample can be included in Noise sample #8

Closed adonisues closed 3 years ago

adonisues commented 6 years ago

Hello. Thank you for your NCE code in PyTorch; it is very helpful. I have a question about noise sampling. In your code, the target sample can also be drawn as a noise sample, and the "K" noise samples can overlap (the same word can be drawn more than once). Is that OK? I think it is not valid in theory, but OK in practice. Do you have any thoughts on this?

Stonesjtu commented 6 years ago

On the contrary, I think it's sound in theory, because NCE only requires the noise distribution to mimic the real data distribution as closely as possible. Removing the target word from the noise samples, or removing duplicate noise samples, could distort the unigram distribution that the noise is supposed to follow. That said, I haven't actually tested the effect of either of these special treatments. Do you have any experimental results to share?
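
To make the distortion argument concrete, here is a minimal sketch in plain Python. It is not taken from this repository: the four-word vocabulary, its probabilities, and the helper names `noise_given_target` and `marginal_noise` are all hypothetical, and it assumes that targets are themselves drawn from the unigram distribution. It compares the noise distribution the model effectively sees when the target is allowed among the noise samples with the one it sees when the target is rejected and the rest is renormalised.

```python
# Hypothetical toy unigram distribution over a 4-word vocabulary
# (illustrative values only, not taken from any real corpus).
unigram = {"the": 0.5, "cat": 0.3, "sat": 0.15, "mat": 0.05}


def noise_given_target(unigram, target, exclude_target):
    """Noise distribution actually used for one training example."""
    if not exclude_target:
        # Plain sampling: the target may be drawn like any other word.
        return dict(unigram)
    # Rejecting the target is equivalent to zeroing it out and renormalising.
    rest = 1.0 - unigram[target]
    return {w: (0.0 if w == target else p / rest) for w, p in unigram.items()}


def marginal_noise(unigram, exclude_target):
    """Average the per-example noise distribution over all targets.

    Assumption: targets are drawn from the same unigram distribution.
    """
    marginal = {w: 0.0 for w in unigram}
    for target, p_target in unigram.items():
        cond = noise_given_target(unigram, target, exclude_target)
        for w, p in cond.items():
            marginal[w] += p_target * p
    return marginal


kept = marginal_noise(unigram, exclude_target=False)
dropped = marginal_noise(unigram, exclude_target=True)
for w in unigram:
    print(f"{w}: unigram={unigram[w]:.3f} kept={kept[w]:.3f} dropped={dropped[w]:.3f}")
```

In this toy setting, keeping the target leaves the marginal noise distribution equal to the unigram distribution, while excluding it shifts noise mass away from the most frequent word and toward the rarest one. Whether that shift matters for a real language model is exactly the open empirical question raised above; the sketch only shows that the sampling distribution changes.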