adonisues closed this issue 3 years ago.
Hello. Thank you for your NCE code in PyTorch; it is very helpful. I have a question about noise sampling. In your code, the target word can also be drawn as a noise sample, and the K noise samples can overlap (the same word can be drawn more than once). Is that OK? I think it is not valid in theory, but it is OK in practice. Do you have any thoughts on this?

On the contrary, I think it is sound in theory, because NCE only requires the noise distribution to mimic the real data distribution as closely as possible. Removing the target word from the noise samples, as well as removing duplicate noise samples, could distort the unigram distribution. But I haven't actually tested the effects of these special treatments. Do you have any experimental results to share?
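To make the two sampling schemes discussed in this thread concrete, here is a minimal sketch. It is not the repository's code: the function names and the toy word counts are hypothetical. `sample_noise_iid` draws K noise words i.i.d. from the unigram distribution with replacement, so the target word and duplicates can appear. `sample_noise_filtered` is an illustrative variant that rejects the target word and duplicates. In PyTorch the with-replacement scheme can be written with `torch.multinomial(probs, k, replacement=True)`; the sketch uses only the Python standard library so it runs without PyTorch.

```python
import random
from collections import Counter


def unigram_probs(counts):
    """Normalize raw word counts into a unigram noise distribution q(w)."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sample_noise_iid(probs, k, rng):
    """Draw k noise words i.i.d. from q(w), with replacement.

    The target word can appear among the noise words, and the same word
    can be drawn more than once.
    """
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=k)


def sample_noise_filtered(probs, k, target, rng):
    """Hypothetical variant: reject the target word and any duplicates.

    Assumes the vocabulary has at least k words other than `target`;
    otherwise the loop never terminates.
    """
    chosen = []
    while len(chosen) < k:
        w = sample_noise_iid(probs, 1, rng)[0]
        if w != target and w not in chosen:
            chosen.append(w)
    return chosen


def noise_frequencies(sampler, trials):
    """Pool the noise words from many draws and return their empirical frequencies."""
    draws = [w for _ in range(trials) for w in sampler()]
    n = len(draws)
    return {w: c / n for w, c in Counter(draws).items()}


if __name__ == "__main__":
    # Toy counts, made up purely for illustration.
    q = unigram_probs({"the": 50, "cat": 30, "sat": 15, "mat": 5})
    rng = random.Random(0)
    k, target, trials = 3, "the", 2000

    iid = noise_frequencies(lambda: sample_noise_iid(q, k, rng), trials)
    filt = noise_frequencies(lambda: sample_noise_filtered(q, k, target, rng), trials)

    # Compare the pooled noise frequencies against the unigram distribution q.
    for w in q:
        print(f"{w:>4}  q={q[w]:.3f}  iid={iid.get(w, 0.0):.3f}  filtered={filt.get(w, 0.0):.3f}")
```

With these toy counts the i.i.d. sampler reproduces q closely, while the filtered sampler never emits "the" and gives "mat" one third of all noise draws instead of roughly 5%. The tiny vocabulary exaggerates the effect, but it shows the direction of the distortion the reply describes.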