Andras7 / word2vec-pytorch

Extremely simple and fast word2vec implementation with Negative Sampling + Sub-sampling

SubSampling formula #11

Open francesco-mollica opened 2 years ago

francesco-mollica commented 2 years ago

Why is the extra (t / f) term added in this formula for discards?

t = 0.0001
f = np.array(list(self.word_frequency.values())) / self.token_count
self.discards = np.sqrt(t / f) + (t / f)
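
For context, here is a minimal sketch comparing the expression above with two related forms. It assumes the value is used as a keep probability (a guess about how the repository consumes `self.discards`, not confirmed in this issue). The function names and the sample counts are hypothetical. The sketch shows that sqrt(t / f) + (t / f) is algebraically the same as (sqrt(f / t) + 1) * (t / f), the form used in the original word2vec C code, whereas the formula in the paper amounts to keeping a word with probability sqrt(t / f) alone.

```python
import numpy as np


def keep_prob_issue(counts, t=1e-4):
    """Expression quoted in this issue: sqrt(t / f) + t / f."""
    counts = np.asarray(counts, dtype=np.float64)
    f = counts / counts.sum()  # relative frequency of each word
    return np.sqrt(t / f) + t / f


def keep_prob_c_style(counts, t=1e-4):
    """Same quantity written as (sqrt(f / t) + 1) * (t / f)."""
    counts = np.asarray(counts, dtype=np.float64)
    f = counts / counts.sum()
    return (np.sqrt(f / t) + 1.0) * (t / f)


def keep_prob_paper(counts, t=1e-4):
    """Paper-style variant: keep = 1 - P(discard) = sqrt(t / f)."""
    counts = np.asarray(counts, dtype=np.float64)
    f = counts / counts.sum()
    return np.sqrt(t / f)


# Hypothetical word counts, for illustration only.
counts = [9000, 900, 90, 10]
print(keep_prob_issue(counts))
print(keep_prob_c_style(counts))
print(keep_prob_paper(counts))
```

Under that reading, the extra (t / f) term makes subsampling slightly less aggressive than the paper's formula: every word gets a somewhat higher keep probability, and the value reaches 1 once f drops below roughly 2.6 * t instead of at f = t. Values above 1 simply mean the word is always kept.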