danieldk closed this 2 years ago.
I'll try to take a look tomorrow!
We're keeping the Zipf generator around, just not making it available in the config?
I was thinking of removing it in a separate PR. We do have another unused generator, but the Zipf generator adds dependencies, so perhaps we should just remove it?
We used the Zipf distribution for negative sampling. However, using the empirical distribution gives better results in practice. This also brings the implementation closer to how word2vec and fastText sample negatives.
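For readers unfamiliar with table-based negative sampling, here is a minimal sketch of drawing negatives from the empirical distribution using a word2vec-style item table. It is not the code in this PR. `UnigramTable` and `xorshift64` are hypothetical names, the table is built from raw counts (word2vec additionally raises each count to the power 0.75 before filling its table), and the random number is passed in so that no external crate is needed.

```rust
/// Hypothetical sketch (not the PR's implementation): an item table in which
/// item `i` occupies roughly `table_size * counts[i] / total` slots, so a
/// uniformly chosen slot yields items in proportion to their counts.
struct UnigramTable {
    table: Vec<usize>,
}

impl UnigramTable {
    fn new(counts: &[u64], table_size: usize) -> Self {
        let total: u64 = counts.iter().sum();
        assert!(total > 0, "counts must not all be zero");
        assert!(table_size > 0, "table must have at least one slot");
        let mut table = Vec::with_capacity(table_size);
        let mut cumulative = 0u64;
        for (idx, &count) in counts.iter().enumerate() {
            cumulative += count;
            // Fill slots up to this item's share of the cumulative mass.
            // After the last item, `end == table_size`, so the table is full.
            let end = (cumulative as u128 * table_size as u128 / total as u128) as usize;
            while table.len() < end {
                table.push(idx);
            }
        }
        UnigramTable { table }
    }

    /// Map a uniform random integer to a slot and return the item stored there.
    fn sample(&self, random: u64) -> usize {
        self.table[(random % self.table.len() as u64) as usize]
    }
}

/// Tiny xorshift64 generator (state must be non-zero), used only for the demo.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

fn main() {
    // Item counts 1, 2, 1 in an 8-slot table: [0, 0, 1, 1, 1, 1, 2, 2].
    let table = UnigramTable::new(&[1, 2, 1], 8);
    let mut state = 0x9E37_79B9_7F4A_7C15u64;
    let mut hist = [0usize; 3];
    for _ in 0..10_000 {
        hist[table.sample(xorshift64(&mut state))] += 1;
    }
    // Expected to be close to a 1:2:1 ratio.
    println!("{:?}", hist);
}
```

Sampling is a single modulo and array lookup, but the table has to be large relative to the vocabulary for rare items to get any slots at all, which is where the memory cost comes from.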
I have never found the approach of sampling from an item table very elegant (it takes memory and is a source of cache misses). We had a more elegant approach in `WeightedRangeGenerator`; however, it turned out to be slow in practice due to its use of binary search.
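The binary-search alternative can be sketched as a prefix-sum array over the item weights: draw a uniform integer below the total weight and find the first item whose cumulative weight exceeds it. `WeightedSampler` is a hypothetical stand-in written for illustration, not the actual `WeightedRangeGenerator`.

```rust
/// Hypothetical sketch (not `WeightedRangeGenerator` itself): sample item `i`
/// with probability `weights[i] / sum(weights)` via binary search over
/// cumulative weights.
struct WeightedSampler {
    cumulative: Vec<u64>,
}

impl WeightedSampler {
    fn new(weights: &[u64]) -> Self {
        let mut cumulative = Vec::with_capacity(weights.len());
        let mut sum = 0u64;
        for &w in weights {
            sum += w;
            cumulative.push(sum);
        }
        assert!(sum > 0, "weights must not all be zero");
        WeightedSampler { cumulative }
    }

    /// Map a uniform random integer to an item index.
    fn sample(&self, random: u64) -> usize {
        let total = *self.cumulative.last().unwrap();
        let target = random % total;
        // First index whose cumulative weight is strictly greater than
        // `target`; items with zero weight are never selected.
        self.cumulative.partition_point(|&c| c <= target)
    }
}

fn main() {
    // Weights 1, 2, 1 give cumulative weights [1, 3, 4].
    let sampler = WeightedSampler::new(&[1, 2, 1]);
    for r in 0..4 {
        // Prints 0 -> 0, 1 -> 1, 2 -> 1, 3 -> 2.
        println!("{} -> {}", r, sampler.sample(r));
    }
}
```

Memory is proportional to the number of items rather than to a large table, but every draw performs an O(log n) search that touches several scattered positions of a vocabulary-sized array, which is consistent with it being slower in practice than a single table lookup.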