beringresearch / ivis

Dimensionality reduction in very large datasets using Siamese Networks
https://beringresearch.github.io/ivis/
Apache License 2.0

Distance-weighted random sampling of non-neighbor negatives #67

Open sheffe opened 4 years ago

sheffe commented 4 years ago

Not a fully-baked feature request, just a directional hunch. I've found the conclusions from the paper "Sampling Matters in Deep Embedding Learning" pretty intuitive: (1) the method for choosing negative samples is critical to the overall embedding, maybe more than the specific loss function, and (2) distance-weighted sampling of negatives had some nice properties during training and gave better results than uniform random sampling or oversampling hard cases.

I'm brand-new to Annoy, so I'm not confident about the implementation details or performance implications here, but I suspect the prebuilt index could be used for both positive and negative sampling. For example, the current approach draws random negatives in sequence and takes the first index that isn't in a neighbor list. A distance-weighted approach could instead draw a small pool of candidate negatives per triplet, look up each candidate's distance to the anchor via Annoy, and pick one with probability weighted by some function f(dist) of that distance (a rough sketch follows below).

Annoy gives us dist(i, j) without much of a performance hit, and a weighted choice among the candidate negatives puts a (tunable) thumb on the scale towards triplets that contain closer, harder negatives.
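A minimal sketch of the idea, not ivis code: the helper name, parameters, and weighting function below are all hypothetical, and it assumes an Annoy index built over the input data plus the same precomputed neighbour lists ivis already uses.

```python
import numpy as np

def sample_weighted_negative(anchor_idx, neighbour_set, annoy_index,
                             n_candidates=32,
                             weight_fn=lambda d: 1.0 / (d + 1e-8),
                             rng=None):
    """Hypothetical helper (not part of ivis): draw a pool of random
    non-neighbour candidates for `anchor_idx`, then pick one with
    probability proportional to weight_fn(dist), so closer ("harder")
    negatives are sampled more often. `annoy_index` is an annoy.AnnoyIndex
    built over the input data."""
    rng = rng or np.random.default_rng()
    n_items = annoy_index.get_n_items()

    # Draw candidate negatives uniformly at random, excluding the anchor
    # and its known neighbours.
    candidates = set()
    while len(candidates) < n_candidates:
        j = int(rng.integers(n_items))
        if j != anchor_idx and j not in neighbour_set:
            candidates.add(j)
    candidates = list(candidates)

    # Annoy can return the distance between two stored items directly.
    dists = [annoy_index.get_distance(anchor_idx, j) for j in candidates]
    weights = np.array([weight_fn(d) for d in dists], dtype=float)
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```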

This idea probably adds some hyperparameter-selection headaches. I think the impactful choices here are the size of the initial pool of candidate negatives and (especially) the form of f(dist).
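For what it's worth, the paper itself weights candidates by the inverse of the pairwise-distance density on the unit hypersphere, clipped by a constant; the two forms below are just simpler, illustrative choices of f(dist), not anything from the issue or from ivis.

```python
import numpy as np

def f_inverse(d, eps=1e-8):
    # Hard bias towards the closest (hardest) candidate negatives.
    return 1.0 / (d + eps)

def f_softmax(d, temperature=1.0):
    # Softer, tunable bias: a large temperature approaches uniform sampling,
    # a small one concentrates on the hardest negatives.
    return np.exp(-d / temperature)
```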

Szubie commented 4 years ago

Hi, thanks for the feedback and the link to the paper, it was a good read!

I like the idea and think it would be interesting to try out and see how it impacts the results of the embeddings. The idea of weighting the sampling of triplets did occur to me back when I was initially writing ivis, but I was thinking about weighting the selection of positive examples rather than negatives at that time, as it would be almost free. Weighting the selection of neighbors never made it past initial prototyping though, since it didn't have a positive impact.

From the paper you linked, it does sound like weighting the sampling of negative examples may well be worth trying though. Hopefully the performance impact is negligible, since Annoy is pretty efficient, but we'll keep an eye on it. It would be nice to keep the number of hyperparameters low, but if there are clear benefits from weighted sampling it shouldn't be a problem to expand the number slightly.

idroz commented 4 years ago

I think this would be worth trying, provided we can benchmark the feature against current performance. We have some benchmarking code that was used to assess L1/L2 distances between points in the embedding and original spaces, e.g. https://bering-ivis.readthedocs.io/en/latest/comparisons.html#quantitative-evaluation
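As a rough sketch of that kind of check (my own illustration, not the notebook mentioned above): sample random pairs of points, compute their L1/L2 distances in the original and embedded spaces, and report the rank correlation between the two.

```python
import numpy as np
from scipy.stats import spearmanr

def distance_agreement(X, Y, ord=2, n_pairs=10_000, seed=0):
    """Illustrative benchmark: Spearman correlation between pairwise
    distances (L1 with ord=1, L2 with ord=2) in the original space X and
    in the embedding Y, estimated on a random subsample of pairs."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    i = rng.integers(n, size=n_pairs)
    j = rng.integers(n, size=n_pairs)
    mask = i != j  # drop self-pairs
    i, j = i[mask], j[mask]
    d_orig = np.linalg.norm(X[i] - X[j], ord=ord, axis=1)
    d_emb = np.linalg.norm(Y[i] - Y[j], ord=ord, axis=1)
    return spearmanr(d_orig, d_emb).correlation
```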

I can make that notebook available in the ivis repo and we could use it with a new feature branch.