Open EderSantana opened 1 year ago
To add some extra detail: together with item_id_negatives, we should also pass all the matching item features expected by the item tower. This would remove the need to load them from a possible cache and instead take them from the inputs. Let me know if this all makes sense.
Yes, it makes sense, @EderSantana.
We need to think of a good design to allow the use of "real" negatives in the retrieval models, as they currently support only in-batch sampling (InBatchSampler, InBatchSamplerV2).
In order to use "real" negatives, the number of negatives should be the same for all examples, so that we get a 2D logits matrix of shape (batch_size, 1 (positive) + num_negatives) for the retrieval model. We could use in-batch sampling as a fallback to fill up the number of negatives. What do you think? Btw, what is the average number of real negatives you expect to have per user positive?
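To make the shape requirement concrete, here is a minimal NumPy sketch of the padding idea. It is not the Merlin Models API: `build_candidates` and `score_candidates` are hypothetical helpers, item ids are assumed to index directly into an embedding table, and the fallback simply draws other rows' positives at random.

```python
import numpy as np

def build_candidates(positives, negatives, num_negatives, rng):
    """Return a (batch_size, 1 + num_negatives) array of item ids.

    Column 0 is the positive. The remaining columns are the row's real
    negatives, topped up with other rows' positives (in-batch fallback).
    """
    positives = np.asarray(positives)
    out = np.empty((len(positives), 1 + num_negatives), dtype=positives.dtype)
    out[:, 0] = positives
    for i, row in enumerate(negatives):
        row = [int(x) for x in row][:num_negatives]  # truncate long rows
        missing = num_negatives - len(row)
        if missing > 0:
            # In-batch pool: other rows' positives, skipping this row's
            # positive and anything already used as a real negative.
            pool = [int(p) for j, p in enumerate(positives)
                    if j != i and p != positives[i] and int(p) not in row]
            if not pool:
                raise ValueError("no in-batch items available to pad row %d" % i)
            fill = rng.choice(pool, size=missing, replace=len(pool) < missing)
            row.extend(int(x) for x in fill)
        out[i, 1:] = row
    return out

def score_candidates(user_emb, item_table, candidates):
    """Dot-product logits of shape (batch_size, 1 + num_negatives)."""
    cand_emb = item_table[candidates]  # (B, 1 + N, D)
    return np.einsum("bd,bnd->bn", user_emb, cand_emb)

rng = np.random.default_rng(0)
positives = [1, 2, 3]
negatives = [[7, 8], [9], []]  # ragged "real" negatives per row
candidates = build_candidates(positives, negatives, num_negatives=2, rng=rng)
item_table = rng.normal(size=(10, 4))  # stand-in for the item embeddings
user_emb = rng.normal(size=(3, 4))     # stand-in for the user tower output
logits = score_candidates(user_emb, item_table, candidates)
```

With this layout the positive always sits in column 0, so a softmax cross-entropy loss could use label 0 for every row.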
If only the item id is used in the item tower of the retrieval model, then we could get the item embeddings by just looking them up in the embedding table. But if the item tower uses other item features, then we would have to feed the features of each "real" negative item to the item tower to generate the item embeddings, which might not scale well if the number of real negatives is large. In this scenario, keeping a cache of the pre-computed item embeddings from past batches could alleviate the issue a bit. What are your thoughts on these design options?
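As a rough illustration of the cache option, the sketch below keeps item embeddings from earlier steps and only calls the item tower on a miss or when an entry is too old. Everything here is hypothetical (`ItemEmbeddingCache`, `max_age_steps`, the toy `item_tower`); it only shows the bookkeeping. Note that cached embeddings are plain values computed with older tower weights, so they can be stale and would not carry gradients.

```python
import numpy as np

class ItemEmbeddingCache:
    """Cache of item embeddings keyed by item id (illustrative only)."""

    def __init__(self, item_tower, max_age_steps):
        self.item_tower = item_tower        # callable: features -> embedding
        self.max_age_steps = max_age_steps  # entries older than this are recomputed
        self._store = {}                    # item_id -> (embedding, step_written)
        self.tower_calls = 0                # how often the tower actually ran

    def get(self, item_ids, features_by_id, step):
        """Return stacked embeddings for item_ids at the given training step."""
        out = []
        for item_id in item_ids:
            hit = self._store.get(item_id)
            if hit is None or step - hit[1] > self.max_age_steps:
                # Miss or stale entry: run the item tower on this item's features.
                emb = self.item_tower(features_by_id[item_id])
                self.tower_calls += 1
                hit = (emb, step)
                self._store[item_id] = hit
            out.append(hit[0])
        return np.stack(out)

# Toy item tower standing in for the real one.
item_tower = lambda feats: np.asarray(feats, dtype=float) * 2.0
features_by_id = {10: [1.0, 2.0], 11: [3.0, 4.0]}

cache = ItemEmbeddingCache(item_tower, max_age_steps=1)
emb = cache.get([10, 11, 10], features_by_id, step=0)  # item 10 is computed once
calls_after_first = cache.tower_calls
cache.get([10], features_by_id, step=1)                # still fresh, served from cache
calls_after_second = cache.tower_calls
cache.get([10], features_by_id, step=3)                # too old, recomputed
calls_after_third = cache.tower_calls
```

The trade-off is the one raised above: fewer item tower calls per batch in exchange for embeddings that lag behind the current weights.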
+1 for this feature. There is a paper on mixed negative sampling (MNS) for two-tower neural networks. The paper recommends using an index for negative sampling along with in-batch sampling. Link: https://research.google/pubs/pub50257/
🚀 Feature request
Is it possible to sample the negatives for the Two-Tower model from a column provided by the input data? For example, we want to sample negatives from the list of items we displayed to a user in the future. The data schema would look like this
We could get the negatives for that row from item_id_negatives.
Usually item_id_negatives will be about the same range for all users, but we could consider that column as ragged and slice to the smallest in the batch when building the cost function matrix.
Motivation
We're planning to sample negatives from items the user has been exposed to in the future, so random or in-batch negative sampling won't get us there. This feature could also be useful for anybody working on problems related to implicit negative feedback, dwell time, etc. that require more control over how negatives are sampled in Two-Tower models.