Hi @jwijffels, thanks for working on the R wrapper. For your question, it is an implementation detail that we did not include in the paper. negSearchLimit is the number of negatives we sample during each batch. Among the sampled candidates, some of them are 'real' negatives, which make the loss greater than 0. maxNegSamples is a limit on 'real' negatives: we update at most maxNegSamples 'real' negatives each batch.
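To make that concrete, here is a rough sketch of the sampling logic in Python pseudocode. This is not the actual StarSpace C++ implementation; `sample_negatives` and `loss_fn` are hypothetical names, and a 'real' negative is taken to mean a candidate whose loss against the positive is greater than 0, as described above.

```python
import random

def sample_negatives(labels, positive, neg_search_limit, max_neg_samples, loss_fn):
    """Illustrative sketch (not the real C++ code) of per-batch negative sampling.

    Draw up to `neg_search_limit` candidate negatives; keep only the 'real'
    negatives (those yielding a loss > 0), and stop once `max_neg_samples`
    of them have been collected.
    """
    real_negatives = []
    for _ in range(neg_search_limit):
        candidate = random.choice(labels)
        if candidate == positive:
            continue  # skip the positive label itself
        if loss_fn(positive, candidate) > 0:  # a 'real' negative
            real_negatives.append(candidate)
            if len(real_negatives) >= max_neg_samples:
                break  # maxNegSamples caps the updates for this batch
    return real_negatives
```

So negSearchLimit bounds how many candidates are even looked at, while maxNegSamples bounds how many 'real' negatives actually contribute updates.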
In the screenshot, k corresponds to negSearchLimit.
Thanks for the clarification that negSearchLimit is k in the paper.
To be sure of my interpretation of the answer on negSearchLimit, can you
@ledw Related to the interpretation of negSearchLimit and maxNegSamples, and to tuning these parameters: could you provide some intuition on how one might go about tuning them, and on the expected effects of doing so?
Based on my understanding of negSearchLimit, you're basically considering a larger number of negative samples when deciding how to optimise within a batch. With too few negative samples, it will be difficult for StarSpace to differentiate between the positive and negative samples; with too many, there's too much noise. Is this understanding accurate?
I don't really understand what maxNegSamples does though, nor do I understand what makes a negative 'real'.
OK, I did some inspection of the code myself. negSearchLimit is k from the paper, indicating the number of positives/negatives in the batch update, and maxNegSamples is an upper bound on that, so as not to include too many negative comparisons. Once maxNegSamples of these comparisons turn out negative (the entities are not similar whatsoever, indicating a 'real' negative), further comparisons are not done (and as such maxNegSamples basically limits the k parameter from the paper).
Just following up on this: could I ask whether the negSearchLimit parameter counts negative samples per document or for all documents being trained on? I.e., will there be 50 negative docs with wrong labels, or 50 × training-set size negative docs in total? I'm using ruimtehol and don't understand the above and the references to "batches" in this thread. Many thanks.
It's neither per document nor for all documents; it is per mini-batch in StarSpace STARSPACE-2018-2. A mini-batch is a sample of text, which can be words / documents / labels depending on the training mode, for which the negatives will always be the same. In STARSPACE-2017-2 (which the R wrapper ruimtehol uses), the concept of a mini-batch is not implemented, and hence it is per sample of text (again words / documents / labels, depending on the training mode).
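As a back-of-the-envelope sketch of how the two versions differ (the function name and all numbers below are illustrative assumptions, not measured behaviour):

```python
import math

def candidate_negatives_per_epoch(n_examples, neg_search_limit, batch_size=1):
    """Hypothetical counting sketch: candidate negatives drawn in one epoch.

    batch_size=1 mimics STARSPACE-2017-2 (negatives sampled per example);
    a larger batch_size mimics STARSPACE-2018-2, where one set of negatives
    is shared by a whole mini-batch.
    """
    n_batches = math.ceil(n_examples / batch_size)
    return n_batches * neg_search_limit
```

For example, with 10,000 training examples and negSearchLimit = 50, per-example sampling draws 10,000 × 50 = 500,000 candidates per epoch, whereas mini-batches of 5 draw only 2,000 × 50 = 100,000.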
OK, many thanks. I'm using embed_tagspace, so to check that I'm understanding correctly: the corpus of documents and labels gets split into samples of batchSize? And then negSearchLimit negatives will be taken, but maxNegSamples will in practice cap that?
If this is a question related to ruimtehol, please put the question there.
Hello, I'm writing some documentation for the R wrapper in a vignette before I'll try to upload it to CRAN in January. I'd like to write something about the arguments negSearchLimit and maxNegSamples. The docs indicate the following:
If I look at the paper, do I understand this correctly: if I'm in a multi-class classification setting and I have a bunch of texts (where each can have several labels), then for each text the positive entities are the ones which were labelled, and the default is to sample the negatives from the remaining labels, 50 of them, and from these 50 only keep 10? Or is there another interpretation of this L^batch (what is in this batch: is it one text or more than one text?) and this maxNegSamples? Does maxNegSamples correspond to k in the screenshot of the paper, or is negSearchLimit k in the screenshot of the paper?