Hi @jwijffels, thanks for working on the R wrapper. For your question, it is an implementation detail that we did not include in the paper. negSearchLimit is the number of negatives we sample during each batch. Among the sampled candidates, some of them are 'real' negatives, which make the loss greater than 0. maxNegSamples is a limit on 'real' negatives: we update at most maxNegSamples 'real' negatives each batch.
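To make that concrete, here is a rough sketch of the sampling logic in Python pseudocode. This is not the actual StarSpace C++ implementation; `sample_negatives` and `loss_fn` are hypothetical names, and a 'real' negative is taken to mean a candidate whose loss against the positive is greater than 0, as described above.

```python
import random

def sample_negatives(labels, positive, neg_search_limit, max_neg_samples, loss_fn):
    """Illustrative sketch (not the real C++ code) of per-batch negative sampling.

    Draw up to `neg_search_limit` candidate negatives; keep only the 'real'
    negatives (those yielding a loss > 0), and stop once `max_neg_samples`
    of them have been collected.
    """
    real_negatives = []
    for _ in range(neg_search_limit):
        candidate = random.choice(labels)
        if candidate == positive:
            continue  # skip the positive label itself
        if loss_fn(positive, candidate) > 0:  # a 'real' negative
            real_negatives.append(candidate)
            if len(real_negatives) >= max_neg_samples:
                break  # maxNegSamples caps the updates for this batch
    return real_negatives
```

So negSearchLimit bounds how many candidates are even looked at, while maxNegSamples bounds how many 'real' negatives actually contribute updates.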
In the screenshot, k corresponds to negSearchLimit.
Thanks for the clarification that negSearchLimit is k in the paper.
To be sure of my interpretation of the answer on negSearchLimit, can you
@ledw Related to the interpretation of negSearchLimit and maxNegSamples, and to tuning these parameters: could you provide some intuition on how one might go about tuning them, and on the expected effects of doing so?
Based on my understanding of negSearchLimit, you're basically considering a larger number of negative samples when deciding how to optimise within a batch. With too few negative samples, it will be difficult for StarSpace to differentiate between the positive and negative samples; with too many, there's too much noise. Is this understanding accurate?
I don't really understand what maxNegSamples does though, nor do I understand what makes a negative 'real'.
OK, I did some inspection of the code myself. negSearchLimit is k from the paper, indicating the number of positives/negatives in the batch update, and maxNegSamples is an upper bound on that, so as not to include too many negative comparisons. Once maxNegSamples of these comparisons turn out negative (the entities are not similar whatsoever, indicating a 'real' negative), further comparisons are not done (and as such maxNegSamples basically limits the k parameter from the paper).
Just following up on this: could I ask whether the negSearchLimit parameter counts negative samples per document or for all documents being trained on? I.e., will there be 50 negative docs with wrong labels, or 50 × training-set size negative docs in total? I'm using ruimtehol and don't understand the above and the references to "batches" in this thread. Many thanks.
It's neither per document nor for all documents; it is per mini-batch in StarSpace STARSPACE-2018-2. A mini-batch is a sample of text, which can be words / documents / labels depending on the training mode, for which the negatives will always be the same. In STARSPACE-2017-2 (which the R wrapper ruimtehol uses), the concept of a mini-batch is not implemented, and hence it is per sample of text (again words / documents / labels, depending on the training mode).
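As a back-of-the-envelope sketch of how the two versions differ (the function name and all numbers below are illustrative assumptions, not measured behaviour):

```python
import math

def candidate_negatives_per_epoch(n_examples, neg_search_limit, batch_size=1):
    """Hypothetical counting sketch: candidate negatives drawn in one epoch.

    batch_size=1 mimics STARSPACE-2017-2 (negatives sampled per example);
    a larger batch_size mimics STARSPACE-2018-2, where one set of negatives
    is shared by a whole mini-batch.
    """
    n_batches = math.ceil(n_examples / batch_size)
    return n_batches * neg_search_limit
```

For example, with 10,000 training examples and negSearchLimit = 50, per-example sampling draws 10,000 × 50 = 500,000 candidates per epoch, whereas mini-batches of 5 draw only 2,000 × 50 = 100,000.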
OK, many thanks. I'm using embed_tagspace, so to check that I'm understanding correctly: the corpus of documents and labels gets split into samples of batchSize? And then negSearchLimit negatives will be taken, but maxNegSamples will in practice cap that?
If this is a question related to ruimtehol, please put the question there.
Hello, I'm writing some documentation for the R wrapper in a vignette before I'll try to upload it to CRAN in January. I'd like to write something about the arguments negSearchLimit and maxNegSamples. The docs indicate the following:
If I look at the paper, do I understand this correctly: if I'm in a multi-class classification setting and I have a bunch of texts (where each can have several labels), then for each text the positive entities are the ones which were labelled, and the default is to sample the negatives from the remaining labels, 50 of them, and from these 50 only keep 10? Or is there another interpretation of this L^batch (what is in this batch: is it one text or more than one text?) and this maxNegSamples? Does maxNegSamples correspond to k in the screenshot of the paper, or is negSearchLimit k in the screenshot of the paper?