Closed Liujingxiu23 closed 2 years ago
Hi, thanks for your interest. The tensor `f` has shape (batch_size, 1 + num_negative, seq_len): the first element along dim 1 corresponds to the positive sample, while the remaining elements along dim 1 correspond to the negative samples, so the label of the first (positive) element is 0. If you look at the equation of the CPC loss, each term is very similar to the probability of the positive sample, except that the denominator doesn't contain the value for the positive sample. Minimizing the CPC loss therefore increases the probability of the positive sample (which has label 0), which is similar to minimizing the cross-entropy loss here.
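To make the correspondence concrete, here is a minimal PyTorch sketch (the shapes and variable names are illustrative assumptions, not the repo's actual code). It shows that calling `F.cross_entropy` on scores of shape (batch_size, 1 + num_negative, seq_len) with all-zero labels is the same as maximizing the log-softmax probability of the positive score at index 0:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: similarity scores `f` have shape
# (batch_size, 1 + num_negative, seq_len); index 0 along dim 1
# is the positive sample, the rest are negatives.
batch_size, num_negative, seq_len = 4, 8, 16
f = torch.randn(batch_size, 1 + num_negative, seq_len)

# Labels are all zeros: the "correct class" is always index 0 (the positive).
labels = torch.zeros(batch_size, seq_len, dtype=torch.long)

# F.cross_entropy applies log-softmax over dim 1, so minimizing it
# raises the softmax probability of the positive score relative to
# the negatives -- the InfoNCE / CPC objective.
loss = F.cross_entropy(f, labels)

# Equivalent manual computation: log-softmax over the class dim,
# then take the negative log-probability of class 0 everywhere.
log_probs = F.log_softmax(f, dim=1)   # (batch, 1 + num_negative, seq_len)
manual = -log_probs[:, 0, :].mean()   # class 0 at every (batch, time) slot
assert torch.allclose(loss, manual, atol=1e-6)
```

The zero labels are thus not "real" labels; they simply tell `F.cross_entropy` which class index holds the positive score.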
@Wendison Thank you for your explanation. N samples are selected: the first one (index = 0) is positive, and the remaining N-1 samples are negatives. `labels` is set to zeros just to tell `F.cross_entropy` that the sample at index 0 is the positive class, which should go in the numerator of the loss computation. Is this right?
@Liujingxiu23 Yes, you're right.
@Wendison Thank you again. The description in the paper is a little complicated; the code is clearer.
I have read the related papers, but I still do not understand the CPC loss computation.
Could someone explain it to me? Why are labels of zeros and cross_entropy used here?