Wendison / VQMIVC

Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!

The CPCLoss #7

Closed Liujingxiu23 closed 2 years ago

Liujingxiu23 commented 3 years ago

I have read the related papers, but I still do not understand the CPC loss computation.

    labels = torch.zeros(
        self.n_speakers_per_batch * self.n_utterances_per_speaker, length,
        dtype=torch.long, device=z.device
    )

    loss = F.cross_entropy(f, labels)

Can someone explain this to me? Why are labels of all zeros used here, and why cross_entropy?

Wendison commented 3 years ago

Hi, thanks for your interest. f has the shape (batch_size, 1 + num_negatives, seq_len): the first element along dim 1 corresponds to the positive sample, while the remaining elements correspond to the negative samples, so the label of the first element (the positive) is 0. If you look at the equation of the CPC loss, each term is very close to the probability of the positive sample, except that the denominator does not contain the value for the positive sample. Minimizing the CPC loss therefore increases the probability of the positive sample (which has label 0), which is essentially what minimizing the cross-entropy loss does here.
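For reference, here is a minimal, self-contained sketch (not the repository's code; the shapes and variable names are made up for illustration) showing why cross-entropy with all-zero labels implements the InfoNCE-style CPC objective: index 0 of the class dimension holds the positive score, the remaining indices hold the negatives, and the all-zero labels simply select the positive as the "correct class".

    import torch
    import torch.nn.functional as F

    # Hypothetical shapes, for illustration only
    batch, num_negatives, seq_len = 4, 10, 32

    # Scores over candidates: dim 1 = [positive, negative_1, ..., negative_N]
    f = torch.randn(batch, 1 + num_negatives, seq_len)

    # All-zero labels: the "correct class" at every time step is index 0,
    # i.e. the positive sample
    labels = torch.zeros(batch, seq_len, dtype=torch.long)

    loss_ce = F.cross_entropy(f, labels)

    # Equivalent InfoNCE form: -log( exp(score_pos) / sum_j exp(score_j) ),
    # averaged over batch and time
    log_probs = F.log_softmax(f, dim=1)
    loss_nce = -log_probs[:, 0, :].mean()

    assert torch.allclose(loss_ce, loss_nce, atol=1e-6)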

Liujingxiu23 commented 3 years ago

@Wendison Thank you for your explanation. So N samples are selected: the first one (index = 0) is the positive, and the remaining N-1 samples are negatives. labels is set to all zeros just to tell F.cross_entropy that the sample at index 0 is the positive class, which should appear in the numerator of the loss computation. Is this right?

Wendison commented 3 years ago

@Liujingxiu23 Yes, you're right.

Liujingxiu23 commented 3 years ago

@Wendison Thank you again. The description in the paper is a little complicated; the code is clearer.