auspicious3000 / contentvec

speech self-supervised representations
MIT License

Question about contrastive loss #5

Closed Mu-Y closed 1 year ago

Mu-Y commented 1 year ago

Hello,

For the contrastive loss (Eq. 2 in the paper), should there be a negative sign at the beginning? Conceptually, R(1) and R(2) should be similar, but minimizing the loss as written seems to minimize the similarity between them.

[screenshot of Eq. 2 from the paper]

The code for the contrastive loss (https://github.com/auspicious3000/contentvec/blob/main/contentvec/criterions/contentvec_criterion.py#L108) computes cross entropy, which combines LogSoftmax with negative log likelihood, so it also seems to take the negative of the loss as printed in the paper. I'm not sure whether I understand this correctly.

Thanks!

auspicious3000 commented 1 year ago

Yes. Thank you for spotting this typo!
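
To illustrate the point being confirmed here: a minimal InfoNCE-style sketch (not the repo's actual criterion; function and variable names are hypothetical) shows why the code is correct even though the printed equation is missing the leading negative sign. `F.cross_entropy` is LogSoftmax followed by negative log likelihood, so the negation is already built in, and minimizing it maximizes the similarity between matching pairs of R(1) and R(2).

```python
import torch
import torch.nn.functional as F

def contrastive_loss_sketch(r1, r2, temperature=0.1):
    """Simplified contrastive loss between two views of the same frames.

    r1, r2: (batch, dim) tensors; row i of r1 and row i of r2 are the
    positive pair, all other rows serve as negatives.
    """
    r1 = F.normalize(r1, dim=-1)
    r2 = F.normalize(r2, dim=-1)
    # Cosine similarities scaled by temperature; positives on the diagonal.
    logits = (r1 @ r2.t()) / temperature
    targets = torch.arange(r1.size(0))
    # cross_entropy = LogSoftmax + NLL, so the negative sign of the
    # contrastive objective is applied inside this call.
    return F.cross_entropy(logits, targets)
```

With identical views the loss is small; with unrelated views it rises toward log(batch), which is the behavior one expects if the objective maximizes similarity between R(1) and R(2).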

Mu-Y commented 1 year ago

Thanks for the fast response! Another question: is the speaker embedding fed to the predictor network computed from the original utterance, or from the converted voice produced by the voice converter (the teacher)?

auspicious3000 commented 1 year ago

It is computed from the original utterances.