Yes. Thank you for spotting this typo!
Thanks for the fast response! Another question: is the speaker embedding that is fed to the predictor network computed from the original utterance, or from the converted voice produced by the voice converter (in the teacher)?
It is computed from the original utterances.
Hello,
For the contrastive loss (Eq. 2 in the paper), should there be a negative sign at the beginning? Conceptually, R(1) and R(2) should be similar, but minimizing the loss as written seems to minimize the similarity between them.

![Screen Shot 2023-02-10 at 12 29 59 PM](https://user-images.githubusercontent.com/38511642/218169583-e3ec676a-b607-4a78-961c-80d93901c287.png)
The code for the contrastive loss (https://github.com/auspicious3000/contentvec/blob/main/contentvec/criterions/contentvec_criterion.py#L108) computes cross entropy, which combines LogSoftmax with negative log likelihood, so it also seems to take the negative of the contrastive loss as written in the paper. I am not sure whether I understand this correctly.
Thanks!
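For anyone following the sign question, here is a minimal sketch of why cross entropy with the positive pair as the target already carries the negative sign. It is not the repository's code: the function names, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Cross entropy with the positive at index 0.

    Equivalent to -log( exp(sim_pos / t) / sum_k exp(sim_k / t) ),
    so minimizing it raises the similarity to the positive.
    The temperature of 0.1 is an illustrative choice.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    return -log_softmax(logits)[0]

# Toy 2-D vectors (hypothetical data, not model outputs).
anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_similar = contrastive_loss(anchor, [0.9, 0.1], negatives)
loss_dissimilar = contrastive_loss(anchor, [0.1, 0.9], negatives)
print(loss_similar < loss_dissimilar)
```

With the leading minus sign, a positive that is close to the anchor gives a lower loss than one that is far from it, which matches the intent that R(1) and R(2) be similar.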