clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

nSpeaker affects M? #44

Closed KurtAhn closed 4 years ago

KurtAhn commented 4 years ago

I'm a bit confused about how data in a minibatch is organized.

When I call the training script like this:

python trainSpeakerNet.py ... --trainfunc angleproto --batch_size 64 ... --nSpeakers 3 ...

and examine the input x of AngleProtoLoss.forward, its shape is [64,3,512]. If I change nSpeakers to 2, it's [64,2,512].

Isn't the second dimension the number of utterances per speaker? I don't understand why it equals nSpeakers. Unless I'm mistaken about prototypical loss, this isn't correct: we need to compute each centroid by averaging different utterances from the same speaker, but it looks like we're instead averaging across different speakers.

joonson commented 4 years ago

The second dimension of the embedding tensor x is nPerSpeaker (formerly named nSpeakers). I have renamed the parameter to avoid confusion.
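To illustrate the layout, here is a minimal NumPy sketch (not the repository's code) of how a prototypical-style loss can consume a [batch_size, nPerSpeaker, dim] tensor like the [64, 3, 512] one from the question: the centroid for each speaker is the mean over that speaker's own utterances along the second dimension, with one utterance held out as the query. The hold-out split and similarity computation here are illustrative assumptions, not necessarily how angleproto is implemented in the repo.

```python
import numpy as np

# Hypothetical minibatch: 64 speakers, nPerSpeaker=3 utterances each,
# 512-dim embeddings -- matching the [64, 3, 512] shape from the question.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 3, 512))

# Centroid (anchor) per speaker: mean over that SAME speaker's support
# utterances (axis 1, i.e. the nPerSpeaker dimension), never across speakers.
# The last utterance of each speaker is held out as the query.
anchor = x[:, :-1, :].mean(axis=1)   # shape (64, 512)
query = x[:, -1, :]                  # shape (64, 512)

def normalize(v):
    """L2-normalize along the last axis so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Cosine-similarity matrix between every query and every speaker centroid;
# the diagonal entries are each speaker's positive (query, own-centroid) pairs.
sim = normalize(query) @ normalize(anchor).T  # shape (64, 64)
```

With nPerSpeaker=2 (shape [64, 2, 512]) the same code reduces to one support utterance and one query per speaker, which matches the [64, 2, 512] shape reported above.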