clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

nSpeaker affects M? #44

Closed KurtAhn closed 4 years ago

KurtAhn commented 4 years ago

I'm a bit confused about how data in a minibatch is organized.

When I call the training script like this:

python trainSpeakerNet.py ... --trainfunc angleproto --batch_size 64 ... --nSpeakers 3 ...

and examine the input x of AngleProtoLoss.forward, its shape is [64,3,512]. If I change nSpeakers to 2, it's [64,2,512].

Isn't the second dimension the number of utterances per speaker? I don't understand why it equals nSpeakers. Unless I'm mistaken about prototypical loss, this isn't correct: we need to compute each centroid by averaging different utterances from the same speaker, but it looks like we're instead averaging across different speakers.

joonson commented 4 years ago

The second dimension of the embedding tensor x is nPerSpeaker (formerly named nSpeakers). I have renamed the parameter to avoid confusion.
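To illustrate the layout, here is a minimal NumPy sketch (not the repository's code) of how a prototypical-style loss can consume a [batch_size, nPerSpeaker, dim] tensor like the [64, 3, 512] one from the question: the centroid for each speaker is the mean over that speaker's own utterances along the second dimension, with one utterance held out as the query. The hold-out split and similarity computation here are illustrative assumptions, not necessarily how angleproto is implemented in the repo.

```python
import numpy as np

# Hypothetical minibatch: 64 speakers, nPerSpeaker=3 utterances each,
# 512-dim embeddings -- matching the [64, 3, 512] shape from the question.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 3, 512))

# Centroid (anchor) per speaker: mean over that SAME speaker's support
# utterances (axis 1, i.e. the nPerSpeaker dimension), never across speakers.
# The last utterance of each speaker is held out as the query.
anchor = x[:, :-1, :].mean(axis=1)   # shape (64, 512)
query = x[:, -1, :]                  # shape (64, 512)

def normalize(v):
    """L2-normalize along the last axis so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Cosine-similarity matrix between every query and every speaker centroid;
# the diagonal entries are each speaker's positive (query, own-centroid) pairs.
sim = normalize(query) @ normalize(anchor).T  # shape (64, 64)
```

With nPerSpeaker=2 (shape [64, 2, 512]) the same code reduces to one support utterance and one query per speaker, which matches the [64, 2, 512] shape reported above.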