Hi, here is the speaker classifier. It might differ slightly from the one we used in the article because of some code refactoring, but the results should be similar.

Arguments:
- c_in: the dimension of the content/speaker embedding.
- c_h: the hidden dimension; we set it to 128.
- c_out: the number of speakers, which is 80 in this paper.
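To see how these three arguments size the model, here is a hand count of the classifier's trainable parameters, read off the layers in the code that follows: one Linear in, three kernel-size-3 Conv1d layers, and one Linear out, all with PyTorch's default bias=True. This is only a sketch; the c_in value used in the example is hypothetical, since the post does not state it.

```python
def classifier_param_count(c_in, c_h=128, c_out=80):
    """Trainable parameters of the speaker classifier, counted by hand.

    Assumes every layer keeps PyTorch's default bias=True, as in the code.
    """
    in_layer = c_in * c_h + c_h             # nn.Linear(c_in, c_h): weight + bias
    conv = c_h * c_h * 3 + c_h              # nn.Conv1d(c_h, c_h, 3): weight + bias
    out_layer = c_h * c_out + c_out         # nn.Linear(c_h, c_out): weight + bias
    return in_layer + 3 * conv + out_layer  # conv_relu_block holds three convs


# c_in = 64 is a hypothetical embedding size, not a value from the paper.
print(classifier_param_count(64))  # 166480
```

With c_h = 128, the three convolutions dominate (147,840 of those parameters), so c_in mostly affects the size of the input projection. With PyTorch installed, `sum(p.numel() for p in Classifier(64, 128, 80).parameters())` should return the same number.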

import torch.nn as nn
from einops import rearrange

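class Classifier(nn.Module):
    def __init__(self, c_in, c_h, c_out):
        super(Classifier, self).__init__()
        # Frame-wise projection from the embedding dimension to the hidden size.
        self.in_layer = nn.Linear(c_in, c_h)
        # Three unpadded kernel-size-3 convolutions; each shortens the time axis by 2.
        self.conv_relu_block = nn.Sequential(
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
        )
        # Maps the time-averaged hidden vector to one logit per speaker.
        self.out_layer = nn.Linear(c_h, c_out)

    def forward(self, x):
        """
        x: (n, c, t) with c = c_in, e.g. a batch of content or speaker embeddings.
        Returns unnormalized speaker logits of shape (n, c_out).
        Requires t >= 7, because the conv stack removes 6 frames.
        """
        x = rearrange(x, 'n c t -> n t c')  # nn.Linear acts on the last axis
        y = self.in_layer(x)
        y = rearrange(y, 'n t c -> n c t')  # nn.Conv1d expects (n, channels, t)
        y = self.conv_relu_block(y)
        y = y.mean(-1)  # average over the remaining frames -> (n, c_h)
        y = self.out_layer(y)
        return y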
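Because the convolutions use no padding, the time axis shrinks before `y.mean(-1)`, and very short inputs fail. The sketch below reproduces that arithmetic in plain Python using the output-length formula from the `torch.nn.Conv1d` documentation; the helper names `conv1d_out_len` and `frames_before_pooling` are hypothetical and not part of AGAIN-VC.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of torch.nn.Conv1d (formula from the PyTorch docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def frames_before_pooling(t, num_convs=3, kernel_size=3):
    """Frames left after the conv stack, i.e. what y.mean(-1) averages over.

    Mirrors the classifier: num_convs unpadded Conv1d layers with the default
    stride and dilation of 1.
    """
    for _ in range(num_convs):
        t = conv1d_out_len(t, kernel_size)
        if t < 1:
            raise ValueError("input has too few frames for the conv stack")
    return t


# Each unpadded kernel-size-3 conv removes 2 frames, so t frames become t - 6.
print(frames_before_pooling(100))  # 94
print(frames_before_pooling(7))    # 1, the shortest input that still works
```

If shorter segments need to be classified, one option would be `padding=1` on each Conv1d, which keeps the length unchanged (`conv1d_out_len(t, 3, padding=1) == t`), but that changes the model relative to the one posted here.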

Originally posted by @KimythAnly in https://github.com/KimythAnly/AGAIN-VC/issues/10#issuecomment-854148318

Hi, here is the speaker classifier. It might be slightly different from the one we used in this article due to some code refactoring, but the result should be similar. #16