Hi, here is the speaker classifier. It might be slightly different from the one we used in this article due to some code refactoring, but the results should be similar. #16
c_in: The dimension of the content/speaker embedding.
c_h: The hidden dimension. We set it to 128.
c_out: The number of speakers, which is 80 in this paper.
import torch.nn as nn
from einops import rearrange


class Classifier(nn.Module):
    def __init__(self, c_in, c_h, c_out):
        super(Classifier, self).__init__()
        self.in_layer = nn.Linear(c_in, c_h)
        self.conv_relu_block = nn.Sequential(
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
        )
        self.out_layer = nn.Linear(c_h, c_out)

    def forward(self, x):
        """
        x: (n, c, t)
        """
        x = rearrange(x, 'n c t -> n t c')
        y = self.in_layer(x)
        y = rearrange(y, 'n t c -> n c t')
        y = self.conv_relu_block(y)
        y = y.mean(-1)
        y = self.out_layer(y)
        return y
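To make the shapes in `forward` easier to follow, here is a small dependency-free sketch that traces them stage by stage. It is not part of the original code: `conv1d_out_len` and `trace_shapes` are hypothetical helper names, and the batch size, `c_in = 4` and the 128 frames are made-up values for illustration. The length formula is the one given in the PyTorch `Conv1d` documentation. With the defaults used in the classifier (kernel size 3, stride 1, no padding), each convolution shortens the time axis by 2.

```python
def conv1d_out_len(t, kernel_size=3, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (PyTorch Conv1d length formula)."""
    return (t + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def trace_shapes(n, c_in, t, c_h=128, c_out=80):
    """Return (stage, shape) pairs mirroring Classifier.forward."""
    shapes = [("input (n, c, t)", (n, c_in, t))]
    # in_layer maps c_in -> c_h per frame; the rearranges only swap axes.
    shapes.append(("in_layer, back to (n, c, t)", (n, c_h, t)))
    for i in range(3):
        t = conv1d_out_len(t)
        if t < 1:
            raise ValueError("input too short for three kernel-3 convolutions")
        shapes.append((f"conv {i + 1} + ReLU", (n, c_h, t)))
    # y.mean(-1) averages over the time axis, so t disappears.
    shapes.append(("mean over time", (n, c_h)))
    shapes.append(("out_layer", (n, c_out)))
    return shapes


# Hypothetical sizes: batch of 8, 4-dim embedding, 128 frames.
for name, shape in trace_shapes(n=8, c_in=4, t=128):
    print(f"{name:30s} {shape}")
```

Under these assumptions, the input needs at least 7 frames along `t`, because the three unpadded convolutions remove 6 frames in total. Thanks to the mean over time, the output is always `(n, c_out)` regardless of the input length.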
Originally posted by @KimythAnly in https://github.com/KimythAnly/AGAIN-VC/issues/10#issuecomment-854148318