KimythAnly / AGAIN-VC

This is the official implementation of the paper AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization.
https://kimythanly.github.io/AGAIN-VC-demo/index
MIT License
111 stars, 19 forks

Hi, here is the speaker classifier. It might be slightly different from the one we used in this article due to some code refactoring, but the results should be similar. #16

Open dslllu opened 2 years ago

dslllu commented 2 years ago

Hi, here is the speaker classifier. It might be slightly different from the one we used in this article due to some code refactoring, but the results should be similar.

c_in: The dimension of the content/speaker embedding.
c_h: The hidden dimension, which we set to 128.
c_out: The number of speakers, which is 80 in this paper.

import torch.nn as nn
from einops import rearrange

class Classifier(nn.Module):
    def __init__(self, c_in, c_h, c_out):
        super(Classifier, self).__init__()
        self.in_layer = nn.Linear(c_in, c_h)
        self.conv_relu_block = nn.Sequential(
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
        )
        self.out_layer = nn.Linear(c_h, c_out)

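    def forward(self, x):
        """
        x: (n, c_in, t) batch of embeddings
        returns: (n, c_out) speaker logits
        """
        # nn.Linear acts on the last axis, so move channels last first.
        x = rearrange(x, 'n c t -> n t c')
        y = self.in_layer(x)                # (n, t, c_h)
        # nn.Conv1d expects channels before time.
        y = rearrange(y, 'n t c -> n c t')  # (n, c_h, t)
        y = self.conv_relu_block(y)         # (n, c_h, t - 6), no padding
        y = y.mean(-1)                      # average over time: (n, c_h)
        y = self.out_layer(y)               # (n, c_out)
        return y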

Originally posted by @KimythAnly in https://github.com/KimythAnly/AGAIN-VC/issues/10#issuecomment-854148318
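
One detail of the classifier above that is easy to miss: the three Conv1d layers use kernel size 3 with PyTorch's default stride 1 and no padding, so each layer shortens the time axis by 2 frames and the mean is taken over t - 6 frames. The sketch below only works through that arithmetic in plain Python. The helper names `conv1d_out_len` and `classifier_time_len` and the segment length 128 are illustrative assumptions, not part of the repository.

```python
def conv1d_out_len(length, kernel_size=3, stride=1, padding=0, dilation=1):
    """Output length of one nn.Conv1d layer (formula from the PyTorch docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def classifier_time_len(t, num_conv_layers=3):
    """Frames left after the unpadded Conv1d(c_h, c_h, 3) stack (hypothetical helper)."""
    for _ in range(num_conv_layers):
        t = conv1d_out_len(t)
        if t < 1:
            raise ValueError("input has too few frames for the conv stack")
    return t


# 128 is an assumed segment length, chosen only for illustration.
print(classifier_time_len(128))  # 122
print(classifier_time_len(7))    # 1
```

Under these assumptions the input needs at least 7 frames. Shorter inputs leave nothing for the convolution stack to produce.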

dslllu commented 2 years ago

Hi, can you provide more details on training the speaker classifier, such as how to set the labels?