qianxixi908 closed this issue 1 year ago
Hi, what dataset did you use?
Thank you for your reply. I used the default dataset, VCTK.
Did you preprocess the data correctly? The folder structure should look like this:
data/features/vctk/mel
├── p225_001.wav.npy
├── p225_002.wav.npy
├── p225_003.wav.npy
...
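A quick way to confirm the layout above is to list the feature files from Python. This is only a minimal sketch: the helper name list_mel_files is hypothetical (it is not part of the repository), and the path is the one quoted in this thread, assumed to be relative to the directory you run training from.

```python
from pathlib import Path


def list_mel_files(mel_dir):
    """Return the sorted names of the *.wav.npy files directly under mel_dir."""
    mel_dir = Path(mel_dir)
    if not mel_dir.is_dir():
        raise FileNotFoundError(f"feature directory not found: {mel_dir}")
    return sorted(p.name for p in mel_dir.glob("*.wav.npy"))


if __name__ == "__main__":
    try:
        # Path taken from the folder structure quoted above.
        files = list_mel_files("data/features/vctk/mel")
        print(f"{len(files)} feature files, first few: {files[:3]}")
    except FileNotFoundError as err:
        print(err)
```

If this reports zero files or a missing directory, preprocessing most likely did not write its output where training expects it.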
Thanks for your answer.
Yes, I have handled it as you said. I even used your original sample directly to check whether the code runs, but the same error is still reported. While debugging, I also found the following problem:
“config. assert isinstance('dataset'(2083686355952), )object 'dataset' (2083686355952).'feat_path' (2083686356976)”SyntaxError: invalid syntax
Hi, how about the file data/indexes/vctk/indexes.pkl? Its type is dict and it contains two keys, train and dev, while the values are like p311_198.wav.npy.
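The index file described above can be checked against the feature folder with a few lines of Python. This is a sketch under stated assumptions: the helper check_index is hypothetical, and it assumes that each of the two values is an iterable of file names and that those names refer to files in the mel feature folder. Neither detail is spelled out in this thread.

```python
import pickle
from pathlib import Path


def check_index(index_path, feat_dir):
    """Load the index pickle and return, per split, the entries with no feature file."""
    with open(index_path, "rb") as f:
        indexes = pickle.load(f)
    if not isinstance(indexes, dict):
        raise TypeError(f"expected a dict, got {type(indexes).__name__}")

    missing = {}
    for split in ("train", "dev"):
        if split not in indexes:
            raise KeyError(f"index file has no '{split}' key")
        # Assumption: the value is an iterable of names like 'p311_198.wav.npy'.
        names = list(indexes[split])
        missing[split] = [n for n in names if not (Path(feat_dir) / n).is_file()]
    return missing
```

For example, check_index("data/indexes/vctk/indexes.pkl", "data/features/vctk/mel") should return empty lists for both splits if every indexed name has a matching feature file.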
Hello, thank you for your previous answer. Can you provide some details or code for the speaker classifier mentioned in the experimental part of the article?
Hi, here is the speaker classifier. It might be slightly different from the one we used in the article due to some code refactoring, but the results should be similar.
c_in: The dimension of the content/speaker embedding.
c_h: We set it to 128.
c_out: The number of speakers, which is 80 in this paper.
import torch.nn as nn
from einops import rearrange


class Classifier(nn.Module):
    def __init__(self, c_in, c_h, c_out):
        super(Classifier, self).__init__()
        self.in_layer = nn.Linear(c_in, c_h)
        self.conv_relu_block = nn.Sequential(
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
            nn.Conv1d(c_h, c_h, 3),
            nn.ReLU(),
        )
        self.out_layer = nn.Linear(c_h, c_out)

    def forward(self, x):
        """
        x: (n, c, t)
        """
        x = rearrange(x, 'n c t -> n t c')
        y = self.in_layer(x)
        y = rearrange(y, 'n t c -> n c t')
        y = self.conv_relu_block(y)
        y = y.mean(-1)
        y = self.out_layer(y)
        return y
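One detail worth noting about the classifier above: the three Conv1d layers use kernel size 3 with no padding, so each one shortens the time axis by 2 frames before the mean over time is taken. The sketch below works through that arithmetic in plain Python using the output-length formula from the PyTorch Conv1d documentation. The function names are hypothetical helpers for illustration and are not part of the repository.

```python
def conv1d_out_len(t_in, kernel_size=3, stride=1, padding=0, dilation=1):
    """Output length of a single Conv1d layer along the time axis."""
    return (t_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def classifier_time_len(t_in, n_layers=3):
    """Frames left after n_layers unpadded kernel-3 convolutions."""
    t = t_in
    for _ in range(n_layers):
        t = conv1d_out_len(t)
        if t < 1:
            raise ValueError(f"input of {t_in} frames is too short for {n_layers} conv layers")
    return t


if __name__ == "__main__":
    print(classifier_time_len(128))  # 128 -> 126 -> 124 -> 122
    print(classifier_time_len(7))    # 7 -> 5 -> 3 -> 1, the shortest usable input
```

In other words, the input needs at least 7 frames along t, and the final logits have shape (n, c_out) regardless of the input length.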
Hello, I tried to run the training, but it showed the following:
raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
Could you give me some advice on how to solve it? Thanks.
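For context, this message is raised by PyTorch's RandomSampler when the dataset it is given has length zero, which would be consistent with an empty or mismatched index file or feature folder. A minimal sketch of a guard that fails earlier and with a clearer message is shown below. The helper require_non_empty is hypothetical, and where to call it in the training script is an assumption.

```python
def require_non_empty(dataset, name="dataset"):
    """Return len(dataset), or raise a descriptive error if it is empty."""
    n = len(dataset)
    if n == 0:
        raise ValueError(
            f"{name} is empty; check that the index file lists files "
            "and that the feature folder contains them"
        )
    return n
```

Calling this on the training and dev datasets just before the DataLoader is created would point at the data paths rather than at the sampler.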