auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck
http://arxiv.org/abs/2004.11284
MIT License
629 stars 92 forks

The validation loss is rising and fluctuating, is that normal? #11

Open XintaoZhao0805 opened 3 years ago

XintaoZhao0805 commented 3 years ago

Greetings, and thanks for such a good project. In my experiment I used the same VCTK dataset as yours, and I have only trained for 68,000 steps. The log of my experiment looks like this:

[screenshot: training and validation loss curves]

I noticed that the validation loss is rising and the training loss also has some fluctuating peaks. Is it a normal phenomenon?

thank you in advance :)

auspicious3000 commented 3 years ago

The provided training data is very small; it is for code verification purposes only.

XintaoZhao0805 commented 3 years ago

The provided training data is very small; it is for code verification purposes only.

Indeed. But the image above comes from an experiment using my own VCTK data. There were 20 speakers in my VCTK corpus; 80% of the utterances were used for training and 10% for validation. I organized the data in the same structure as the provided .pkl file. Did something go wrong in my experiment?

auspicious3000 commented 3 years ago

There might be something wrong with your validation data. The validation loss should be around 30.

XintaoZhao0805 commented 3 years ago

There might be something wrong with your validation data. The validation loss should be around 30.

Thanks for your answer. I will check my preprocessing code. By the way, is it right to generate validation data in the same structure as your demo.pkl? Like this:

[Speaker_Name, One-hot, [Mel, normed-F0, length, utterance_name]]

This is what I am doing now.

Thanks again for your answer!

auspicious3000 commented 3 years ago

The format is correct.
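For readers with the same question, here is a minimal sketch of building one metadata entry in that format. The helper name, the one-hot construction, and the (T, 80) mel shape are illustrative assumptions rather than code from this repo:

import pickle
import numpy as np

def build_entry(speaker_name, speaker_index, utterances, n_speakers=20):
    # Hypothetical helper: one-hot speaker vector sized to the
    # number of speakers (or to dim_spk_emb, see the hparams below).
    onehot = np.zeros(n_speakers, dtype=np.float32)
    onehot[speaker_index] = 1.0

    entry = [speaker_name, onehot]
    for utt_name, mel, f0_norm in utterances:
        # mel: (T, 80) spectrogram, f0_norm: (T,) normalized F0,
        # then the sequence length and the utterance name.
        entry.append([mel, f0_norm, mel.shape[0], utt_name])
    return entry

# metadata = [build_entry('p225', 0, p225_utts), ...]
# with open('demo.pkl', 'wb') as f:
#     pickle.dump(metadata, f)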

inconnu11 commented 3 years ago

Did you normalize the Mel spectrogram? What's the range of the Mel spec?
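For context, a common dB-based normalization to [0, 1], in the style of this repo's make_spect_f0.py preprocessing, looks like the sketch below; treat the exact constants as assumptions and verify against your checkout:

import numpy as np

# Assumed constants (-16 dB reference, 100 dB range); check them
# against your copy of make_spect_f0.py.
min_level = np.exp(-100 / 20 * np.log(10))

def normalize_mel(mel_linear):
    # mel_linear: (T, 80) linear-amplitude mel spectrogram
    db = 20 * np.log10(np.maximum(min_level, mel_linear)) - 16
    return np.clip((db + 100) / 100, 0, 1)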

niu0717 commented 3 years ago

Hi, I would be grateful if you could tell me how to reconstruct demo.pkl. Thanks :) https://github.com/auspicious3000/SpeechSplit/blob/10ed8b9e25cce6c9a077e27ca175ba696b7df597/solver.py#L16

c1a1o1 commented 3 years ago

Can you make the right demo.pkl file?

auspicious3000 commented 3 years ago

@c1a1o1 Please clearly state your question and create a new issue. Please do NOT flood other issues.

jamesliu commented 3 years ago

Thanks for the good paper and project. My experiment is similar to Buckingham's. The validation loss fluctuates around 100 after 30K iterations without further improvement. I haven't figured out what is wrong with my experiment; any suggestions would be great. Thanks.

[screenshot: loss curves]

auspicious3000 commented 3 years ago

@jamesliu This looks like overfitting to me. Make sure you use a large training set and that the validation speakers are also in the training set.
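A minimal sketch of such a seen-speaker split, partitioning each speaker's utterances so every validation speaker also appears in training (function and variable names are illustrative):

import random

def split_per_speaker(utts_by_speaker, val_frac=0.1, seed=0):
    # utts_by_speaker: {speaker_name: [utterance_id, ...]}
    rng = random.Random(seed)
    train, val = {}, {}
    for spk, utts in utts_by_speaker.items():
        utts = list(utts)
        rng.shuffle(utts)
        n_val = max(1, int(len(utts) * val_frac))
        # Every speaker appears in both partitions, so no validation
        # speaker is unseen at training time.
        val[spk], train[spk] = utts[:n_val], utts[n_val:]
    return train, val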

jamesliu commented 3 years ago

@auspicious3000 Yes, thank you for pointing this out. After switching to the full P226 and P231 sets from the VCTK corpus, the training and validation charts look reasonable now, but the reconstruction from the trained model is not good on the demo data. How many iterations does it take to get good results on the demo data? Do I need to run more iterations? How about using another optimizer instead of Adam? Thanks.

[screenshot: G training/validation charts]

[screenshot: reconstruction from model]

auspicious3000 commented 3 years ago

@jamesliu Your training set is actually very small, with only about 30 minutes of data. Also, the "demo data" needs to be consistent with the training data.

jamesliu commented 3 years ago

@auspicious3000 Thanks for your suggestion. I have trained on 80 speakers (P225~P304) from the VCTK dataset (since the one-hot size is 80) on a 2080Ti GPU for 2 days. The result is better, but still not good enough. Is my training set big enough to avoid overfitting? How many days or iterations should I expect before getting a reasonable result? Any suggestions for hyperparameters? Thanks. Please ignore the sudden rise of validation errors; I added more validation examples after 170K steps.

[screenshot: training/validation loss curves]

hparams

Default hyperparameters:

hparams = HParams(
    # synthesis
    builder = 'wavenet',

    # model
    freq = 8,
    dim_neck = 8,
    freq_2 = 8,
    dim_neck_2 = 1,
    freq_3 = 8,
    dim_neck_3 = 32,

    dim_enc = 512,
    dim_enc_2 = 128,
    dim_enc_3 = 256,

    dim_freq = 80,
    dim_spk_emb = 82,
    dim_f0 = 257,
    dim_dec = 512,
    len_raw = 128,
    chs_grp = 16,

    # interp
    min_len_seg = 19,
    max_len_seg = 32,
    min_len_seq = 64,
    max_len_seq = 128,
    max_len_pad = 192,

    # data loader
    root_dir = '/data/music/speech_split/assets/spmel',
    feat_dir = '/data/music/speech_split/assets/raptf0',
    batch_size = 128,
    mode = 'train',
    shuffle = True,
    num_workers = 10,
    samplier = 8,
    # optimizer = 'RangerLars'
    optimizer = 'Adam'
)
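One detail worth flagging in this setup: the comment above says the one-hot size is 80, while the hparams set dim_spk_emb = 82; whichever count is used, the one-hot speaker vector has to match dim_spk_emb. A minimal illustration (the helper name is hypothetical, and the unused-slot reading is an assumption, not repo code):

import numpy as np

def speaker_onehot(speaker_index, dim_spk_emb=82):
    # The one-hot vector must have length dim_spk_emb; training on
    # fewer speakers simply leaves the trailing slots unused.
    v = np.zeros(dim_spk_emb, dtype=np.float32)
    v[speaker_index] = 1.0
    return v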

auspicious3000 commented 3 years ago

There are many contributing factors to output quality. It is hard to tell from the information you provided. @jamesliu

FurkanGozukara commented 3 years ago

Could you guys check my problem?

I am trying to achieve a very simple train-and-apply-model workflow.

@jamesliu @Buckingham0805 @niu0717 @inconnu11 thank you very much for any help

https://github.com/auspicious3000/SpeechSplit/issues/28

tejuafonja commented 3 years ago

Hi James,

I followed your updates and wonder if you continued with the experiments. I plan to run some experiments with another dataset and would like to learn from your experiments thus far.

Did the results become better? Did you increase the training size? Did you increase the training time? Did you change the hyperparameter? Did you try out other techniques?

Thanks and hope to hear from you!

3139725181 commented 2 years ago

(quoting jamesliu's comment and hparams listing above)

Hi, I noticed that most utterances in VCTK are longer than the required input length. Did you process the validation set the way MyCollator does in data_loader?

anon-squid commented 2 years ago

I have the same question as @3139725181. How can we use longer audio files in the validation set?

auspicious3000 commented 2 years ago

You can use longer audio. There is no limit on the length of input.

anon-squid commented 2 years ago

During the validation step of training, I get an error from pad_seq_to_2() because len_out=192 is smaller than x.shape[1].

auspicious3000 commented 2 years ago

Right. All these lengths are hyperparameters that can be freely adjusted based on your own requirements.
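Concretely, that means either raising max_len_pad in hparams or trimming/chunking long validation utterances before padding. Below is a sketch of the padding step that triggers the assertion, modeled on solver.py's pad_seq_to_2 (treat the exact signature as an assumption), plus a hypothetical workaround:

import numpy as np

def pad_seq_to_2(x, len_out=192):
    # x: (batch, T, feat); pads the time axis up to len_out and
    # asserts when T already exceeds it.
    len_pad = len_out - x.shape[1]
    assert len_pad >= 0, 'utterance longer than len_out'
    return np.pad(x, ((0, 0), (0, len_pad), (0, 0)), 'constant'), len_pad

def pad_or_trim(x, len_out=192):
    # Hypothetical workaround: trim long utterances before padding
    # (alternatively, raise max_len_pad in hparams).
    if x.shape[1] > len_out:
        x = x[:, :len_out, :]
    return pad_seq_to_2(x, len_out)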

jixinya commented 2 years ago

(quoting jamesliu's comment and hparams listing above)

Hi, I used the whole VCTK dataset to train, but the validation loss fluctuates around 70 and then rises. I wonder how you generated the validation data; the same way as the training data?

auspicious3000 commented 2 years ago

@jixinya Yes, the validation data is just a separate partition of the training data.

9527950 commented 1 year ago

(quoting jamesliu's comment and hparams listing above)

I also used all the utterances from p225-p246, but my validation loss oscillated upwards. When I use the trained model for conversion it works poorly, and strangely the linguistic content of the sentences changes. Do you know what is causing this?