CODEJIN / PWGAN_for_HiFiSinger

MIT License

What does hp.Train.Wav_Length=48000 mean? (Some questions on frame shift and sample rate) #3

Open hualizhou167 opened 2 years ago

hualizhou167 commented 2 years ago

Hello, thanks for your work and open-source code. I've reproduced HiFiSinger on an open-source Chinese dataset using your HiFiSinger and PWG repositories, but I ran into the following problems and hope you can help. The dataset I used is sampled at 44100 Hz, and with the hyperparameters of both projects left unchanged, the experimental results were not very satisfactory.

  1. Therefore, I changed the sample rate, frame length and frame shift settings in both repositories to match the dataset's sample rate: Sound.Sample_Rate=44100; Sound.Mel_Dim=80; Sound.Spectrogram_Dim=1025. Is this modification correct and necessary? Will it affect the performance of the model?

  2. After I modified the hyperparameters of the HiFiSinger project as described in 1., the PWG model trained earlier with the unmodified hparams (48 kHz, frame length 960, frame shift 240) could no longer be used. The error is as follows: [image] Does this mean that the network structure of PWG depends on these hyperparameters (sample_rate, frame_length, frame_shift and wav_length)? How should I modify the PWG code?

  3. While debugging the error mentioned in 2., I noticed a hyperparameter called hp.Train.Wav_Length=48000 in the PWG project. What does it mean? Should it be equal to the sample rate?
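To make the numbers in questions 2 and 3 concrete, here is a minimal arithmetic sketch. It uses only the values quoted above (48 kHz, frame length 960, frame shift 240, Wav_Length 48000) and the 44100 Hz rate of my dataset. The function names are hypothetical and are not part of either repository, and the sketch makes no claim about how PWG actually uses Wav_Length.

```python
# Hypothetical helpers for illustration only; not taken from either repository.

def samples_to_ms(num_samples: int, sample_rate: int) -> float:
    """Duration in milliseconds of num_samples at the given sample rate."""
    return 1000 * num_samples / sample_rate


def rescale_samples(num_samples: int, old_rate: int, new_rate: int) -> float:
    """Sample count that keeps the same duration when the sample rate changes."""
    return num_samples * new_rate / old_rate


# Values quoted in the issue (original 48 kHz configuration).
OLD_RATE = 48000
FRAME_LENGTH = 960
FRAME_SHIFT = 240
WAV_LENGTH = 48000

# Sample rate of the dataset described in the issue.
NEW_RATE = 44100

frame_length_ms = samples_to_ms(FRAME_LENGTH, OLD_RATE)
frame_shift_ms = samples_to_ms(FRAME_SHIFT, OLD_RATE)
wav_length_ms = samples_to_ms(WAV_LENGTH, OLD_RATE)

# How many frame shifts fit into one Wav_Length segment at 48 kHz.
frames_per_segment = WAV_LENGTH // FRAME_SHIFT
leftover_samples = WAV_LENGTH % FRAME_SHIFT

# Same durations expressed in samples at 44100 Hz.
new_frame_length = rescale_samples(FRAME_LENGTH, OLD_RATE, NEW_RATE)
new_frame_shift = rescale_samples(FRAME_SHIFT, OLD_RATE, NEW_RATE)

if __name__ == "__main__":
    print(f"frame length: {frame_length_ms} ms, frame shift: {frame_shift_ms} ms")
    print(f"Wav_Length: {wav_length_ms} ms = {frames_per_segment} frame shifts "
          f"(+{leftover_samples} leftover samples)")
    print(f"at {NEW_RATE} Hz: frame length {new_frame_length}, "
          f"frame shift {new_frame_shift} samples")
```

In the 48 kHz configuration, Wav_Length=48000 happens to equal both one second of audio and exactly 200 frame shifts, so the value alone does not show which of the two it is tied to. Keeping the same durations at 44100 Hz would give a frame shift of 220.5 samples, which is not an integer, so some rounding choice would be needed.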

Looking forward to your reply, thank you very much~