Hello, thanks for your work and for open-sourcing the code. I've reproduced hifisinger using an open-source Chinese dataset together with your hifisinger and PWG repositories, but I ran into the following problems and hope you can help. The dataset I used is sampled at 44100 Hz, and with the hyperparameters of both projects left unchanged the results were not very satisfactory.
1. Therefore, I changed the sample rate, frame length and frame shift in both repositories to match the sample rate of the dataset: Sound.Sample_Rate=44100; Sound.Mel_Dim=80; Sound.Spectrogram_Dim=1025. Is this modification correct and necessary? Will it affect the performance of the model?
2. After I modified the hyperparameters of the hifisinger project as described in 1., the PWG model I had trained earlier with the unmodified hparams (48 kHz, frame length 960, frame shift 240) could no longer be used. The error is as follows:
Does this mean that the network structure of PWG depends on these hyperparameters (sample_rate, frame_length, frame_shift and wav_length)? How should I modify the PWG code?
3. While debugging the problem mentioned in 2., I noticed a hyperparameter hp.Train.Wav_Length=48000 in the PWG project. What does it mean? Should it be equal to the sample rate?
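For reference, this is how I understand the relationship between the 48 kHz frame settings mentioned above and a 44100 Hz dataset, as a small Python sketch. It assumes that frame length and frame shift are counts of samples and that the goal is to keep the same durations in milliseconds. The function name `rescale_frames` is only for illustration and does not come from either repository.

```python
from fractions import Fraction


def rescale_frames(frame_length, frame_shift, old_sr, new_sr):
    """Convert frame sizes (in samples) to a new sample rate.

    Hypothetical helper: keeps the frame durations fixed and returns
    exact fractions so that non-integer results stay visible.
    """
    scale = Fraction(new_sr, old_sr)
    return frame_length * scale, frame_shift * scale


# 960 / 48000 = 20 ms and 240 / 48000 = 5 ms in the original settings.
length, shift = rescale_frames(960, 240, 48000, 44100)
print(float(length), float(shift))  # 882.0 220.5
```

If this is right, the frame shift comes out as 220.5 samples, which is not an integer, so an exact 5 ms shift does not seem possible at 44100 Hz and a nearby integer value would have to be chosen instead.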
Looking forward to your reply, thank you very much~