Open utunga opened 5 years ago
I'm also interested in this.
Does anyone know whether training with ground-truth-aligned (GTA) mel spectrograms helps reduce hiss?
@utunga I have the same problem. Do you have any ideas yet? My training audio is also sampled at 16000 Hz, and I use: n_fft = 2048, hop_size = 200, win_size = 800, sample_rate = 16000, frame_shift_ms = 12.5, magnitude_power = 1.
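As a quick sanity check on the values quoted above, the sketch below converts between milliseconds and samples to confirm that hop_size and frame_shift_ms describe the same frame shift at 16000 Hz. The helper `ms_to_samples` and the derived variable names are hypothetical, introduced only for this illustration; they are not taken from the repository.

```python
# Values as quoted in the comment above.
sample_rate = 16000      # Hz
n_fft = 2048             # FFT size in samples
hop_size = 200           # frame shift in samples
win_size = 800           # analysis window length in samples
frame_shift_ms = 12.5    # frame shift in milliseconds


def ms_to_samples(ms: float, sr: int) -> int:
    """Hypothetical helper: convert a duration in ms to a sample count."""
    return int(round(ms / 1000.0 * sr))


# Frame shift implied by frame_shift_ms at this sample rate.
hop_from_shift = ms_to_samples(frame_shift_ms, sample_rate)

# Window length expressed in milliseconds.
win_ms = win_size * 1000.0 / sample_rate

# Number of linear-frequency bins from a real FFT, and their spacing in Hz.
n_freq_bins = n_fft // 2 + 1
bin_width_hz = sample_rate / n_fft

print(f"hop from frame_shift_ms: {hop_from_shift} samples (hop_size = {hop_size})")
print(f"window length: {win_ms} ms")
print(f"frequency bins: {n_freq_bins}, spacing: {bin_width_hz} Hz")
```

With these numbers the two frame-shift settings agree (200 samples), and the 800-sample window spans 50 ms, zero-padded to the 2048-point FFT.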
Should magnitude_power be 1 or 2?
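To illustrate what the two values would mean, here is a minimal NumPy sketch. It assumes magnitude_power is the exponent applied to the STFT magnitude, which is how the name reads; check your own audio preprocessing code to confirm. The function `spectrum` and the test tone are hypothetical, made up for this example.

```python
import numpy as np

sample_rate = 16000
n_fft = 2048

# One windowed frame of a 1 kHz test tone (illustrative input only).
t = np.arange(n_fft) / sample_rate
frame = np.sin(2 * np.pi * 1000.0 * t) * np.hanning(n_fft)


def spectrum(frame: np.ndarray, magnitude_power: float) -> np.ndarray:
    """Hypothetical helper: |FFT| raised to magnitude_power.

    Assumption: 1 gives an amplitude spectrum, 2 gives a power spectrum.
    """
    return np.abs(np.fft.rfft(frame, n=n_fft)) ** magnitude_power


amp = spectrum(frame, 1)   # amplitude spectrum
pow_ = spectrum(frame, 2)  # power spectrum (amplitude squared)

# On a dB scale the two carry the same information:
# 20*log10(amplitude) equals 10*log10(power).
db_from_amp = 20.0 * np.log10(np.maximum(amp, 1e-10))
db_from_pow = 10.0 * np.log10(np.maximum(pow_, 1e-20))

print(np.allclose(db_from_amp, db_from_pow))
```

Under that assumption, the choice mainly changes the scale of the features, so whichever value is used for training should also be used, with the matching dB conversion, when inverting or vocoding the predicted spectrograms.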
We trained Tacotron-2 for a new language with a new alphabet. After the full training run (a little over a week on a machine with a single Nvidia V100 GPU, using 4.5 hours of audio data), we are very pleased with the result. It's amazing, really. Thank you for this work.
But there is still quite a lot of 'wheeziness', something like a sibilant hiss, in the generated audio. Is this a well-understood problem, and what kind of things might one need to do to reduce the hiss?
We used recordings with sample_rate = 16000 instead of the default of 22050. Is that likely to be the cause of the problem?
The recordings themselves are well recorded and have good levels (with no noise in the silences). That said, we may well go back and make studio-quality recordings to get a better result. Is that likely to reduce the hiss?
Any other ideas? Thanks in advance!
EDIT: To be clear, I'm not talking about hiss between words but about a hiss that occurs during words.