Hi All,
I was looking at examples of speech synthesis with Tacotron2 + WaveGlow and noticed that Tacotron2 has an option called --load-mel-from-disk. For WaveGlow, this argument doesn't seem to be used. My guess is that WaveGlow training picks a random segment of samples from each audio file and computes the mel spectrogram from that segment on the fly. Is my understanding correct?
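To make the question concrete, here is roughly the behaviour I mean. This is my own simplified sketch, not the actual WaveGlow code, and `compute_mel` is a hypothetical stand-in for the STFT/mel step:

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_segment(audio: Sequence[float], segment_length: int,
                   rng: random.Random) -> List[float]:
    """Crop a random window of `segment_length` samples; zero-pad short clips."""
    if len(audio) >= segment_length:
        # randint's upper bound is inclusive, so the window always fits.
        start = rng.randint(0, len(audio) - segment_length)
        return list(audio[start:start + segment_length])
    # Clip is shorter than the window: pad the tail with silence.
    return list(audio) + [0.0] * (segment_length - len(audio))


def make_training_pair(audio: Sequence[float], segment_length: int,
                       rng: random.Random,
                       compute_mel: Callable[[List[float]], object]
                       ) -> Tuple[object, List[float]]:
    """Hypothetical per-step loader: the mel is recomputed for every new crop."""
    segment = random_segment(audio, segment_length, rng)
    return compute_mel(segment), segment
```

Because the crop changes every time a file is visited, a mel saved to disk for one crop would not match the next one.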
If so, is there any reasonable way (other than precomputing a mel spectrogram for every possible random clip of every audio file) to generate the mel spectrograms before training and load them from disk, instead of recomputing them for a new random segment each time?
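The kind of thing I have in mind is computing the mel spectrogram once for each full file, saving it, and at training time cropping matching windows from both the audio and the saved mel. Here is a rough sketch of the cropping step. It assumes mel frame `t` lines up with audio sample `t * hop_length`, that `segment_length` is a multiple of `hop_length`, and that the mel is stored time-major as a list of frames; all names here are hypothetical:

```python
import random
from typing import List, Sequence, Tuple


def aligned_crop(audio: Sequence[float], mel: Sequence[Sequence[float]],
                 segment_length: int, hop_length: int,
                 rng: random.Random) -> Tuple[List[float], List[Sequence[float]]]:
    """Crop matching windows from audio and a precomputed full-utterance mel.

    Assumes frame t of `mel` is aligned with sample t * hop_length of `audio`.
    """
    if segment_length % hop_length != 0:
        raise ValueError("segment_length must be a multiple of hop_length")
    frames = segment_length // hop_length
    # Only choose frame starts whose audio window also fits inside the clip.
    max_frame_start = min(len(mel) - frames,
                          (len(audio) - segment_length) // hop_length)
    if max_frame_start < 0:
        raise ValueError("clip is too short for the requested segment")
    frame_start = rng.randint(0, max_frame_start)
    sample_start = frame_start * hop_length
    return (list(audio[sample_start:sample_start + segment_length]),
            list(mel[frame_start:frame_start + frames]))
```

One caveat I'm unsure about: frames near the edges of a crop would not exactly match a mel computed on the cropped audio alone, since the STFT padding at the window boundaries differs.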
Thank you!