Robinysh closed this issue 4 years ago
Check the range of your audio file using this:
from scipy.io.wavfile import read
sampling_rate, data = read(audio_path)
print(data.min(), data.max())
My guess is that your min and max are within [-1, 1].
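Why the range matters: as far as I can tell, the linked `get_mel` code divides the loaded samples by `max_wav_value` (32768.0 in the repository's default hparams), which assumes 16-bit integer PCM. If the file already stores floats in [-1, 1], that division shrinks the signal by another factor of 32768. Here is a minimal sketch of a dtype-aware normalization, assuming NumPy arrays as returned by `scipy.io.wavfile.read`. `to_unit_range` is a hypothetical helper, not part of the repository.

```python
import numpy as np

def to_unit_range(data, max_wav_value=32768.0):
    """Return samples as float32 in roughly [-1, 1] (hypothetical helper)."""
    if np.issubdtype(data.dtype, np.floating):
        # Float WAVs are typically already in [-1, 1]; do not divide again.
        return data.astype(np.float32)
    if data.dtype == np.int16:
        # 16-bit PCM: scale by the int16 full-scale value.
        return data.astype(np.float32) / max_wav_value
    raise ValueError(f"unhandled sample dtype: {data.dtype}")

# Synthetic stand-ins for the array returned by scipy.io.wavfile.read
pcm = np.array([-32768, 0, 16384], dtype=np.int16)
flt = np.array([-1.0, 0.0, 0.5], dtype=np.float32)

print(to_unit_range(pcm))
print(to_unit_range(flt))
```

Alternatively, re-saving the audio as 16-bit PCM should let the existing pipeline run unchanged.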
Closing due to inactivity.
I tried to use the following script to generate my mel spectrograms: https://github.com/NVIDIA/tacotron2/blob/131c1465b48be60cb5d3b8ab79cfc663e5c47b6a/data_utils.py#L37-L54 However, their scale doesn't seem right; the values are far too small. Although my audio is clearly audible, most of my mel spectrogram is clipped away by the default clipping value in https://github.com/NVIDIA/tacotron2/blob/131c1465b48be60cb5d3b8ab79cfc663e5c47b6a/audio_processing.py#L78-L84 So I am wondering: is this really the code used to generate the mel spectrograms? I am planning to fine-tune the pretrained model on my own dataset, so it would be best for my data to go through a similar preprocessing pipeline and have similar statistics. Thanks.
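To illustrate the clipping described here: the linked `dynamic_range_compression` takes the log of the magnitudes after clamping them from below at `clip_val` (1e-5 by default), so every magnitude under that floor collapses to log(1e-5), about -11.5. The sketch below reproduces that operation in NumPy (the repository itself uses torch; `compress` and the sample magnitudes are made up for illustration). Since the STFT and mel projection are linear in magnitude, an extra division of the waveform by 32768 scales every mel magnitude down by the same factor.

```python
import numpy as np

def compress(x, C=1.0, clip_val=1e-5):
    """Log compression with a lower clamp, mirroring the linked function."""
    return np.log(np.clip(x, clip_val, None) * C)

# Hypothetical mel magnitudes for correctly scaled audio
mags = np.array([1e-1, 1e-3, 1e-6, 1e-8])
# The same magnitudes if the waveform were divided by 32768 once too often
scaled = mags / 32768.0

print(compress(mags))    # larger values stay distinct; tiny ones hit the floor
print(compress(scaled))  # everything sits at the floor, log(1e-5)
```

If most of a spectrogram sits at the floor value, that points to input scaling rather than to the compression function itself.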