auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License

Hyperparameters for generating mel spectrogram from training .wav files #25

Open sroutray opened 4 years ago

sroutray commented 4 years ago

Could you please tell us how you generated mel spectrograms for training from .wav files? What were the parameters used?

auspicious3000 commented 4 years ago

4

sroutray commented 4 years ago
import numpy as np
import librosa

# load and resample to 16 kHz
y, sr = librosa.load('p225_001.wav', sr=16000)
# mel spectrogram (note: librosa returns a *power* spectrogram by default)
S = librosa.feature.melspectrogram(y=y, sr=16000, n_mels=80, fmin=90, fmax=7600,
                                   n_fft=1024, hop_length=256)
# amplitude to dB, then reference and normalize to [0, 1]
S_r0 = 20 * np.log10(np.maximum(1e-5, S))
S_r0 = S_r0 - 16                               # ref_level_db = 16
S_r0 = np.clip((S_r0 + 100.0) / 100.0, 0, 1)   # min_level_db = -100
print(np.min(S_r0), np.max(S_r0), S_r0.shape)

# synthesize with the pretrained WaveNet vocoder
# (wavegen and model come from the wavenet_vocoder code used in this repo)
waveform = wavegen(model, c=S_r0.T)
librosa.output.write_wav('test_r0.wav', waveform, sr=16000)  # librosa < 0.8 API

I am using the above code to generate the mel spectrogram of the file p225_001.wav, with the following parameters:

num_mels: 80
fmin: 90
fmax: 7600
fft_size: 1024
hop_size: 256
min_level_db: -100
ref_level_db: 16

But the generated spectrogram is not the same as the one provided in metadata.pkl. I also tried passing both spectrograms through the provided WaveNet vocoder model, but the audio generated from my spectrogram is inferior in quality compared to the audio generated from the spectrogram in metadata.pkl.

auspicious3000 commented 4 years ago

4 see the last few comments