Closed KevinWang676 closed 1 year ago
Hi! Could you create a minimal example of the code you were running? Were you using wav_to_mel or wavs_to_mel?
When using wavs_to_mel, make sure your audio has a sampling rate of 22050 and the shape [batch_size, samples].
Edit: In case you're using wav_to_mel with audio loaded using torchaudio, you have to get rid of the 0th dimension like so:
mel_spectrogram = synthesiser.wav_to_mel(
some_audio[0], some_sample_rate
)
Hi, thanks for your reply. My code is
from simple_hifigan import Synthesiser
synthesiser = Synthesiser()
# create a mel spectrogram
mel_spectrogram = synthesiser.wav_to_mel(
"audio.wav", 24000
)
audio = synthesiser(mel_spectrogram)
I wonder if this is the correct usage. The sample rate of my audio is 24000. Thanks!
That should work. I will check; maybe it's a problem in the library. This hasn't been officially released yet, so sorry about any bugs like this, and thanks for reporting!
Thanks for checking it. I didn't load my .wav file using torchaudio. The "audio.wav" in my code is just a path. Can that be a problem?
The path alone should work fine. I had only tested the method with audio loaded through torchaudio, not with a path; this has now been fixed in 7d27c8396c8832cb53ecb35b3250a72e4cf3055f
I also added your example to https://github.com/MiniXC/simple_hifigan/tree/main/tests . In the future I should add some proper tests, but for now it's good to have at least one working example there.
Thanks for fixing it. You mentioned that a 22050 sampling rate would work. Would other sampling rates like 24000 also work?
Actually any sampling rate should work, but if it's too far away from 22050 there could be issues. When loading from a file, you don't need to specify the sampling rate, since that information is in the file anyway and wav_to_mel accounts for this.
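To illustrate the point that the sampling rate is stored in the file itself, here is a minimal sketch using only Python's standard wave module. It is not how simple_hifigan reads files internally (the thread doesn't say); the helper name wav_sample_rate and the temporary demo file are made up for this example, and the wave module only handles PCM WAV files.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sampling rate stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


# Write a short silent 24000 Hz mono file so the sketch is self-contained.
# (Hypothetical demo file, standing in for the "audio.wav" from the thread.)
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "audio.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)      # mono
    wav_file.setsampwidth(2)      # 16-bit PCM
    wav_file.setframerate(24000)  # the rate recorded in the header
    wav_file.writeframes(b"\x00\x00" * 2400)  # 0.1 s of silence

print(wav_sample_rate(demo_path))  # prints 24000
```

Checking the rate this way can be handy for confirming how far a file is from 22050 before passing it on.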
Also note that the synthesised audio will always have a sampling rate of 22050. This is just how hifigan was trained, so if you need the audio in any other sampling rate you'd have to resample afterwards.
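As a rough sketch of what "resample afterwards" means, the snippet below converts a 22050 Hz signal to 24000 Hz with plain linear interpolation in pure Python. The function name resample_linear and the ramp signal are made up for illustration, and linear interpolation is a simplification with no anti-aliasing filter; for real audio a proper resampler (for example torchaudio.functional.resample) is the better choice.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Resample a mono signal by linear interpolation (illustration only)."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or orig_sr == target_sr:
        return list(samples)
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of a 22050 Hz ramp, standing in for the vocoder output.
vocoder_out = [i / 22050 for i in range(22050)]
audio_24k = resample_linear(vocoder_out, 22050, 24000)
print(len(audio_24k))  # prints 24000
```

The output still lasts one second, it just has 24000 samples instead of 22050.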
Got it, thank you so much! I have another question about Hifigan: what is it used for? Is it to denoise the audio or to improve the audio quality?
HifiGAN is a vocoder: it converts mel spectrograms to audio waveforms. You can find more information on vocoders here and on hifigan here
Thanks, I'll go through the materials.
Hi, I got the following error:
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 926, 80, 2]
I wonder how I can fix it. Thanks.
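One possible reading of this shape, which is only a guess since nothing in the thread confirms it: 926 is the number of frames, 80 the number of mel bins, and the trailing 2 a stereo channel axis carried through from the input audio. If that is the case, downmixing to mono before computing the mel spectrogram might avoid the extra dimension. A minimal pure-Python sketch of a downmix, with the hypothetical helper name downmix_to_mono and channels given as plain lists:

```python
def downmix_to_mono(channels):
    """Average equal-length channels ([channels][samples]) into one mono list."""
    if not channels:
        raise ValueError("expected at least one channel")
    n_samples = len(channels[0])
    if any(len(ch) != n_samples for ch in channels):
        raise ValueError("all channels must have the same length")
    # zip(*channels) yields one tuple per time step, with one value per channel.
    return [sum(frame) / len(channels) for frame in zip(*channels)]


# Tiny made-up stereo signal: left and right channels, three samples each.
stereo = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
mono = downmix_to_mono(stereo)
print(mono)  # prints [0.5, 0.5, 0.5]
```

With tensors the same idea would be a mean over the channel axis; whether that resolves this particular error depends on the assumption above being right.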