Open rumourscape opened 3 hours ago
The pretrained BigVGAN doesn't support 16 kHz audio.
I trained BigVGAN from scratch for 16 kHz, as mentioned in the environment details. I have also tried other 16 kHz vocoders, such as HiFi-GAN, with the same config, and I still get noisy audio.
How many training steps and which dataset?
I used a 14-hour Indian English dataset (https://huggingface.co/datasets/SPRINGLab/smt_english) and trained for close to 2000 epochs, or about 500k steps.
Checks
Environment Details
Ubuntu 22.04, CUDA 12.4, PyTorch 2.5.0; vocoder: BigVGAN trained on 16 kHz audio with the same config as given below
Steps to Reproduce
Train a model from scratch using the following configurations:
target_sample_rate = 16000
n_mel_channels = 80
hop_length = 160
win_length = 640
n_fft = 1024
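For anyone comparing this against a vocoder's own mel settings, here is a minimal sketch of the timing and frequency quantities these values imply. The helper name `derived_stats` is hypothetical and not part of any library; it only does arithmetic on the numbers above and does not reproduce the actual training code.

```python
# Values copied from the configuration above.
target_sample_rate = 16000
n_mel_channels = 80
hop_length = 160
win_length = 640
n_fft = 1024


def derived_stats(sr, hop, win, fft):
    """Return basic quantities implied by an STFT/mel configuration.

    Hypothetical helper for illustration only.
    """
    return {
        "frames_per_second": sr / hop,   # mel frames produced per second of audio
        "hop_ms": 1000 * hop / sr,       # time step between consecutive frames
        "window_ms": 1000 * win / sr,    # duration of each analysis window
        "nyquist_hz": sr / 2,            # highest representable frequency
        "fft_bins": fft // 2 + 1,        # linear-frequency bins before the mel filterbank
        "window_fits_fft": win <= fft,   # the window must not exceed the FFT size
    }


stats = derived_stats(target_sample_rate, hop_length, win_length, n_fft)
for name, value in stats.items():
    print(f"{name}: {value}")
```

With this config the model sees 100 mel frames per second, each covering a 40 ms window, and nothing above 8 kHz. A vocoder would need to be trained on mel features computed with the same values for the two to line up.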
✔️ Expected Behavior
The model should produce clean speech, as the 24 kHz model does.
❌ Actual Behavior
The model produces very noisy speech, although it has learned to pronounce words well. Sample output from my 16 kHz model: https://asr.iitm.ac.in/cdac/gpu16/sample.wav