A PyTorch implementation of "Towards Achieving Robust Universal Neural Vocoding". Audio samples can be found here, a Colab demo can be found here, and an accompanying Tacotron implementation can be found here.
Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install the package with:

```shell
pip install univoc
```
```python
import torch
import soundfile as sf
from univoc import Vocoder

# download pretrained weights (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()

# load log-Mel spectrogram from file or from tts (see https://github.com/bshall/Tacotron for an example)
mel = ...

# generate waveform
with torch.no_grad():
    wav, sr = vocoder.generate(mel)

# save output
sf.write("path/to/save.wav", wav, sr)
```
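The `mel = ...` placeholder above stands for a log-Mel spectrogram tensor. A minimal sketch of shaping one for the vocoder, using a random array as a stand-in for a real spectrogram (the 80-bin count and the `(batch, frames, mels)` layout are assumptions; check the output of `preprocess.py` for the exact shape this model expects):

```python
import numpy as np
import torch

NUM_MELS = 80   # assumption: 80 mel bins, a common default
FRAMES = 100    # number of spectrogram frames (any length)

# Stand-in for a real log-Mel spectrogram, e.g. one loaded with
# np.load(...) from the preprocessing output or produced by a TTS model.
mel = np.random.randn(FRAMES, NUM_MELS).astype(np.float32)

# Add a batch dimension before passing the tensor to vocoder.generate(...)
mel = torch.from_numpy(mel).unsqueeze(0)  # shape: (1, FRAMES, NUM_MELS)
```

Move the tensor to the same device as the vocoder (e.g. `mel.cuda()`) before generating.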
Clone the repository and install the requirements:

```shell
git clone https://github.com/bshall/UniversalVocoding
cd ./UniversalVocoding
pip install -r requirements.txt
```

Download and extract the LJSpeech dataset:

```shell
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
```

Preprocess the dataset:

```shell
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
```

Train the model:

```shell
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1
```
Pretrained weights for the 10-bit LJ-Speech model are available here.
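The "10-bit" in the model name refers to the resolution at which the waveform is quantized: each sample is companded and mapped to one of 1024 discrete classes. A minimal sketch of 10-bit mu-law companding, the standard scheme for this kind of autoregressive vocoder (a per-sample illustration, not the repo's exact implementation):

```python
import math

BITS = 10
LEVELS = 2 ** BITS  # 1024 discrete classes
MU = LEVELS - 1     # companding constant, mu = 1023

def mulaw_encode(x: float) -> int:
    """Map a sample in [-1, 1] to an integer class in [0, LEVELS - 1]."""
    # mu-law compression: log-scale the magnitude, keep the sign
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    # rescale from [-1, 1] to [0, MU] and round to the nearest class
    return int((y + 1) / 2 * MU + 0.5)

def mulaw_decode(c: int) -> float:
    """Invert the companding: class index back to a sample in [-1, 1]."""
    y = 2 * c / MU - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

The companding spends more of the 1024 levels near zero, where speech energy concentrates, so a 10-bit class index round-trips small-amplitude samples with much less audible error than plain linear quantization would.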