An unofficial PyTorch implementation of VALL-E, utilizing the EnCodec encoder/decoder.
A demo is available on HuggingFace here.
Besides a working PyTorch environment, the only hard requirement is `espeak-ng` for phonemizing text; depending on the platform, it is packaged as `espeak` or `espeak-ng`. On Windows, you may additionally need to set the `PHONEMIZER_ESPEAK_LIBRARY` environment variable to specify the path to `libespeak-ng.dll`.
To install, simply run `pip install git+https://git.ecker.tech/mrq/vall-e` or `pip install git+https://github.com/e-c-k-e-r/vall-e`.
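
Before installing, it can help to confirm that the espeak-ng requirement is actually satisfied. The following is a minimal sketch, not part of this repo: `find_espeak` is a hypothetical helper that first checks whether `PHONEMIZER_ESPEAK_LIBRARY` points at an existing file, then falls back to looking for an `espeak-ng` or `espeak` executable on `PATH`. It assumes that finding either one is a reasonable sign the phonemizer will work; it does not load the library or phonemize anything.

```python
import os
import shutil
from pathlib import Path


def find_espeak(env=None, which=shutil.which):
    """Hypothetical helper: describe where espeak-ng was found, or return None."""
    env = os.environ if env is None else env

    # An explicitly configured library path takes priority (mainly relevant on Windows).
    library = env.get("PHONEMIZER_ESPEAK_LIBRARY")
    if library and Path(library).is_file():
        return f"library: {library}"

    # Otherwise look for either executable name on PATH.
    for name in ("espeak-ng", "espeak"):
        path = which(name)
        if path:
            return f"executable: {path}"
    return None


if __name__ == "__main__":
    found = find_espeak()
    print(found or "espeak-ng not found; install it or set PHONEMIZER_ESPEAK_LIBRARY")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.
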
I've tested this repo under Python versions `3.10.9`, `3.11.3`, and `3.12.3`.
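
If you are unsure whether your interpreter falls in one of the tested release series, a quick check along these lines may help. This is only an illustrative sketch: `TESTED_MINORS` and `is_tested_python` are hypothetical names, and the assumption that any patch release within 3.10, 3.11, or 3.12 behaves like the tested one is mine, not something the repo enforces.

```python
import sys

# Release series covered by the tested versions (3.10.9, 3.11.3, 3.12.3).
TESTED_MINORS = {(3, 10), (3, 11), (3, 12)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the interpreter's major.minor matches a tested release series."""
    return tuple(version_info[:2]) in TESTED_MINORS


if __name__ == "__main__":
    if not is_tested_python():
        major, minor = sys.version_info[:2]
        print(f"Note: Python {major}.{minor} has not been tested with this repo")
```
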
My pre-trained weights can be acquired from here.
A script to set up a proper environment and download the weights can be invoked with `./scripts/setup.sh`. This will automatically create a `venv` and download the `ar+nar-llama-8` weights and config file to the right place.
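
For readers who cannot run the shell script, the virtual-environment step alone can be approximated from Python with the standard-library `venv` module. This is a sketch under assumptions: `create_env` is a hypothetical helper, the `venv` directory name is a guess, and nothing here downloads weights or reproduces what `./scripts/setup.sh` itself does.

```python
import venv
from pathlib import Path


def create_env(env_dir="venv", with_pip=True):
    """Hypothetical helper: create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # venv.create builds the directory layout and, if requested, bootstraps pip into it.
    venv.create(env_path, with_pip=with_pip)
    return env_path


if __name__ == "__main__":
    print(f"Created virtual environment at {create_env()}")
```

After activating the environment, the `pip install` command from the install step can be run inside it as usual.
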
When running inference through either the web UI or the CLI, if no model is passed, the default model is downloaded automatically and should also update automatically.
The provided documentation under ./docs/ should thoroughly cover most, if not all, of this project.
Markdown files should correspond directly to their respective file or folder under `./vall_e/`.