[Guide, Help] How to install and use on Ubuntu/Debian Linux

There are two repos: this one and a fork with a working (and better) UI. Both function much the same, so this guide uses the fork.

I found tortoise unreliable on Windows, voice training included, so I only run it on Linux.

Instructions

1) Clone repo & change directory into it

git clone --depth=1 https://github.com/Acephalia/tortoise-tts-fast-GUI.git && cd tortoise-tts-fast-GUI

2) Install Python 3.10 (if apt cannot find it, add a repository/PPA that provides it)

sudo apt -y install python3.10 python3.10-dev python3.10-venv
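
Before looking for a PPA, it can help to check whether python3.10 is already on the PATH. The sketch below is only a quick check. `have_cmd` is a made-up helper name, and the deadsnakes PPA it prints is just one option, assumed to apply to Ubuntu only (Debian does not use PPAs).

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd python3.10; then
    echo "python3.10 found: $(python3.10 --version 2>&1)"
else
    # Assumption: Ubuntu, where the deadsnakes PPA packages extra Python versions.
    echo "python3.10 not found. On Ubuntu, one option is:"
    echo "  sudo add-apt-repository ppa:deadsnakes/ppa && sudo apt update"
fi
```

If it reports python3.10 as missing even after adding a repository, the apt install line above is the thing to re-run.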

3) Set up a virtual environment (keeps packages and their versions local to this project)

python3.10 -m venv venv
echo "source venv/bin/activate" > activate

4) Activate the virtual environment (do this every time before you use pip or Python in this project)

source activate

Info: You may leave the virtual environment by typing 'deactivate'
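
It is easy to forget whether the environment is active in a given terminal. A minimal sketch of a check, relying on the fact that the venv activate script exports `VIRTUAL_ENV`; `in_venv` is a made-up helper name:

```shell
# in_venv is a hypothetical helper: the venv activate script exports
# VIRTUAL_ENV, so a non-empty value means an environment is active.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "venv active: $VIRTUAL_ENV"
else
    echo "no venv active; run 'source activate' from the repo folder first"
fi
```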

5) Install tortoise dependencies

Torch (CUDA edition):

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117 --no-cache-dir

Makes the vocoder magically work (installs this repo in editable mode):

pip3 install -e .

BigVGAN (what tortoise-fast uses to synthesize high-fidelity waveforms):

pip3 install git+https://github.com/152334H/BigVGAN.git

Ensure this package is not installed, since we source it locally instead:

pip3 uninstall tortoise
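
After the installs, a quick sanity check can confirm that torch imports inside the venv and whether it sees a CUDA device. This is a sketch, not part of the original steps; `check_torch` is a made-up helper, and it assumes `python3` resolves to the venv interpreter once the environment is active.

```shell
# check_torch is a hypothetical helper. It takes the interpreter to test
# (default: python3) and fails if torch cannot be imported from it.
check_torch() {
    "${1:-python3}" -c 'import torch; print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())' 2>/dev/null
}

check_torch || echo "torch is not importable here (is the venv active?)"
```

If it prints "CUDA available: False", torch installed but cannot see a CUDA device, which is worth sorting out before moving on.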

6) Download the models (they are fetched on the first run)

echo "Write your text here, the instructions are wrong" | ./scripts/tortoise_tts.py --voice emma --seed 42

This is where it appears to get stuck at "Downloading the main structure of voicefixer". In reality it is downloading roughly 600MB of data at the slowest speeds imaginable, and it may take an hour. To speed this up, you can fetch the files manually with a download manager such as FDM, using the method below.

If you cancel your download, delete this folder before trying again:

rm -rf ~/.cache/voicefixer

Faster method to obtain the files above using a download manager (FDM) or mirror (WIP)

Source: https://zenodo.org/record/5469951/files/model.ckpt-1490000_trimed.pt?download=1
Mirror: https://drive.google.com/file/d/1MetvWA9NULZPq0KjTdj0DFjQu5fIiwia/view
Destination: ~/.cache/voicefixer/synthesis_module/44100/model.ckpt-1490000_trimed.pt
Size: 129.3MB

Source: https://zenodo.org/record/5600188/files/vf.ckpt?download=1
Mirror: https://drive.google.com/file/d/1APezpeB6hjZWK3GG7oJZCgs6OKOSIZV-/view
Destination: ~/.cache/voicefixer/analysis_module/checkpoints/vf.ckpt
Size: 466.6MB
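
If you would rather stay in the terminal than use FDM, the two Source URLs can be fetched straight into their destinations with curl. This is a sketch that swaps curl in for the download manager. It will not make Zenodo any faster, but `-C -` lets an interrupted download resume instead of starting over. `fetch_ckpt` and `fetch_voicefixer` are made-up helper names, and the Google Drive mirrors are not used here because their view links are not direct downloads.

```shell
# fetch_ckpt and fetch_voicefixer are hypothetical helper names.
# curl flags: -L follows redirects, -C - resumes a partial file,
# -o sets the output path.
fetch_ckpt() {
    mkdir -p "$(dirname "$2")" && curl -L -C - -o "$2" "$1"
}

# Downloads both voicefixer checkpoints to the destinations listed above.
fetch_voicefixer() {
    cache="$HOME/.cache/voicefixer"
    fetch_ckpt "https://zenodo.org/record/5469951/files/model.ckpt-1490000_trimed.pt?download=1" \
        "$cache/synthesis_module/44100/model.ckpt-1490000_trimed.pt" &&
    fetch_ckpt "https://zenodo.org/record/5600188/files/vf.ckpt?download=1" \
        "$cache/analysis_module/checkpoints/vf.ckpt"
}
```

After pasting this into the shell, run `fetch_voicefixer`, then compare the resulting file sizes (for example with `ls -lh`) against the sizes listed above.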

7) Fix the next error of this buggy app (unexpected keys in state_dict)

Traceback (most recent call last):
  File "/home/nom/Projects/tortoise-tts-fast-GUI/./scripts/tortoise_tts.py", line 280, in <module>
    tts = TextToSpeech(
  File "/home/nom/Projects/tortoise-tts-fast-GUI/tortoise/api.py", line 271, in __init__
    self.autoregressive.load_state_dict(torch.load(ar_path))
  File "/home/nom/Projects/tortoise-tts-fast-GUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UnifiedVoice:
        Unexpected key(s) in state_dict: "gpt.h.0.attn.bias", "gpt.h.0.attn.masked_bias", "gpt.h.1.attn.bias", "gpt.h.1.attn.masked_bias", "gpt.h.2.attn.bias", "gpt.h.2.attn.masked_bias", "gpt.h.3.attn.bias", "gpt.h.3.attn.masked_bias", "gpt.h.4.attn.bias", "gpt.h.4.attn.masked_bias", "gpt.h.5.attn.bias", "gpt.h.5.attn.masked_bias", "gpt.h.6.attn.bias", "gpt.h.6.attn.masked_bias", "gpt.h.7.attn.bias", "gpt.h.7.attn.masked_bias", "gpt.h.8.attn.bias", "gpt.h.8.attn.masked_bias", "gpt.h.9.attn.bias", "gpt.h.9.attn.masked_bias", "gpt.h.10.attn.bias", "gpt.h.10.attn.masked_bias", "gpt.h.11.attn.bias", "gpt.h.11.attn.masked_bias", "gpt.h.12.attn.bias", "gpt.h.12.attn.masked_bias", "gpt.h.13.attn.bias", "gpt.h.13.attn.masked_bias", "gpt.h.14.attn.bias", "gpt.h.14.attn.masked_bias", "gpt.h.15.attn.bias", "gpt.h.15.attn.masked_bias", "gpt.h.16.attn.bias", "gpt.h.16.attn.masked_bias", "gpt.h.17.attn.bias", "gpt.h.17.attn.masked_bias", "gpt.h.18.attn.bias", "gpt.h.18.attn.masked_bias", "gpt.h.19.attn.bias", "gpt.h.19.attn.masked_bias", "gpt.h.20.attn.bias", "gpt.h.20.attn.masked_bias", "gpt.h.21.attn.bias", "gpt.h.21.attn.masked_bias", "gpt.h.22.attn.bias", "gpt.h.22.attn.masked_bias", "gpt.h.23.attn.bias", "gpt.h.23.attn.masked_bias", "gpt.h.24.attn.bias", "gpt.h.24.attn.masked_bias", "gpt.h.25.attn.bias", "gpt.h.25.attn.masked_bias", "gpt.h.26.attn.bias", "gpt.h.26.attn.masked_bias", "gpt.h.27.attn.bias", "gpt.h.27.attn.masked_bias", "gpt.h.28.attn.bias", "gpt.h.28.attn.masked_bias", "gpt.h.29.attn.bias", "gpt.h.29.attn.masked_bias".

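Edit tortoise/api.py

Find: self.autoregressive.load_state_dict(torch.load(ar_path))

Replace: self.autoregressive.load_state_dict(torch.load(ar_path), strict=False)

With strict=False, load_state_dict no longer raises an error on keys that do not match the model (such as the attn.bias and attn.masked_bias entries above).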
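
The same find-and-replace can be scripted instead of done by hand. A minimal sketch, assuming GNU sed (for `-i`) and that the current directory is the repo root; `patch_strict` is a made-up helper name:

```shell
# patch_strict is a hypothetical helper: adds strict=False to the
# autoregressive load_state_dict call in the given file. Running it twice
# is harmless, because the pattern no longer matches once the line is patched.
patch_strict() {
    sed -i 's/self\.autoregressive\.load_state_dict(torch\.load(ar_path))/self.autoregressive.load_state_dict(torch.load(ar_path), strict=False)/' "$1"
}

# Only touch the file if we are in the repo root.
if [ -f tortoise/api.py ]; then
    patch_strict tortoise/api.py
    # Show the patched line so the change can be checked by eye.
    grep -n 'torch.load(ar_path)' tortoise/api.py || true
fi
```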

8) The next error of this buggy app (no fix yet)

Rendering emma_00 (1 of 1)...
  Hello
Traceback (most recent call last):
  File "/home/nom/Projects/tortoise-tts-fast-GUI/./scripts/tortoise_tts.py", line 352, in <module>
    gen = tts.tts_with_preset(
AttributeError: 'TextToSpeech' object has no attribute 'tts_with_preset'

WIP
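
Until there is a fix, one way to narrow this down is to list which tts methods tortoise/api.py actually defines and compare them with what scripts/tortoise_tts.py calls. This is only a diagnostic sketch, assuming the current directory is the repo root; `list_tts_methods` is a made-up helper name.

```shell
# list_tts_methods is a hypothetical helper: prints every "def tts..." line
# (with line numbers) from the given Python file.
list_tts_methods() {
    grep -n 'def tts' "$1"
}

if [ -f tortoise/api.py ]; then
    list_tts_methods tortoise/api.py || echo "no tts* methods found in tortoise/api.py"
fi
```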

This repo is so broken, I don't think it ever worked.

152334H / tortoise-tts-fast