KoljaB / RealtimeTTS

Converts text to speech in realtime

RealtimeTTS

Easy to use, low-latency text-to-speech library for realtime applications

About the Project

RealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications. It stands out for converting text streams into high-quality audio output quickly, with minimal latency.

Hint: Check out Linguflex, the original project from which RealtimeTTS is spun off. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.

Note: If you run into the error 'General synthesis error: isin() received an invalid combination of arguments', it is caused by an incompatibility between newer versions of the transformers library and Coqui TTS (see here). This should not occur with the latest version, but if it does, please downgrade to an older transformers version: pip install transformers==4.38.2.

https://github.com/KoljaB/RealtimeTTS/assets/7604638/87dcd9a5-3a4e-4f57-be45-837fc63237e7

Key Features

Hint: check out RealtimeSTT, the input counterpart of this library, for speech-to-text capabilities. Together, they form a powerful realtime audio wrapper around large language models.

FAQ

Check the FAQ page for answers to many questions about using RealtimeTTS.

Updates

Latest Version: v0.4.1

See release history.

Tech Stack

This library uses:

By using "industry standard" components, RealtimeTTS offers a reliable, high-end technological foundation for developing advanced voice solutions.

Installation

Simple installation:

pip install RealtimeTTS

This will install all the necessary dependencies, including a CPU-only version of PyTorch (needed for the Coqui engine).

Installation into a virtual environment with GPU support (the commands below are for Windows):

python -m venv env_realtimetts
env_realtimetts\Scripts\activate.bat
python.exe -m pip install --upgrade pip
pip install RealtimeTTS
pip install torch==2.3.0+cu118 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118

More information about CUDA installation.

Engine Requirements

Different engines supported by RealtimeTTS have unique requirements. Ensure you fulfill these requirements based on the engine you choose.

SystemEngine

The SystemEngine works out of the box using your system's built-in TTS capabilities. No additional setup is needed.

GTTSEngine

The GTTSEngine works out of the box using Google Translate's text-to-speech API. No additional setup is needed.

OpenAIEngine

To use the OpenAIEngine:

AzureEngine

To use the AzureEngine, you will need:

Make sure you have these credentials available and correctly configured when initializing the AzureEngine.

ElevenlabsEngine

For the ElevenlabsEngine, you need:

CoquiEngine

Delivers high quality, local, neural TTS with voice-cloning.

Downloads a neural TTS model first. In most cases it will be fast enough for realtime use with GPU synthesis. Needs around 4-5 GB of VRAM.

On most systems GPU support will be needed to run fast enough for realtime, otherwise you will experience stuttering.

Quick Start

Here's a basic usage example:

from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine

engine = SystemEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()

Feed Text

You can feed individual strings:

stream.feed("Hello, this is a sentence.")

Or you can feed generators and character iterators for real-time streaming:

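import openai  # legacy interface (openai<1.0)

def write(prompt: str):
    # Stream a chat completion and yield each text fragment as it arrives
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content" : prompt}],
        stream=True
    ):
        # Chunks without a "content" delta (e.g. role-only chunks) are skipped
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            yield text_chunk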

text_stream = write("A three-sentence relaxing speech.")

stream.feed(text_stream)
char_iterator = iter("Streaming this character by character.")
stream.feed(char_iterator)
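
To try generator input without an API key, a minimal sketch follows. `simulated_llm_stream` and `speak` are hypothetical helpers introduced here for illustration (they are not part of RealtimeTTS); the first yields a fixed text in small chunks, roughly the way a language model delivers tokens, and the second feeds any such generator to a stream using only the calls shown above.

```python
import time
from typing import Iterator


def simulated_llm_stream(text: str, chunk_size: int = 8, delay: float = 0.0) -> Iterator[str]:
    """Yield `text` in pieces of `chunk_size` characters, like a token stream.

    Hypothetical helper for illustration; not part of RealtimeTTS.
    """
    for start in range(0, len(text), chunk_size):
        if delay:
            time.sleep(delay)  # optional pause to imitate network latency
        yield text[start:start + chunk_size]


def speak(chunks: Iterator[str]) -> None:
    """Feed a chunk generator to a stream and play it (needs RealtimeTTS installed)."""
    from RealtimeTTS import TextToAudioStream, SystemEngine  # imported lazily

    stream = TextToAudioStream(SystemEngine())
    stream.feed(chunks)
    stream.play()
```

Calling `speak(simulated_llm_stream("Hello there. This text arrives in small chunks."))` should then play the text as the chunks arrive.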

Playback

Asynchronously:

import time

stream.play_async()
while stream.is_playing():
    time.sleep(0.1)

Synchronously:

stream.play()
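
The asynchronous polling loop can be wrapped in a small helper that gives up after a timeout. This is a sketch, not part of the library: `wait_until_done` is a hypothetical name, and it assumes only the `is_playing()` and `stop()` methods of the stream that this README mentions.

```python
import time


def wait_until_done(stream, poll_interval: float = 0.1, timeout: float = 30.0) -> bool:
    """Poll stream.is_playing() until playback ends or `timeout` seconds pass.

    Returns True if playback finished on its own, False if the timeout was
    reached (in which case the stream is stopped).
    """
    deadline = time.monotonic() + timeout
    while stream.is_playing():
        if time.monotonic() >= deadline:
            stream.stop()  # give up and stop playback
            return False
        time.sleep(poll_interval)
    return True
```

Typical use would be `stream.play_async()` followed by `wait_until_done(stream, timeout=60)`, which keeps the main thread responsive while still bounding how long it waits.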

Testing the Library

The test subdirectory contains a set of scripts to help you evaluate and understand the capabilities of the RealtimeTTS library.

Note that most of the tests still rely on the "old" OpenAI API (<1.0.0). Usage of the new OpenAI API is demonstrated in openai_1.0_test.py.
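
For orientation, here is a sketch of how a text generator might look with the new OpenAI API (openai>=1.0). It is not copied from openai_1.0_test.py; `text_from_chunks` and `write` are illustrative names, and the code assumes the `OPENAI_API_KEY` environment variable is set when `write` is called.

```python
from typing import Iterable, Iterator


def text_from_chunks(chunks: Iterable) -> Iterator[str]:
    """Yield the text deltas of a streamed chat completion (openai>=1.0 objects)."""
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices at all
        text = chunk.choices[0].delta.content
        if text:  # skip None and empty deltas
            yield text


def write(prompt: str, model: str = "gpt-3.5-turbo") -> Iterator[str]:
    """Stream a chat completion for `prompt` and yield its text fragments."""
    from openai import OpenAI  # imported lazily; needs openai>=1.0

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    yield from text_from_chunks(response)
```

The resulting generator can be passed to `stream.feed(write("A three-sentence relaxing speech."))` in the same way as the legacy version.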

Pause, Resume & Stop

Pause the audio stream:

stream.pause()

Resume a paused stream:

stream.resume()

Stop the stream immediately:

stream.stop()

Requirements Explained

Configuration

Initialization Parameters for TextToAudioStream

When you initialize the TextToAudioStream class, you have various options to customize its behavior. Here are the available parameters:

engine (BaseEngine)

on_text_stream_start (callable)

on_text_stream_stop (callable)

on_audio_stream_start (callable)

on_audio_stream_stop (callable)

on_character (callable)

output_device_index (int)

tokenizer (string)

language (string)

muted (bool)

level (int)

Example Usage:

engine = YourEngine()  # Substitute with your engine
stream = TextToAudioStream(
    engine=engine,
    on_text_stream_start=my_text_start_func,
    on_text_stream_stop=my_text_stop_func,
    on_audio_stream_start=my_audio_start_func,
    on_audio_stream_stop=my_audio_stop_func,
    level=logging.INFO
)
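
The callback names in the example above are placeholders. One way to fill them in is sketched below: each callback records its name and logs it. The `make_callback` and `build_stream` helpers are hypothetical, and the callbacks accept arbitrary arguments because the exact arguments RealtimeTTS passes to them are not stated here (that tolerance is an assumption, not documented behavior).

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("realtimetts_demo")

events = []  # names of the callbacks that fired, in order


def make_callback(name):
    """Return a callback that records and logs `name` each time it fires."""
    def callback(*_args, **_kwargs):  # tolerant of any arguments (assumption)
        events.append(name)
        logger.info("callback fired: %s", name)
    return callback


my_text_start_func = make_callback("text_stream_start")
my_text_stop_func = make_callback("text_stream_stop")
my_audio_start_func = make_callback("audio_stream_start")
my_audio_stop_func = make_callback("audio_stream_stop")


def build_stream(engine):
    """Create a TextToAudioStream wired to the callbacks above."""
    from RealtimeTTS import TextToAudioStream  # imported lazily

    return TextToAudioStream(
        engine=engine,
        on_text_stream_start=my_text_start_func,
        on_text_stream_stop=my_text_stop_func,
        on_audio_stream_start=my_audio_start_func,
        on_audio_stream_stop=my_audio_stop_func,
        level=logging.INFO,
    )
```

After playback, inspecting `events` shows the order in which the text and audio stages started and stopped, which can help when measuring latency.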

Methods

play and play_async

These methods are responsible for executing the text-to-audio synthesis and playing the audio stream. The difference is that play is a blocking function, while play_async runs in a separate thread, allowing other operations to proceed.

fast_sentence_fragment (bool)
buffer_threshold_seconds (float)
minimum_sentence_length (int)
log_characters (bool)
log_synthesized_text (bool)

By understanding and setting these parameters and methods appropriately, you can tailor the TextToAudioStream to meet the specific needs of your application.
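
As a sketch of how the play parameters listed above might be passed, the snippet below keeps them in one dictionary and forwards them to `play` or `play_async`. The parameter names come from the list above; the values are illustrative assumptions, `play_with_options` is a hypothetical helper, and whether the installed version accepts every one of these keyword arguments should be checked against its own documentation.

```python
# Illustrative values only; suitable settings depend on engine and hardware.
PLAY_OPTIONS = {
    "fast_sentence_fragment": True,
    "buffer_threshold_seconds": 0.5,
    "minimum_sentence_length": 10,
    "log_characters": False,
    "log_synthesized_text": True,
}


def play_with_options(stream, blocking: bool = False, **overrides):
    """Start playback with PLAY_OPTIONS, optionally overriding single values."""
    unknown = set(overrides) - set(PLAY_OPTIONS)
    if unknown:
        raise TypeError(f"unknown play option(s): {sorted(unknown)}")
    options = {**PLAY_OPTIONS, **overrides}
    if blocking:
        stream.play(**options)  # returns when playback has finished
    else:
        stream.play_async(**options)  # returns immediately
    return options
```

Keeping the options in one place makes it easier to compare settings across engines, for example `play_with_options(stream, minimum_sentence_length=20)`.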

CUDA installation

These steps are recommended for those who require better performance and have a compatible NVIDIA GPU.

Note: to check if your NVIDIA GPU supports CUDA, visit the official CUDA GPUs list.

To use PyTorch with CUDA support, please follow these steps:

Note: newer PyTorch installations may (unverified) no longer need the Toolkit (and possibly cuDNN) installation.

  1. Install NVIDIA CUDA Toolkit: For example, to install Toolkit 11.8 please

  2. Install NVIDIA cuDNN: For example, to install cuDNN 8.7.0 for CUDA 11.x please

    • Visit NVIDIA cuDNN Archive.
    • Click on "Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
    • Download and install the software.
  3. Install ffmpeg:

    You can download an installer for your OS from the ffmpeg Website.

    Or use a package manager:

    • On Ubuntu or Debian:

      sudo apt update && sudo apt install ffmpeg
    • On Arch Linux:

      sudo pacman -S ffmpeg
    • On MacOS using Homebrew (https://brew.sh/):

      brew install ffmpeg
    • On Windows using Chocolatey (https://chocolatey.org/):

      choco install ffmpeg
    • On Windows using Scoop (https://scoop.sh/):

      scoop install ffmpeg
  4. Install PyTorch with CUDA support:

    pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
  5. Fix to resolve compatibility issues: If you run into library compatibility issues, try pinning these libraries to fixed versions:

    pip install networkx==2.8.8
    pip install typing_extensions==4.8.0
    pip install fsspec==2023.6.0
    pip install imageio==2.31.6
    pip install numpy==1.24.3
    pip install requests==2.31.0
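
Once the steps above are done, a quick check from Python can confirm whether PyTorch actually sees the GPU. This is a minimal sketch; `cuda_status` is a hypothetical helper name, and it only uses standard PyTorch calls.

```python
def cuda_status() -> str:
    """Return a one-line summary of PyTorch's CUDA support on this machine."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: CUDA not available, running on CPU"
    device_name = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__}: CUDA {torch.version.cuda} on {device_name}"


print(cuda_status())
```

If the output reports that CUDA is not available even though a compatible GPU is present, the CPU-only PyTorch build is probably still installed and step 4 needs to be repeated.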

💖 Acknowledgements

Huge shoutout to the team behind Coqui AI, especially the brilliant Eren Gölge, for being the first to give us local, high-quality synthesis at realtime speed, and even a clonable voice!

Thank you Pierre Nicolas Durette for giving us a free TTS that works without a GPU, using Google Translate through his gtts Python library.

Contribution

Contributions are always welcome (e.g. PR to add a new engine).

License Information

❗ Important Note:

While this library itself is open-source, many of the engines it depends on are not freely usable: external engine providers often restrict commercial use in their free plans. This means the engines can be used for noncommercial projects, but commercial usage requires a paid plan.

Engine Licenses Summary:

CoquiEngine

ElevenlabsEngine

AzureEngine

SystemEngine

GTTSEngine

OpenAIEngine

Disclaimer: This is a summarization of the licenses as understood at the time of writing. It is not legal advice. Please read and respect the licenses of the different engine providers yourself if you plan to use them in a project.

Author

Kolja Beigel
Email: kolja.beigel@web.de
GitHub