LAION-AI / natural_voice_assistant

MIT License

BUD-E: A conversational and empathic AI Voice Assistant

BUD-E (Buddy for Understanding and Digital Empathy) is an open-source AI voice assistant which aims for the following goals:

  1. replies to user requests in real-time
  2. uses natural voices, empathy & emotional intelligence
  3. works with long-term context of previous conversations
  4. handles multi-speaker conversations with interruptions, affirmations and thinking pauses
  5. runs fully local, on consumer hardware.

This project is a collaboration between LAION, the ELLIS Institute Tübingen, Collabora and the Tübingen AI Center.


This demo shows an interaction with the current version of BUD-E on an NVIDIA RTX 4090. With this setup, the voice assistant answers with a latency of 300 to 500 milliseconds.

Quick Start

  1. Clone this repository and follow the steps in the Installation section below.
  2. Start the voice assistant by running the main.py file in the repository root.
  3. Wait until "Listening.." is printed to the console and start speaking.

Roadmap

Although conversations with the current version of BUD-E already feel quite natural, many components and features are still missing that we need to tackle on the way to a voice assistant that feels truly natural. The immediate open work packages are as follows:

Reducing Latency & minimizing system requirements

Increasing Naturalness of Speech and Responses

Keeping track of conversations over days, months and years

Enhancing functionality and ability of voice assistant

Enhancing multi-modal and emotional context understanding

Building a UI, CI and easy packaging infrastructure

Extending to multi-language and multi-speaker

Installation

The current version of BUD-E contains the following pretrained models:

The model weights are downloaded and cached automatically when running the inference script for the first time.

To install BUD-E on your system follow these steps:

1) Setup Environment and Clone the Repo

We recommend creating a fresh conda environment with Python 3.10.12.

conda create --name bud_e python==3.10.12
conda activate bud_e

Next, clone this repository. Make sure to pass the --recurse-submodules argument so that the required submodules are cloned as well.

git clone --recurse-submodules https://github.com/LAION-AI/natural_voice_assistant.git

2) Install espeak-ng

Ubuntu:

sudo apt-get install festival espeak-ng mbrola 

Windows:

3) Install pytorch

Install torch and torchaudio using the configurator on https://pytorch.org/

4) Install Required Python Packages

Inside the repository run:

pip install -r requirements.txt

On Ubuntu, you may need to install portaudio, which pyaudio requires. If you encounter any errors with pyaudio, try running:

sudo apt install portaudio19-dev
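
Once the packages are installed, a quick check can confirm that the pieces named in the steps above are visible to the interpreter. The following is a minimal sketch, not part of the repository: the helper names `find_missing` and `python_version_ok` are hypothetical, and the module list only covers the packages this README mentions explicitly (torch, torchaudio, pyaudio), not everything in requirements.txt.

```python
import importlib.util
import sys

# Modules named in the installation steps above (an assumption: the full
# dependency list lives in requirements.txt and is not reproduced here).
REQUIRED_MODULES = ("torch", "torchaudio", "pyaudio")


def find_missing(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_version_ok(expected=(3, 10)):
    """Check that the interpreter matches the recommended major.minor version."""
    return sys.version_info[:2] == expected


if __name__ == "__main__":
    if not python_version_ok():
        print(f"Warning: running Python {sys.version.split()[0]}, "
              "but this README recommends 3.10.12")
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules were found.")
```

The check only locates the modules without importing them, so it stays fast and does not initialize CUDA or the audio backend.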

5) Start your AI conversation

Command-Line Arguments

Below are the available command-line arguments for starting the assistant:

| Argument | Description | Default Value |
| --- | --- | --- |
| --audio-device-idx | Select by index the audio device used for recording. If no device index is given, the default audio device is used. | None |
| --audio-details | Show details for the selected audio device, such as the sample rate and number of audio channels. | false |
| --tts-model | Select the text-to-speech model: StyleTTS2 or WhisperSpeech. Note that WhisperSpeech relies on torch.compile, which is not supported on Windows. WhisperSpeech still runs on Windows, but TTS inference will be very slow. | StyleTTS2 |
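
To make the semantics of these arguments concrete, here is a sketch of how such flags could be declared with Python's standard argparse module. It is an illustration under assumptions, not the parser used in main.py: the function names `build_parser` and `slow_tts_warning` are hypothetical, and treating --audio-details as a boolean switch is a guess based on its default of false.

```python
import argparse
import platform

TTS_MODELS = ("StyleTTS2", "WhisperSpeech")


def build_parser():
    """Declare the three documented flags (hypothetical re-creation)."""
    parser = argparse.ArgumentParser(description="Sketch of the assistant's CLI options")
    parser.add_argument("--audio-device-idx", type=int, default=None,
                        help="index of the recording device; default device if omitted")
    # Assumption: a plain on/off switch, matching the documented default of false.
    parser.add_argument("--audio-details", action="store_true",
                        help="show sample rate and channel count of the selected device")
    parser.add_argument("--tts-model", choices=TTS_MODELS, default="StyleTTS2",
                        help="text-to-speech model to use")
    return parser


def slow_tts_warning(tts_model, system=None):
    """Return a warning string if the chosen model is expected to be slow, else None."""
    system = system or platform.system()
    if tts_model == "WhisperSpeech" and system == "Windows":
        return ("WhisperSpeech relies on torch.compile, which is not supported "
                "on Windows; expect very slow TTS inference.")
    return None


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--audio-device-idx", "2", "--tts-model", "WhisperSpeech"])
print(args.audio_device_idx, args.audio_details, args.tts_model)
warning = slow_tts_warning(args.tts_model)
if warning:
    print(warning)
```

Restricting --tts-model with `choices` means a misspelled model name fails at startup rather than after the models have begun loading.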

Troubleshooting

OSError: [Errno -9999] Unanticipated host error

This error can occur if access to your audio device is denied. Check your local settings and allow desktop apps to access the microphone.

OSError "invalid samplerate" or "invalid number of channels"

These are pyaudio-related issues that occur if the selected audio device does not support the current sample rate or number of channels. Sample rate and channels are selected automatically based on the audio device index in use. If you encounter any pyaudio-related problems, use the --audio-device-idx argument and try a different device index. A list of all available audio devices is printed when executing main.py.
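
If you want to inspect candidate devices outside of main.py, the sketch below lists the devices that can record, together with their default sample rate and channel count. It is an assumption-laden illustration rather than the repository's code: the helper names `input_devices` and `query_device_infos` are hypothetical, while `PyAudio.get_device_count`, `PyAudio.get_device_info_by_index`, and the `maxInputChannels` / `defaultSampleRate` keys are part of the PyAudio API.

```python
def input_devices(device_infos):
    """Keep only devices that can record (at least one input channel)."""
    return [
        {
            "index": info["index"],
            "name": info["name"],
            "channels": int(info["maxInputChannels"]),
            "sample_rate": int(info["defaultSampleRate"]),
        }
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_device_infos():
    """Collect raw device info dicts from PyAudio (requires pyaudio to be installed)."""
    import pyaudio  # imported lazily so input_devices() works without it

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


if __name__ == "__main__":
    try:
        infos = query_device_infos()
    except (ImportError, OSError) as exc:
        print(f"Could not query audio devices: {exc}")
    else:
        for dev in input_devices(infos):
            print(f"{dev['index']}: {dev['name']} "
                  f"({dev['channels']} ch, {dev['sample_rate']} Hz)")
```

An index printed here can then be tried with --audio-device-idx. Devices reporting zero input channels are output-only and cannot be used for recording.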

Collaborating to Build the Future of Conversational AI

The development of BUD-E is an ongoing process that requires the collective effort of a diverse community. We invite open-source developers, researchers, and enthusiasts to join us in refining BUD-E's individual modules and contributing to its growth. Together, we can create AI voice assistants that engage with us in natural, intuitive, and empathetic conversations.

If you're interested in contributing to this project, join our Discord community or reach out to us at bud-e@laion.ai.