AlexandreSajus / JARVIS

Your own personal voice assistant: Voice to Text to LLM to Speech, displayed in a web interface

GNU General Public License v3.0


deepgram elevenlabs llm openai python taipy tts voice-assistant


JARVIS

JARVIS helping me choose a firearm

Your own personal voice assistant: Voice to Text to LLM to Speech, displayed in a web interface.

How it works

:microphone: The user speaks into the microphone
:keyboard: Voice is converted to text using Deepgram
:robot: Text is sent to OpenAI's GPT-3 API to generate a response
:loudspeaker: Response is converted to speech using ElevenLabs
:loud_sound: Speech is played using Pygame
:computer: Conversation is displayed in a webpage using Taipy
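The six steps above form one conversational turn that repeats in a loop. The sketch below shows that control flow under one assumption: each stage is a plain Python callable. The stage names (`listen`, `transcribe`, `respond`, `synthesize`, `play`, `display`) and the `Pipeline` class are hypothetical placeholders for the Deepgram, OpenAI, ElevenLabs, Pygame and Taipy calls. They are not the repository's actual functions, and the fake stages in the usage part exist only so the sketch runs without API keys.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (speaker, text) pairs


@dataclass
class Pipeline:
    """One voice-assistant turn: audio in, spoken reply out (hypothetical stages)."""
    listen: Callable[[], bytes]            # record microphone audio
    transcribe: Callable[[bytes], str]     # speech-to-text (Deepgram in JARVIS)
    respond: Callable[[History], str]      # LLM reply from the history (OpenAI in JARVIS)
    synthesize: Callable[[str], bytes]     # text-to-speech (ElevenLabs in JARVIS)
    play: Callable[[bytes], None]          # audio playback (Pygame in JARVIS)
    display: Callable[[History], None]     # push the conversation to the web UI (Taipy in JARVIS)

    def run_turn(self, history: History) -> str:
        text = self.transcribe(self.listen())
        history.append(("USER", text))
        reply = self.respond(history)
        history.append(("JARVIS", reply))
        self.display(history)                 # show the updated conversation
        self.play(self.synthesize(reply))     # then speak the reply aloud
        return reply


# Usage with fake stages, so no API keys or audio hardware are needed.
history: History = []
shown: List[History] = []
pipeline = Pipeline(
    listen=lambda: b"fake-audio",
    transcribe=lambda audio: "good morning jarvis",
    respond=lambda h: "Good morning! How can I assist you today?",
    synthesize=lambda text: text.encode("utf-8"),
    play=lambda audio: None,
    display=lambda h: shown.append(list(h)),
)
print(pipeline.run_turn(history))
```

Keeping each stage injectable makes it easy to swap one provider (for example a different TTS service) without touching the loop itself.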

Video Demo

Requirements

Python 3.8 - 3.11

Make sure you have API keys for the following services: Deepgram, OpenAI and ElevenLabs.
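The supported interpreter range can be checked before installing anything. This is a minimal sketch that assumes the README's "3.8 - 3.11" is inclusive at both ends; `python_supported` is a hypothetical helper, not something the repository ships.

```python
import sys

# Range stated in the README, assumed inclusive: Python 3.8 to 3.11.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def python_supported(version=None) -> bool:
    """Return True if (major, minor) of `version` lies in the supported range.

    Defaults to the running interpreter's sys.version_info.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


# Usage: python_supported() checks the current interpreter;
# python_supported((3, 12)) returns False.
```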

How to install

Clone the repository

git clone https://github.com/AlexandreSajus/JARVIS.git

Install the requirements

pip install -r requirements.txt

Create a .env file in the root directory and add the following variables:

DEEPGRAM_API_KEY=XXX...XXX
OPENAI_API_KEY=sk-XXX...XXX
ELEVENLABS_API_KEY=XXX...XXX
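The project presumably loads these variables at startup, most likely through the python-dotenv package (an assumption; check requirements.txt). To make the file format concrete, here is a stdlib-only sketch that parses simple KEY=VALUE lines and reports which of the three keys above are missing. `parse_env`, `missing_keys` and `check_env_file` are hypothetical helpers, and the parser deliberately ignores quoting and `export` prefixes, which python-dotenv does handle.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path: str = ".env") -> List[str]:
    """Read a .env file and list the required keys it still lacks."""
    env_path = Path(path)
    if not env_path.exists():
        return list(REQUIRED_KEYS)
    return missing_keys(parse_env(env_path.read_text(encoding="utf-8")))
```

Running `check_env_file()` from the repository root should return an empty list once all three keys are filled in.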

How to use

Run display.py to start the web interface

python display.py

In another terminal, run main.py to start the voice assistant

python main.py
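If you would rather not juggle two terminals, a small launcher could start both scripts. This is a hypothetical convenience script, not part of the repository. It assumes `display.py` and `main.py` sit in the current directory and that each one runs until interrupted. It starts `display.py` first, although it does not wait for the web interface to be ready.

```python
import subprocess
import sys
from typing import List


def build_commands(scripts=("display.py", "main.py")) -> List[List[str]]:
    """Build one command per script, reusing the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_all(commands: List[List[str]]) -> List[int]:
    """Start every command in order, wait for all, and return their exit codes.

    On Ctrl+C, any process that is still running is terminated.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        return [p.wait() for p in procs]
    except KeyboardInterrupt:
        for p in procs:
            if p.poll() is None:  # still running
                p.terminate()
        return [p.wait() for p in procs]


# Usage, from the repository root:
# run_all(build_commands())
```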

Once ready, both the web interface and the terminal will show Listening...
You can now speak into the microphone
Once you stop speaking, it will show Done listening
It will then start processing your request
Once the response is ready, it will show Speaking...
The response will be played and displayed in the web interface.
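The status messages follow a fixed cycle that returns to listening after each reply. Here is a minimal sketch of that cycle as a state machine. The labels match the example terminal output that follows. The `Status` enum, `next_status` and `cycle` are hypothetical, and processing is modeled as the gap between "Done listening" and "Speaking..." because no message is documented for it.

```python
from enum import Enum
from typing import Iterator


class Status(Enum):
    LISTENING = "Listening..."
    DONE_LISTENING = "Done listening"  # transcription and LLM processing happen after this
    SPEAKING = "Speaking..."


# Each status leads to exactly one next status; speaking loops back to listening.
_NEXT = {
    Status.LISTENING: Status.DONE_LISTENING,
    Status.DONE_LISTENING: Status.SPEAKING,
    Status.SPEAKING: Status.LISTENING,
}


def next_status(current: Status) -> Status:
    return _NEXT[current]


def cycle(start: Status = Status.LISTENING, steps: int = 3) -> Iterator[str]:
    """Yield `steps` status labels, starting at `start`."""
    status = start
    for _ in range(steps):
        yield status.value
        status = next_status(status)
```

For example, `list(cycle(steps=4))` walks one full turn and ends back at "Listening...".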

Here is an example:

Listening...
Done listening
Finished transcribing in 1.21 seconds.
Finished generating response in 0.72 seconds.
Finished generating audio in 1.85 seconds.
Speaking...

 --- USER: good morning jarvis
 --- JARVIS: Good morning, Alex! How can I assist you today?

Listening...
...

Saying good morning
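The timing lines make it easy to see where the latency goes. In the example above, the three stages add up to about 3.8 seconds between the end of speech and playback. The sketch below parses terminal output in that format and sums the per-stage times. The regular expressions assume the exact wording of the example ("Finished <stage> in <N> seconds." and " --- SPEAKER: text"), and `parse_log` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

TIMING_RE = re.compile(r"Finished (.+?) in ([\d.]+) seconds\.")
TURN_RE = re.compile(r"^\s*--- (\w+): (.*)$")


def parse_log(log: str) -> Tuple[Dict[str, float], List[Tuple[str, str]]]:
    """Extract per-stage timings and (speaker, text) turns from JARVIS output."""
    timings: Dict[str, float] = {}
    turns: List[Tuple[str, str]] = []
    for line in log.splitlines():
        timing = TIMING_RE.search(line)
        if timing:
            timings[timing.group(1)] = float(timing.group(2))
            continue
        turn = TURN_RE.match(line)
        if turn:
            turns.append((turn.group(1), turn.group(2)))
    return timings, turns


example = """Listening...
Done listening
Finished transcribing in 1.21 seconds.
Finished generating response in 0.72 seconds.
Finished generating audio in 1.85 seconds.
Speaking...

 --- USER: good morning jarvis
 --- JARVIS: Good morning, Alex! How can I assist you today?
"""

timings, turns = parse_log(example)
print(round(sum(timings.values()), 2))  # 3.78
```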