AlexandreSajus / JARVIS

Your own personal voice assistant: Voice to Text to LLM to Speech, displayed in a web interface
GNU General Public License v3.0

JARVIS

JARVIS helping me choose a firearm

Your own personal voice assistant: Voice to Text to LLM to Speech, displayed in a web interface.

How it works

  1. :microphone: The user speaks into the microphone
  2. :keyboard: Voice is converted to text using Deepgram
  3. :robot: Text is sent to OpenAI's GPT-3 API to generate a response
  4. :loudspeaker: Response is converted to speech using ElevenLabs
  5. :loud_sound: Speech is played using Pygame
  6. :computer: Conversation is displayed in a webpage using Taipy
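The steps above form a simple per-turn pipeline. The following is a minimal sketch of that flow, not the project's actual code: `run_turn`, `timed`, and the `fake_*` functions are hypothetical names, and the stand-ins replace the real Deepgram, OpenAI, ElevenLabs, and Pygame calls so the example runs without API keys or audio hardware.

```python
import time
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


def timed(label: str, func: Callable, *args):
    """Call func(*args) and print how long the call took."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"Finished {label} in {elapsed:.2f} seconds.")
    return result


def run_turn(audio: bytes, transcribe: Callable, generate: Callable,
             synthesize: Callable, play: Callable, history: History) -> str:
    """Run one turn: speech -> text -> LLM reply -> speech -> playback."""
    text = timed("transcribing", transcribe, audio)               # step 2
    reply = timed("generating response", generate, text, history)  # step 3
    speech = timed("generating audio", synthesize, reply)          # step 4
    play(speech)                                                   # step 5
    # Step 6: record both sides of the exchange so a UI could display them.
    history.append(("USER", text))
    history.append(("JARVIS", reply))
    return reply


# Hypothetical stand-ins for the real services (assumptions, not real APIs).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(text: str, history: History) -> str:
    return f"You said: {text}"


def fake_synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")


def fake_play(speech: bytes) -> None:
    print("Speaking...")


history: History = []
reply = run_turn(b"good morning jarvis", fake_transcribe, fake_generate,
                 fake_synthesize, fake_play, history)
```

Passing each stage in as a callable keeps the turn logic independent of any one provider, so a stage can be swapped or stubbed out for testing.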

Video Demo

Youtube Devlog

Requirements

Python 3.8 - 3.11

Make sure you have API keys for Deepgram, OpenAI, and ElevenLabs.

How to install

  1. Clone the repository
git clone https://github.com/AlexandreSajus/JARVIS.git
  2. Install the requirements
pip install -r requirements.txt
  3. Create a .env file in the root directory and add the following variables:
DEEPGRAM_API_KEY=XXX...XXX
OPENAI_API_KEY=sk-XXX...XXX
ELEVENLABS_API_KEY=XXX...XXX
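Before starting the assistant, it can help to confirm that all three variables are present. The sketch below is a standard-library stand-in, assuming a simple `KEY=value` file layout; `parse_env` and `missing_keys` are hypothetical helpers, and the project itself may load the file differently (for example with a library such as python-dotenv).

```python
from typing import Dict, List, Mapping

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when every key is set, and the names of any keys that still need a value otherwise.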

How to use

  1. Run display.py to start the web interface
python display.py
  2. In another terminal, run main.py to start the voice assistant
python main.py

Here is an example:

Listening...
Done listening
Finished transcribing in 1.21 seconds.
Finished generating response in 0.72 seconds.
Finished generating audio in 1.85 seconds.
Speaking...

 --- USER: good morning jarvis
 --- JARVIS: Good morning, Alex! How can I assist you today?

Listening...
...

Saying good morning