This repository contains an end-to-end AI Voice Assistant pipeline. The system converts voice input to text using OpenAI's Whisper, processes the text with a Large Language Model (LLM) from Hugging Face, and then converts the response back to speech using Edge-TTS. It also features Voice Activity Detection (VAD), output restrictions, and tunable parameters such as pitch, voice type, and speed.
- faster-whisper model
- edge-tts model
Clone the repository:
git clone https://github.com/your-username/AI-Voice-Assistant-Pipeline.git
cd AI-Voice-Assistant-Pipeline
Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the required packages:
pip install -r requirements.txt
Set up Hugging Face API Token:
export HUGGINGFACE_API_TOKEN=your_hugging_face_token # On Windows: set HUGGINGFACE_API_TOKEN=your_hugging_face_token
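To confirm the token is visible to Python before starting the assistant, a small check like the one below can help. It is a minimal sketch, not code from this repository: the function name `get_hf_token` is hypothetical, and only the variable name `HUGGINGFACE_API_TOKEN` comes from the step above.

```python
import os


def get_hf_token(env=None) -> str:
    """Return the Hugging Face token from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HUGGINGFACE_API_TOKEN is not set. Export it in the same shell "
            "session that launches main.py or app.py."
        )
    return token
```

Calling `get_hf_token()` at startup fails early with a readable message instead of a later authentication error from the API.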
You can run the assistant in two modes:
Real-Time Interaction via Terminal:
python main.py
Streamlit Web Interface:
streamlit run app.py
Here's an overview of the main file structure:
- AI-Voice-Assistant-Pipeline/
  - main.py # Entry point for real-time voice assistant
  - app.py # Streamlit web interface for the assistant
  - requirements.txt # Required Python packages
  - utils/
    - stt.py # Speech-to-Text conversion with Whisper
    - tts.py # Text-to-Speech conversion with Edge-TTS
    - llm.py # Large Language Model response generation
  - README.md # Project documentation
Speech-to-Text (STT):
faster-whisper (tiny)
Large Language Model (LLM):
Text-to-Speech (TTS):
edge-tts
Latency Optimization:
Voice Activity Detection (VAD):
Output Restriction:
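The components above could be chained roughly as follows. This is a sketch under stated assumptions rather than the repository's actual code: the two-sentence cap is only one plausible form of output restriction, `generate` is a placeholder for whatever LLM call `utils/llm.py` makes, and the voice name, the `device` and `compute_type` settings, and the output file name are illustrative choices. The `vad_filter` option of faster-whisper and the `rate` and `pitch` options of edge-tts are real library parameters.

```python
import asyncio
import re


def transcribe(audio_path: str) -> str:
    """Transcribe audio with faster-whisper; vad_filter=True skips non-speech."""
    from faster_whisper import WhisperModel  # lazy import, needs faster-whisper

    # "tiny" matches the model named above; device/compute_type are assumptions.
    model = WhisperModel("tiny", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, vad_filter=True)
    return " ".join(segment.text.strip() for segment in segments)


def restrict_output(text: str, max_sentences: int = 2) -> str:
    """Keep the first `max_sentences` sentences (assumed restriction policy)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def tts_options(rate_percent: int = 0, pitch_hz: int = 0) -> dict:
    """Format speed and pitch as the signed strings edge-tts expects."""
    return {"rate": f"{rate_percent:+d}%", "pitch": f"{pitch_hz:+d}Hz"}


async def speak(text: str, out_path: str = "reply.mp3",
                voice: str = "en-US-AriaNeural", **options) -> None:
    """Synthesize `text` to an MP3 file with edge-tts."""
    import edge_tts  # lazy import, needs edge-tts

    communicate = edge_tts.Communicate(text, voice, **options)
    await communicate.save(out_path)


def run_once(audio_path: str, generate) -> str:
    """One turn: speech -> text -> LLM reply -> restricted reply -> speech."""
    question = transcribe(audio_path)
    reply = restrict_output(generate(question))
    asyncio.run(speak(reply, **tts_options(rate_percent=10, pitch_hz=-5)))
    return reply
```

Keeping the third-party imports inside the functions means the text helpers can be used and tested without the speech models installed.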
Start by testing the real-time voice assistant with:
python main.py
This will allow you to interact with the assistant directly from your terminal.
To explore the web interface, run:
streamlit run app.py
Initial Model Download: The faster-whisper model may take some time to download initially (~2GB). Make sure your internet connection is stable.
API Token Issues: Ensure your Hugging Face API token is correctly set up as an environment variable.
Latency Issues: If you experience delays, review your system resources and consider optimizing the code or using more powerful hardware.
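When chasing latency, it helps to know which stage is slow before changing code or hardware. The helper below is a minimal sketch using only the standard library; the names `timed` and `slowest_stage` and the stage labels are hypothetical and do not come from this repository.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage: str, store: dict):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[stage] = time.perf_counter() - start


def slowest_stage(store: dict) -> str:
    """Return the name of the stage with the largest recorded duration."""
    return max(store, key=store.get)
```

Wrapping each step, for example `with timed("stt", timings): ...` around the transcription call, fills `timings` with one duration per stage, and `slowest_stage(timings)` then points at the step worth optimizing first.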
Feel free to submit pull requests or open issues if you encounter any bugs or have suggestions for improvements.
This project is licensed under the MIT License - see the LICENSE file for details.
Thank you for checking out the AI Voice Assistant Pipeline! If you find this project useful, please give it a star ⭐ on GitHub!