Build your own voice AI. This repo accompanies my YouTube video series on building an AI voice assistant with PyTorch.
Looking for contributors to help build out the assistant. There is still a lot of work to do. This is a good opportunity to learn machine learning and how to engineer an entire ML system from the ground up. If you're interested, join the Discord server.
TODO:
If you're on macOS, you can install PortAudio using Homebrew: `brew install portaudio`
NOTICE: If you are using Windows, some things may not work (for example, torchaudio). I suggest trying this on Linux or macOS, or using WSL2 on Windows.
```
virtualenv voiceassistant.venv
source voiceassistant.venv/bin/activate
pip install -r requirements.txt
```
If you are running with just the CPU:

```
docker build -f cpu.Dockerfile -t voiceassistant .
```

If you are running on a CUDA-enabled machine:

```
docker build -f Dockerfile -t voiceassistant .
```
For more details, check these files for script arguments and descriptions:
- `wakeword/neuralnet/train.py` is used to train the model
- `wakeword/neuralnet/optimize_graph.py` is used to create a production-ready graph that can be used in `engine.py`
- `wakeword/engine.py` is used to demo the wakeword model
- `wakeword/scripts/collect_wakeword_audio.py` is used to collect wakeword and environment data
- `wakeword/scripts/split_audio_into_chunks.py` is used to split audio into n-second chunks
- `wakeword/scripts/split_commonvoice.py` - if you download the Common Voice dataset, use this script to split it into n-second chunks
- `wakeword/scripts/create_wakeword_jsons.py` is used to create the wakeword JSONs for training
Collect data

Environment and wakeword data can be collected with `collect_wakeword_audio.py`:

```
cd VoiceAssistant/wakeword/scripts
mkdir data
cd data
mkdir 0 1 wakewords
cd ../
python collect_wakeword_audio.py --sample_rate 8000 --seconds 2 --interactive --interactive_save_path ./data/wakewords
python replicate_audios.py --wakewords_dir data/wakewords/ --copy_destination data/1/ --copy_number 100
```
Use `split_audio_into_chunks.py` to split the audio into n-second chunks, and put the chunks into the `0` and `1` directories: `0` for non-wakeword, `1` for wakeword.

Use `create_wakeword_jsons.py` to create the train and test JSONs, in this format (make sure each sample is on a separate line):

```
{"key": "/path/to/audio/sample.wav", "label": 0}
{"key": "/path/to/audio/sample.wav", "label": 1}
```
Train model

- Use `train.py` to train the model.
- Use `optimize_graph.py` to create an optimized PyTorch model.
- Test with the `engine.py` script.

YouTube Video for Speech Recognition
For more details, check these files for script arguments and descriptions:
- `speechrecognition/scripts/mimic_create_jsons.py` is used to create the train.json and test.json files with Mimic Recording Studio
- `speechrecognition/scripts/commonvoice_create_jsons.py` is used to convert mp3 to wav and create the train.json and test.json files with the Common Voice dataset
- `speechrecognition/neuralnet/train.py` is used to train the model
- `speechrecognition/neuralnet/optimize_graph.py` is used to create a production-ready graph that can be used in `engine.py`
- `speechrecognition/engine.py` is used to demo the speech recognizer model
- `speechrecognition/demo/demo.py` is used to demo the speech recognizer model with a web GUI
The pretrained model can be found at this Google Drive link.
Collect your own data. The pretrained model was trained on Common Voice; to make it work for you, collect about an hour or so of your own voice using the Mimic Recording Studio. It provides prompts for you to read.
Create a train and test JSON in this format (make sure each sample is on a separate line):

```
{"key": "/path/to/audio/speech.wav", "text": "this is your text"}
{"key": "/path/to/audio/speech.wav", "text": "another text example"}
```
Use `mimic_create_jsons.py` to create train and test JSONs with the data from Mimic Recording Studio:

```
python mimic_create_jsons.py --file_folder_directory /dir/to/the/folder/with/the/studio/data --save_json_path /path/where/you/want/them/saved
```

(The Mimic Recording Studio files are usually stored in ~/mimic-recording-studio-master/backend/audio_files/[random_string].)
Use `commonvoice_create_jsons.py` to convert from mp3 to wav and to create train and test JSONs with the data from Common Voice by Mozilla:

```
python commonvoice_create_jsons.py --file_path /path/to/commonvoice/file/.tsv --save_json_path /path/where/you/want/them/saved
```

If you don't want to convert, use `--not-convert`.
Train model

- Use `train.py` to fine-tune; check the train.py argparse for other arguments:

  ```
  python train.py --train_file /path/to/train/json --valid_file /path/to/valid/json --load_model_from /path/to/pretrain/speechrecognition.ckpt
  ```

  The `--load_model_from` argument in train.py loads the pretrained checkpoint to fine-tune from.
- Use `optimize_graph.py` to create a frozen, optimized PyTorch model. The pretrained optimized torch model can be found in the Google Drive link as speechrecognition.zip.
- Test with the `engine.py` script.

Documentation to get this running on a Raspberry Pi is in progress...