NaomiProject / Naomi

The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
https://projectnaomi.com/
MIT License

Ability to recognize users by voice #267

Open aaronchantrill opened 4 years ago

aaronchantrill commented 4 years ago

Detailed Description

Naomi should be able to respond differently to different users. If a family member asks "do I have any emails" it should not be necessary for Naomi to ask "who are you?" This would allow the user's voice to act as a sort of authorization. As part of the speech to text training, ultimately I would like to train a different acoustic model for each member of the family. Being able to identify the speaker by voice before selecting the acoustic model would make it possible to use an acoustic model optimized for the speaker, which should lead to better recognition overall.
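A minimal sketch of the "identify the speaker first, then pick the acoustic model" flow might look like the following. It assumes each utterance has already been reduced to a fixed-length voice embedding by some speaker verification package; the model paths, the speaker names, the `threshold` value and all function names here are hypothetical and are not part of Naomi.

```python
import math

# Hypothetical per-speaker acoustic model paths; Naomi's real layout may differ.
ACOUSTIC_MODELS = {
    "alice": "models/alice",
    "bob": "models/bob",
}
DEFAULT_MODEL = "models/generic"


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding, enrolled, threshold=0.8):
    """Return the best-matching enrolled speaker, or None if nobody scores
    at least `threshold` (an assumed value that would need tuning)."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def select_acoustic_model(embedding, enrolled):
    """Pick the speaker-specific model, falling back to the generic one."""
    speaker = identify_speaker(embedding, enrolled)
    return speaker, ACOUSTIC_MODELS.get(speaker, DEFAULT_MODEL)
```

Falling back to a generic model when no enrolled voice matches keeps recognition working for guests, and the returned speaker name (or `None`) is what a plugin could check before answering something like "do I have any emails".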

Context

This could make it possible to start building a database around each user, and it should also help improve speech recognition.

Possible Implementation

Your Environment

aaronchantrill commented 3 years ago

I'm looking at using this project for an initial test: https://github.com/Suhee05/Text-Independent-Speaker-Verification

NaomiSTTTrainer.py has let you enter a speaker name for a while now, so I already have a database with a lot of recordings labeled with my own name and just a few labeled with other people's names. It would be interesting to see how many recordings are needed to differentiate between two individuals, and also how much audio is required to do a check.
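One way to approach the "how many recordings are needed" question is sketched below. It assumes every labeled recording has already been converted to a feature vector by whatever speaker verification package ends up being used; the nearest-centroid comparison and the function names are placeholders for illustration, not the method of the project linked above.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_speaker(vector, centroids):
    """Name of the enrolled speaker whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda name: math.dist(vector, centroids[name]))


def accuracy_for_enrollment_size(recordings, n_enroll):
    """recordings: list of (speaker_name, feature_vector) pairs.

    Enroll each speaker with their first n_enroll recordings, test on the
    rest, and return the fraction identified correctly (None if nothing
    is left to test).
    """
    by_speaker = defaultdict(list)
    for name, vector in recordings:
        by_speaker[name].append(vector)

    centroids, held_out = {}, []
    for name, vectors in by_speaker.items():
        if len(vectors) <= n_enroll:
            continue  # not enough recordings to both enroll and test
        centroids[name] = centroid(vectors[:n_enroll])
        held_out += [(name, v) for v in vectors[n_enroll:]]

    if not held_out:
        return None
    correct = sum(nearest_speaker(v, centroids) == name for name, v in held_out)
    return correct / len(held_out)
```

Calling this for `n_enroll = 1, 2, 3, ...` would show where accuracy levels off. With a database dominated by one speaker, the others are skipped once `n_enroll` reaches their recording count, so the result is only meaningful while at least two speakers remain enrolled.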

aaronchantrill commented 2 years ago

I've been working with Speaker-Verification-Toolkit and have a test project at python_speaker_verification_test. The package is easy to install on x86_64 systems (pip install speaker-verification-toolkit) but a pain on ARM (Raspberry Pi). To install it on ARM, you first need to install version 11 of llvm, which isn't obvious from the error messages. Also, when building llvm from source it is important to set the build type to Release (-DCMAKE_BUILD_TYPE=Release), or else you will run out of memory during the linking step.

$ wget https://github.com/llvm/llvm-project/releases/download/llvmorg-11.1.0/llvm-11.1.0.src.tar.xz
$ tar -xvf llvm-11.1.0.src.tar.xz
$ cd llvm-11.1.0.src/
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make
$ sudo make install
$ pip install speaker-verification-toolkit