amaurycrickx / recognito

Java Speaker Recognition Framework
Apache License 2.0
191 stars, 102 forks

Documentation #2

Closed by gsavvid 10 years ago

gsavvid commented 10 years ago

Hello Amaury and thanks for publishing this project.

I'm trying to implement a seamless speaker recognition mobile application for Android devices, but I still haven't decided which library to use for the actual speaker recognition process. Your project seems like a very good choice since it's open source, looks easy to integrate and is written in Java (Android's native language). I would be extremely thankful if you could provide me with any sort of documentation: for instance, how and where the speaker models are created, which features are extracted, etc. Anything that could help would be greatly appreciated.

Thanks much, Giorgos.

amaurycrickx commented 10 years ago

Hello Giorgos,

Thanks for your interest. I should really start a user mailing list instead of having people create issues :-)

Actually, the code should be easy enough to read. I've added extended Javadoc and put links to Wikipedia for each relevant algorithm to provide background info.

For a quick start, the method to look at for vocal print extraction is in the main `Recognito` class:

`private double[] extractFeatures(double[] vocalSample, float sampleRate)`

First, remove silence. Then normalize the audio volume (i.e. scan for the loudest sample and scale the whole signal so that the loudest sample reaches the maximum value). Then extract features using the Linear Predictive Coding (LPC) algorithm over a sliding window on the audio data.
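The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not Recognito's code: the `Preprocessing` class and its method names are hypothetical, silence removal here uses a simple per-frame RMS energy threshold (Recognito may detect silence differently), and samples are assumed to be doubles in the range [-1.0, 1.0].

```java
import java.util.Arrays;

/**
 * Hypothetical preprocessing helpers, not Recognito's actual API.
 * Samples are assumed to be doubles in the range [-1.0, 1.0].
 */
final class Preprocessing {

    private Preprocessing() {}

    /**
     * Energy-based silence removal: drops every frame of frameSize samples
     * whose RMS is below threshold. A trailing partial frame is judged on its own.
     */
    static double[] removeSilence(double[] samples, int frameSize, double threshold) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        double[] kept = new double[samples.length];
        int length = 0;
        for (int start = 0; start < samples.length; start += frameSize) {
            int end = Math.min(start + frameSize, samples.length);
            double sumOfSquares = 0.0;
            for (int i = start; i < end; i++) {
                sumOfSquares += samples[i] * samples[i];
            }
            double rms = Math.sqrt(sumOfSquares / (end - start));
            if (rms >= threshold) {
                // Voiced frame: append it to the output buffer
                System.arraycopy(samples, start, kept, length, end - start);
                length += end - start;
            }
        }
        return Arrays.copyOf(kept, length);
    }

    /** Peak normalization: scales the signal so its loudest sample has absolute value 1.0. */
    static double[] normalizePeak(double[] samples) {
        double peak = 0.0;
        for (double s : samples) {
            peak = Math.max(peak, Math.abs(s));
        }
        double[] out = samples.clone();
        if (peak == 0.0) {
            return out; // all-zero input: nothing to scale
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= peak;
        }
        return out;
    }
}
```

The frame size and threshold are tuning knobs; frames of a few tens of milliseconds are a common choice for this kind of energy-based detection.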

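LPC models each sample as a weighted sum of the preceding samples, i.e. it tries to predict what the next value will be. It is widely used in audio compression, notably in speech codecs (e.g. GSM and CELP-based codecs) and in lossless formats such as FLAC.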
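To make the prediction idea concrete, here is a sketch of the standard autocorrelation method for LPC, solved with the Levinson-Durbin recursion. The `Lpc` class and its methods are hypothetical names for illustration; Recognito's own extractor may differ in details such as the window function or the handling of silent frames.

```java
import java.util.Arrays;

/**
 * Textbook LPC sketch (autocorrelation method + Levinson-Durbin recursion).
 * The predictor is: prediction(x[n]) = c[0]*x[n-1] + c[1]*x[n-2] + ... + c[p-1]*x[n-p].
 */
final class Lpc {

    private Lpc() {}

    /** r[k] = sum over n of x[n] * x[n - k], for lags k = 0..maxLag. */
    static double[] autocorrelation(double[] x, int maxLag) {
        double[] r = new double[maxLag + 1];
        for (int k = 0; k <= maxLag; k++) {
            for (int n = k; n < x.length; n++) {
                r[k] += x[n] * x[n - k];
            }
        }
        return r;
    }

    /** Returns 'order' prediction coefficients for one window of audio. */
    static double[] coefficients(double[] window, int order) {
        double[] r = autocorrelation(window, order);
        double[] a = new double[order + 1]; // a[1..order] are used, a[0] is ignored
        double error = r[0];                // prediction error energy
        for (int i = 1; i <= order && error > 0.0; i++) {
            double acc = r[i];
            for (int j = 1; j < i; j++) {
                acc -= a[j] * r[i - j];
            }
            double k = acc / error;         // reflection coefficient
            double[] next = a.clone();
            next[i] = k;
            for (int j = 1; j < i; j++) {
                next[j] = a[j] - k * a[i - j];
            }
            a = next;
            error *= 1.0 - k * k;
        }
        // An all-zero (silent) window has r[0] == 0, so all coefficients stay 0
        return Arrays.copyOfRange(a, 1, order + 1);
    }

    /** Predicts x[n] from the samples before it; requires n >= coeffs.length. */
    static double predict(double[] x, int n, double[] coeffs) {
        double sum = 0.0;
        for (int j = 0; j < coeffs.length; j++) {
            sum += coeffs[j] * x[n - 1 - j];
        }
        return sum;
    }
}
```

The recursion solves the Toeplitz normal equations in O(p²) operations for order p, which is why it is the usual choice over a general linear solver.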

Here, we simply run LPC on small, half-overlapping windows of audio data and average the returned results (an array of 20 double values). What you get is a kind of hash that acts as a signature of the voice at hand.
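The half-overlapping window and averaging step could look like the sketch below. The `WindowAveraging` class is hypothetical, and the per-window extractor is passed in as a function so that any LPC routine returning a fixed-length vector (e.g. 20 values) can be plugged in. Trailing samples that do not fill a whole window are ignored in this sketch.

```java
import java.util.Arrays;
import java.util.function.Function;

/** Hypothetical helper: averages per-window feature vectors over half-overlapping windows. */
final class WindowAveraging {

    private WindowAveraging() {}

    /** The extractor must return vectors of the same length for every window. */
    static double[] averageFeatures(double[] samples, int windowSize,
                                    Function<double[], double[]> extractor) {
        if (windowSize < 2 || windowSize % 2 != 0) {
            throw new IllegalArgumentException("windowSize must be an even number >= 2");
        }
        int hop = windowSize / 2; // consecutive windows share half their samples
        double[] sum = null;
        int count = 0;
        for (int start = 0; start + windowSize <= samples.length; start += hop) {
            double[] features = extractor.apply(Arrays.copyOfRange(samples, start, start + windowSize));
            if (sum == null) {
                sum = new double[features.length];
            }
            for (int i = 0; i < features.length; i++) {
                sum[i] += features[i];
            }
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("sample is shorter than one window");
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= count; // running sum becomes the mean feature vector
        }
        return sum;
    }
}
```

Averaging trades away timing information for a compact, fixed-size print, which is what makes a plain vector distance usable for matching.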

When the same process is applied to another vocal sample from the same speaker, the signatures tend to resemble each other. So the recognition process actually consists of calculating the distance between the unknown speaker's vocal print and the vocal prints of all known speakers, and returning the closest matches in an ordered list.
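A minimal sketch of the matching step, assuming Euclidean distance between vocal prints (the thread only says "distance", so the metric is an assumption) and hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical nearest-match ranking; Euclidean distance is an assumption. */
final class SpeakerMatcher {

    private SpeakerMatcher() {}

    /** One candidate speaker and its distance to the unknown vocal print. */
    static final class Match {
        private final String speaker;
        private final double distance;

        Match(String speaker, double distance) {
            this.speaker = speaker;
            this.distance = distance;
        }

        String speaker() { return speaker; }
        double distance() { return distance; }
    }

    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vocal prints must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns all known speakers ordered from closest to farthest. */
    static List<Match> rank(double[] unknown, Map<String, double[]> known) {
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, double[]> entry : known.entrySet()) {
            matches.add(new Match(entry.getKey(), euclidean(unknown, entry.getValue())));
        }
        matches.sort(Comparator.comparingDouble(Match::distance));
        return matches;
    }
}
```

Returning the full ordered list, rather than only the best match, lets the caller apply its own acceptance threshold on the top distance.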

You might want to check this thread on stackoverflow for advice on how to use it within Android: http://stackoverflow.com/questions/22443124/voice-matching-in-android/22445304#22445304

For your particular use case, the problems you might encounter should mostly relate to differences in surrounding noise. Say you extracted your vocal print at home in a very quiet environment: when testing recognition in, e.g., the middle of a crowd, the surrounding noise might affect the results so much that the correct answer is not returned. I haven't actually tested this, so I cannot tell you to what extent surrounding noise affects the results.

Actually, I have a good idea of how to improve all this (the MFCC algorithm instead of LPC, plus a noise-reduction preprocessing step), but I'd need a bit more time to implement it :-)

Also watch out for distortion in the signal: if you see runs of consecutive samples at the maximum value (a flat top or bottom), the signal is distorted (clipped), which means the microphone's sensitivity is too high.
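A quick way to spot the flat tops or bottoms described above in raw 16-bit PCM (the format Android's `AudioRecord` delivers with `ENCODING_PCM_16BIT`) is to look for runs of samples pinned at the extremes. The `ClippingDetector` class, the `minRun` length and the `margin` below full scale are hypothetical tuning choices, not values taken from Recognito.

```java
/** Hypothetical clipping check for 16-bit PCM samples. */
final class ClippingDetector {

    private ClippingDetector() {}

    /**
     * Returns true if at least minRun consecutive samples sit within margin of
     * full scale on the same side (a flat top or a flat bottom).
     */
    static boolean hasClipping(short[] pcm, int minRun, int margin) {
        if (minRun < 1 || margin < 0) {
            throw new IllegalArgumentException("minRun must be >= 1 and margin >= 0");
        }
        int topThreshold = Short.MAX_VALUE - margin;
        int bottomThreshold = Short.MIN_VALUE + margin;
        int topRun = 0;
        int bottomRun = 0;
        for (short s : pcm) {
            // Track flat tops and flat bottoms separately
            topRun = (s >= topThreshold) ? topRun + 1 : 0;
            bottomRun = (s <= bottomThreshold) ? bottomRun + 1 : 0;
            if (topRun >= minRun || bottomRun >= minRun) {
                return true;
            }
        }
        return false;
    }
}
```

A single sample at full scale is not necessarily clipping, which is why the sketch requires a run of several consecutive samples before flagging the recording.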

HTH

Amaury

gsavvid commented 10 years ago

Thanks a lot for the reply, and sorry for opening an issue for this question; it was the only way I found to contact you. I think I'll actually start integrating your solution into my project. It would be nice if you could start a user mailing list for easier communication in the future :)

Best, Giorgos.

amaurycrickx commented 10 years ago

No worries, I know I should have made this list available :-)

So thanks to you, here we go: https://groups.google.com/d/forum/recognito