Open ovninosa opened 8 years ago
Hello,
Try running the SpeakerVerificationExperiments project. In this app you can load WAV files for training and testing, and it produces a decision. I tried to optimize the decisions for the LPC+Pitch+VQ combination. I use mono WAV files with a sampling rate of 11025 Hz. For testing I use recordings of short phrases, approximately 1-2 seconds long.
This app was part of my master's degree research and I didn't have enough time to write clean code or remove what I should have. So it can be buggy, but mathematically it's absolutely correct. Currently I'm trying to make it cleaner and more useful, but that takes time.
For what purpose do you want to run this application?
Thanks for your interest, BenderRodrigez
Hello Bender,
Thanks for your time.
I want to use this application to experiment with speaker authentication: I want to compare two audio recordings of the same person and get a result of true, false, or a percentage (something like % likelihood; is there a chance to get this?). I recently tested your SpeakerVerificationExperiment and it runs OK; I was able to compare 2 WAV files and get a result.
Is there a chance to do the training without creating the txt files, using something like a DB or similar? I want to implement this logic in a web service, and concurrency is a top concern for me. Does this code implement a text-independent or a text-dependent algorithm?
Thanks, Javenosa
Hello Javenosa,
To implement it as part of a web service, you should look at the OpenTrainFile and OpenTestFile functions in MainWindow.xaml.cs and remove everything that deals with saving and plotting.
For my purposes a DB was unnecessary, so I just use a global variable to simplify the application logic. In the DB you should store the CodeBook property of the VqCodeBook for each user. Later, to perform the verification, you should load it back and use the DistortionMeasureEnergy function to get the measure I use to make a decision. But currently the VectorQuantization class has no such constructor (you can add an empty one, or implement DistortionMeasureEnergy outside this class and pass the code book as a parameter).
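For illustration, here is a minimal sketch of what such a codebook distortion measure typically computes (Python for brevity; `distortion_measure` is an illustrative name and not the actual DistortionMeasureEnergy implementation): the average distance from each test feature vector to its nearest codebook vector, where lower means a better match to the enrolled speaker.

```python
import math

def distortion_measure(codebook, features):
    """Average Euclidean distance from each test feature vector
    to its nearest codebook vector (lower = better speaker match)."""
    total = 0.0
    for vec in features:
        total += min(
            math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, code)))
            for code in codebook
        )
    return total / len(features)

# Toy example: a codebook of two 2-D centroids and three test vectors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
features = [(0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
print(distortion_measure(codebook, features))  # ~0.302
```

If the codebook rows are stored per user in the DB, verification reduces to loading the user's codebook and comparing this measure against a threshold.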
If a true/false result is enough for you, just use the GetSolution function. Currently it returns 3 types of result: user verified / user not verified / the system can't make a decision with the current solver settings.
Otherwise, to estimate likelihood, you should look at FuzzySolver.cs inside HelpersLibrary. Expose ownVal (same voice in both records) and foreignVal (different voices in the records) as public properties. These values lie in the range 0.0..1.0, where 0 means "absolutely no" and 1 means "absolutely yes". The final solution is based on limiting the borders of these values.
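As a rough sketch of the idea (Python for brevity; the Gaussian shape, centers, and width below are made-up illustration values, not the parameters in FuzzySolver.cs): two membership functions map the distortion measure into the 0..1 range, one for "same voice" and one for "different voice".

```python
import math

def membership(x, center, width):
    """Bell-shaped (Gaussian) membership function with values in 0..1."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def estimate_likelihood(distortion, own_center=0.2, foreign_center=0.8, width=0.15):
    """Map a distortion measure to two fuzzy memberships:
    own_val     ~ "same voice in both records"
    foreign_val ~ "different voices in the records"."""
    own_val = membership(distortion, own_center, width)
    foreign_val = membership(distortion, foreign_center, width)
    return own_val, foreign_val

# Low distortion -> strong "own" membership, weak "foreign" membership.
own_val, foreign_val = estimate_likelihood(0.25)  # own_val ~ 0.95, foreign_val ~ 0.001
```

The final verified / not verified / undecided outcome then follows from comparing these two values against configured borders.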
All the txt files are for design time only, and the file writing can be safely removed from the source code (look at these files: TonalSpeechSelector.cs and MainWindow.xaml.cs). Also, many of them will not be created if you build the project in the Release configuration.
My speaker verification application implements a text-independent algorithm.
I hope I've explained it clearly.
Thanks, Bender
Bender,
Thanks for your time and effort, very clear explanation.
Does it make a difference if I use different sample rates for my voice samples? You said 11025 Hz for your purposes; is there a mandatory sample rate for your algorithm?
I recently tested with two different speakers, one male and one female, and the result was true. Could this result be related to the sample rate of the files, or to something else (like background noise, length of the signal, etc.)?
Thanks again, Javenosa
Javenosa,
This algorithm should work with any sample rate, but I have not tested that, so I guess it's a bug. It's also possible that some recording conditions are not good enough for my verification system. I have tested the system with background noise up to -5 dB SNR (which is a lot) and, as I said earlier, the minimal signal length was about 1 second. It may also fail because the phrase does not contain enough vowels or the signal is too quiet.
Can you attach these files? I'll try to investigate why it fails.
Thanks, Bender
Bender,
I cannot get the right samples of one male and one female (I have a lot of voice samples).
Anyway, I have these 2 samples of my own voice to compare, and the result is false. The recording hardware was the same, and the environment noise is almost the same.
https://www.dropbox.com/s/qhvz55io8wd4srt/voices.rar?dl=0
Thanks, Javenosa
Javenosa,
Comparing those two files: they look too long, so the speaker's codebook size becomes non-optimal. But this is not the source of the problem. All voice features are calculated robustly on both files. And this is bad news, because the next point where the app may fail is FuzzySolver.cs. This class initializes two membership functions with parameters related to the distribution of the measure.
These parameters were calculated through experiments: computing the measure many times and searching for the optimal split into two classes. So my guess is that the main problem with the results is that my membership functions are language dependent.
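The "optimal split into two classes" step can be illustrated with a tiny brute-force sketch (Python; the score lists below are invented toy data, and this is not the procedure actually used in the project): pick the threshold that minimizes misclassifications between same-speaker and different-speaker distortion scores.

```python
def best_split(own_scores, foreign_scores):
    """Brute-force the threshold that minimizes misclassifications,
    assuming same-speaker distortions are lower than foreign ones."""
    best_t, best_err = None, float("inf")
    for t in sorted(own_scores + foreign_scores):
        # Errors: own scores above t, plus foreign scores at or below t.
        err = sum(s > t for s in own_scores) + sum(s <= t for s in foreign_scores)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

own = [0.1, 0.2, 0.25, 0.3]       # toy same-speaker distortions
foreign = [0.6, 0.7, 0.8, 0.9]    # toy different-speaker distortions
t, err = best_split(own, foreign)  # t = 0.3, err = 0
```

If the two distributions shift with the language of the training phrases, a threshold fitted on one language will misclassify on another, which is exactly the suspected problem.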
I'll continue investigating the problem to confirm or disprove this theory.
If it's not critical for you, you can use my own samples. Here are three speakers and five test phrases: https://yadi.sk/d/yKQ47etJtgiBQ
P.S. I will be offline for two weeks starting this weekend, but I'll try to answer you as soon as possible.
Thanks, Bender
And one small note: did you try using the LPC feature only? If not, you should. If the system starts working, it means only two possible explanations are left.
First, the system has an issue with normalisation of the selected features.
Second, the decision maker is language dependent and should be refactored for use with a particular language.
I think the first option is more likely to be true.
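For reference, here is what an LPC feature computes, as a generic textbook sketch (Python, autocorrelation method with Levinson-Durbin recursion; this is not the project's actual implementation in HelpersLibrary):

```python
def lpc(signal, order):
    """Linear prediction coefficients a[0..order] (a[0] = 1) via the
    autocorrelation method and Levinson-Durbin recursion. The model is
    x[n] ~ -a[1]*x[n-1] - a[2]*x[n-2] - ... - a[order]*x[n-order]."""
    n = len(signal)
    # Autocorrelation sequence up to the requested order.
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k      # remaining prediction error
    return a

# A decaying exponential x[n] = 0.9**n is perfectly predicted by
# a single coefficient: a ~ [1.0, -0.9].
coeffs = lpc([0.9 ** n for n in range(200)], 1)
```

Testing with LPC only isolates the spectral-envelope feature from pitch, which is what makes it useful for narrowing down where the system fails.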
Bender,
I recently tested the same audios with LPC only and the authentication result is true. But when I test with another voice (a female), the result is true too.
I did a little test: I used an audio editor to normalize the female voice, but the result is still true.
Here are the 2 samples. https://www.dropbox.com/s/tzc2lvvfvyov0t6/voices2.rar?dl=0
Thanks for your time and effort. Nicolas
Nicolas,
I have done some work to improve pitch detection and LPC coefficient amplitude normalisation. It helps to improve the results, but they are still not perfect for the samples you sent me. The main reason is that the pitch detection algorithm selects the wrong spectrum harmonic.
I know why it happens, but I don't have a clear solution right now.
However, if you use shorter samples for testing and training, it should reduce the influence of this bug, simply because there is less chance of hitting the error.
Thanks. Ravil
Ravil,
Welcome back!
I understand your points. So, if I use shorter samples I will get a better recognition result, right? On the other hand, if you can fix this issue, your code will work really, really well.
Thanks for your time. Javenosa
Nicolas,
Yes, that's correct. And I'll try to fix it in the near future. Sorry if I don't answer as fast as before.
Thanks. Ravil
Ravil,
How are you??
Any update on this fix??
Thanks!
@javenosa Hi! Currently I don't have enough time to complete it. But when I do, I will do something about this bug.
Hey @BenderRodrigez! How are you, man? A long time without talking, I hope you are okay.
Did you find any solution to this issue?
Thx
Hi! Long time indeed! Unfortunately, I have no solution for this issue. It seems to require fine-tuning for a concrete language, which may be another research project.
Hey!
Is there a way to contribute to your project to do that fine-tuning? If you can at least explain which files/classes to look at and analyze on the debug side, I can try it with that information.
Thx
@ovninosa Yes, you can contribute :) If you are not familiar with Digital Signal Processing, I advise you to get a quick intro first, especially into FFT, pitch tracking, tone (voiced speech) detection, and linear prediction.
Most of the job will be dumping intermediate results into files, plotting graphs and analyzing them.
I would suggest starting from the end of the processing and digging into the likelihood-per-time plot. It should show you how likely it is that the voice in the tested sample sounds like the one in the fingerprint, per time window. As I remember, I used to dump the array to a text file and then import it into Excel or a similar program.
I also advise you to use tools like Audacity to analyze the audio files you will use in tests. Spectrograms and some of the implemented DSP algorithms may come in handy during debugging.
For the algorithms, you should check https://github.com/BenderRodrigez/SpeakerVerification/tree/master/HelpersLibrary and specifically https://github.com/BenderRodrigez/SpeakerVerification/tree/master/HelpersLibrary/DspAlgorithms.
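The dump-to-text-file step described above can be as simple as this sketch (Python for brevity; the function name and the 10 ms window step are illustrative assumptions, not values from the project):

```python
def dump_series(values, path, step_ms=10.0):
    """Write a per-window series as tab-separated (time_ms, value) rows,
    ready to import into Excel or any plotting tool."""
    with open(path, "w") as f:
        for i, v in enumerate(values):
            f.write(f"{i * step_ms}\t{v}\n")

# Example: dump three likelihood values, one per analysis window.
dump_series([0.2, 0.5, 0.9], "likelihood.txt")
```

Plotting the resulting column against time shows at a glance which windows the solver scores as same-speaker and which it does not.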
Hello,
I was trying to run your SpeakerVerification app, but in the function "TestPraatLpc" the code looks for a txt file which is not on GitHub. Can you share the missing files needed to run the experiment? Or do I need to do something more?
At the moment I load 2 WAVs, one for training and another for testing, then I select the Hamming window and LPC. The length of the two WAVs is 4 seconds. Do I need to tweak anything else?
Thanks, Javenosa