ALIZE-Speaker-Recognition / android-alize

ALIZE for the Android platform.
GNU Lesser General Public License v3.0

Inaccurate Recognition #19

Open RITCHIEHuang opened 6 years ago

RITCHIEHuang commented 6 years ago

I am using PCM 16-bit voice data with a sampling rate of 44100 Hz. The recognition result is always true, even for completely different sound data.


AhmadKhattak commented 6 years ago

Same is the case here: I always get true results when matching completely different sounds against the speaker models generated in android-alize (createSpeakerModel).

In the case of android-alize, what I've come to understand is that:

1) you have to generate the world.gmm file using LIA_RAL (TrainWorld) after extracting the MFCC features with SPro from the audio files that you will use for the world model.

2) then, when you load the world.gmm file in android-alize, you just provide some input audio to generate the speaker model, and android-alize uses the SPro library to extract the MFCC features and create the speaker model (createSpeakerModel; see the sketch after this comment).

3) when trying to match the speaker model against any audio input, the results are almost always true. (This is the point where I assume you are, and where I am also stuck.)

What I also tried was generating the speaker model using LIA_RAL (TrainTarget), because LIA_RAL has normalization tools (NormFeat, EnergyDetector, etc.) that are applied to the MFCC features extracted by SPro, and I assumed this would lead to better results. In android-alize, only the MFCC features are extracted; normalization is not yet included/applied.

However, the results with the speaker models generated by LIA_RAL (loaded into android-alize through loadSpeakerModel) were not better; in fact, I got a false result when matching the same audio against its own speaker.

I assume I've made some mistakes in the .cfg configuration files when generating the speaker models or the world model; I'll keep testing and will get back to you when I have good results.

@jfb84 I hope my understanding/explanation is correct; if there are any mistakes, your advice would be much appreciated. Thanks!
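
For reference, the android-alize side of steps 2 and 3 looks roughly like this. This is a minimal sketch following the SimpleSpkDetSystem API shown in the project README; the AlizeSpkRec package, the asset names, and the exact addAudio overloads are assumptions based on that README:

import java.io.InputStream;
import android.content.Context;
import AlizeSpkRec.SimpleSpkDetSystem;

class SpkDemo {
    // Load the config and pre-trained world model, enroll one speaker from raw
    // PCM audio, then score a test recording against that model.
    static void enrollAndVerify(Context ctx, short[] enrollPcm, short[] testPcm) throws Exception {
        InputStream config = ctx.getAssets().open("AlizeDefault.cfg");   // assumed asset name
        SimpleSpkDetSystem alize = new SimpleSpkDetSystem(config, ctx.getFilesDir().getPath());
        config.close();

        InputStream world = ctx.getAssets().open("gmm/world.gmm");       // UBM built with LIA_RAL TrainWorld
        alize.loadBackgroundModel(world);
        world.close();

        alize.addAudio(enrollPcm);                // MFCC extraction happens internally via SPro
        alize.createSpeakerModel("speaker01");
        alize.resetAudio();                       // clear the enrollment audio before testing

        alize.addAudio(testPcm);
        SimpleSpkDetSystem.SpkRecResult r = alize.verifySpeaker("speaker01");
        System.out.println("match=" + r.match + " llr=" + r.score);
    }
}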

jfb84 commented 6 years ago

Hello. When you say that the results are always true, you are talking about the binary decision. But what about the score, the LLR? The decision depends on the threshold, so good scores with a bad threshold give bad decisions... (By default, I think the threshold is set to 0.) This is particularly true without feature normalization.

Best
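
In code terms, that means logging the raw score rather than acting only on the boolean decision. A small sketch, assuming the SpkRecResult fields (match, speakerId, score) that are quoted later in this thread:

import android.util.Log;
import AlizeSpkRec.SimpleSpkDetSystem;

class ScoreCheck {
    // The boolean decision is just score vs. threshold; the LLR is what tells
    // you whether the models and features are actually behaving.
    static void logScore(SimpleSpkDetSystem alize, String speaker) throws Exception {
        SimpleSpkDetSystem.SpkRecResult r = alize.verifySpeaker(speaker);
        Log.d("ALIZE", "decision=" + r.match + " llr=" + r.score);
    }
}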


jfb84 commented 6 years ago

Hi. 44 kHz is not really useful; it is better to use 16 or 22 kHz. Take care to have only one channel of data. Best
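
On the capture side, that suggests recording 16-bit, single-channel PCM at 16 kHz directly. A minimal sketch using the standard Android AudioRecord API (it assumes the RECORD_AUDIO permission has already been granted):

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

class MonoRecorder {
    // Record the requested number of seconds of 16 kHz, 16-bit, mono PCM.
    static short[] recordPcm16kMono(int seconds) {
        final int sampleRate = 16000;
        int minBuf = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord rec = new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT,
                Math.max(minBuf, sampleRate * 2));
        short[] samples = new short[sampleRate * seconds];
        rec.startRecording();
        int offset = 0;
        while (offset < samples.length) {
            int n = rec.read(samples, offset, samples.length - offset);
            if (n <= 0) break;              // stop on a read error rather than spin
            offset += n;
        }
        rec.stop();
        rec.release();
        return samples;
    }
}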


ra2637 commented 6 years ago

Hi, I think you need to specify the threshold in the config file; the item is missing from the sample config. So what you have to do is check the score, find a suitable threshold, and set it in the config file, e.g. threshold 70.

In this case, the result will only be true when the score >= 70; otherwise, the result is false.

However, without normalization, the scores will vary (even for the same person recording different audio files), and the threshold value becomes somewhat useless in that case. @jfb84 explained this in my question: https://github.com/ALIZE-Speaker-Recognition/android-alize/issues/9

My test accuracy was always bad. Even when I put my own voice into the ALIZE tutorial, modified the related files, and used normalization, it still showed false when comparing my own voice data. One caveat is that I only tried GMM/UBM; I haven't checked i-vector or JFA, so it would be nice if someone could check those.

jfb84 commented 6 years ago

Hi. Even with GMM/UBM you should get quite good results if the UBM/world model is good and the length of the messages is good: 10 s is a minimum, 30 s is better... As long as you don't have acceptable results with UBM/GMM, don't move to other objectives, as i-vectors are based on GMM/UBM. Best, JF


AhmadKhattak commented 6 years ago

Hi @ra2637, when you say "Even I tried to put my voice in alize tutorial, modified related files, and used the normalization, it still showed false when comparing my voice data" — did you use the LIA_RAL library to generate the speaker models with normalization applied, or did you use some other method to apply normalization to the MFCC features extracted from the audio files?

Similarly to your case, I have tried matching speaker models built with normalization against the very audio used to generate those models, but the scores came back negative.

ra2637 commented 6 years ago

@jfb84 I think I finally know what my problem is. Thanks! I was using a short voice file (about 3 s). It would help to specify the required voice input length (10 s minimum, 30 s better) in the README so people can avoid this problem.

@AhmadKhattak I think the example uses LIA_RAL to generate models. The tutorial has steps going from voice files through model creation to normalization. So I just added my audio file (some modifications are needed as well) and followed the steps again. Maybe we should try using voice files longer than 30 s.

AhmadKhattak commented 6 years ago

Hi @jfb84, what is the minimum number of hours of speech required for a good UBM/world model for android-alize? Should I use separate models for male and female speakers? Can I use multiple audio files of the same speaker, or should there be only one audio file per speaker when generating the UBM/world model?

@ra2637 The example uses LIA_RAL, but normalization is not applied; that is what I wanted to confirm: how you applied normalization. Could you explain how you generated your UBM/world model file, e.g. the number of hours of speech, the number of different speakers, etc.?

So, do I only need a good UBM/world model file, after which I use android-alize by providing it with an audio recording of more than 30 s of my voice to create a speaker model, and then I should be able to verify my speaker model with my audio file? What is the format for setting the threshold in the config file? E.g. you said "threshold value", right?

I did try using more than 30 s of recording at the start of testing android-alize, but the results were not good; however, at that time I had not set the threshold value, and the UBM/world model file was probably also not good.

Thanks !

ra2637 commented 6 years ago

@AhmadKhattak I used the tutorial from the ALIZE official site: http://alize.univ-avignon.fr/. I'm not sure which tutorial you used. So, actually, I didn't apply normalization in the Android app. Maybe you could try the tutorial first to see if it works.

The config key for the threshold is simply threshold. You can see an example here: https://github.com/ra2637/voice_recognition/blob/67af80711ed6da4b74818b421d904b0c68f42899/Application/src/main/assets/alize.cfg#L18

Normalization is very important for the verification part, but I haven't seen anyone implement it. So either we implement it ourselves or we wait for someone to do it.
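
For concreteness, the option is a plain key/value line in the ALIZE config file loaded by the app, matching the linked sample (the value 70 is just the example used earlier in this thread):

threshold 70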

AhmadKhattak commented 6 years ago

@ra2637 Did you use the LIA_SpkDet — GMM/UBM System tutorial? Did you use it to generate both the GMM world model and the speaker models, or only the world model? Did you use the default .sph files provided with the tutorial to generate the world model, or did you provide your own audio files? If you provided your own, how many different speakers were in your data set, and what was the average speech duration per speaker?

Also, did your results improve when you increased the audio duration to 30 s when training the speaker model in android-alize?

ra2637 commented 6 years ago

@AhmadKhattak Yes, I used the LIA_SpkDet — GMM/UBM system, and yes, I used the default .sph files to generate the world model. I just added my own voice file into the folder and treated it like the other voice files: I went through all the bash scripts, figured out each step and where I should put my files. I'd love to give you more detail, but I did this a long time ago and the folder has been deleted. However, it's not hard to figure out the example structure and insert your own voice.

And I no longer test the results, since my project needs to recognize people from voice samples less than 5 s long.

AhmadKhattak commented 6 years ago

@ra2637 I followed the same steps: I used the default .sph files once, and when that world model didn't give good results, I used other audio files (.wav from another data set) to generate the world model, and still didn't get good results. When you say you added your own voice to the folder, do you mean in the location /data/sph, i.e. you used your voice file in the generation of the world model?

I tried verifying the speakers like this:

1) Generated the world model using .wav files from a data set.

2) In android-alize, used one of those .wav files to create a speaker model.

3) In android-alize, tried verifying the speaker model with that same .wav file.

Following those steps I got bad results: every .wav file matched true, and when identifying speakers with android-alize it would match the wrong speaker models against the .wav files provided as input.

Then I understood from your explanation how to set the threshold value, as I had not set it previously; but I wanted to know whether setting the threshold got you good results. Thanks!

ra2637 commented 6 years ago

@AhmadKhattak Yes, that is what I did, and I didn't get a good result either. But I thought it might be because my voice file is short. So, did you get a good result using a voice file over 30 s long?

AhmadKhattak commented 6 years ago

@ra2637 I have not yet managed to calculate an accurate threshold; I will try to do so and then get back to you with the results. (Recently I tried another world model and the scores all came back negative when matching speakers, which confused me.) Edit: The speaker identification scores were negative because I had set the sample rate to 16000 while the audio files I was using were recorded at 8000; this has been resolved.

satwickdash commented 5 years ago

@AhmadKhattak Did you manage to find an accurate threshold? Also, what is the default threshold value used when we don't specify it in the configuration file?

As for the world.gmm file, does it need to include training audio from the speakers I wish to enroll? Or can I just use the example world.gmm file which you generated in #22: https://github.com/ALIZE-Speaker-Recognition/android-alize/files/2062602/world.zip

jfb84 commented 5 years ago

Hi. You don't HAVE to include audio from your client speakers in the UBM training, BUT you COULD do it if your casework is mainly a closed-set one. Closed set: almost all the speakers are recorded in the system (few external impostors).

If you have few speakers and little data, it is also acceptable as a way to build a first system that you will improve later. It is sometimes also interesting when the linguistic messages are constrained (personal-password-based systems; see Larcher, A., Bonastre, J. F., & Mason, J. S. (2008, September). Reinforced temporal structure information for embedded utterance-based speaker recognition. In Interspeech (pp. 371-374)). JF

De: "Satwick Dash" notifications@github.com À: "ALIZE-Speaker-Recognition" android-alize@noreply.github.com Cc: "Jean-Francois Bonastre" jean-francois.bonastre@univ-avignon.fr, "Mention" mention@noreply.github.com Envoyé: Jeudi 30 Août 2018 08:26:14 Objet: Re: [ALIZE-Speaker-Recognition/android-alize] Inaccurate Recognition (#19)

[ https://github.com/AhmadKhattak | @AhmadKhattak ] Did you manage to find an accurate threshold ? Also, what is the default value of threshold being used, when we didn't specify it in the configuration file ?

As for the world.gmm file, does it need to include audio training from speakers I wish to enroll ? Or can I just use the example world.gmm file which you have generated here in [ https://github.com/ALIZE-Speaker-Recognition/android-alize/issues/22 | #22 ] : [ https://partage.univ-avignon.fr/url | https://github.com/ALIZE-Speaker-Recognition/android-alize/files/2062602/world.zip ]

— You are receiving this because you were mentioned. Reply to this email directly, [ https://github.com/ALIZE-Speaker-Recognition/android-alize/issues/19#issuecomment-417204096 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/ATmd72SElqsgXlCiRiCy54IL5ItiVdqgks5uV4WGgaJpZM4TitUW | mute the thread ] .

--


Jean-Francois BONASTRE Directeur du LIA LIA/CERI Université d'Avignon Tel: +33/0 490843514 directeur-lia@univ-avignon.fr @jfbonastre


satwickdash commented 5 years ago

Thanks @jfb84

I'm expecting to identify only 8-10 speakers. But currently, even when I enroll just 2 speakers, it misclassifies between them.

Also, when I use the statement SimpleSpkDetSystem.SpkRecResult identificationResult = alizeSystem.identifySpeaker(), identificationResult.match comes out false, yet it still gives a speakerID and a score.

When identificationResult.match is true, that makes sense. But what do the speakerID and score mean when match is false?

jfb84 commented 5 years ago

I have to check, but... even if the system decides (thanks to the threshold) that none of the registered speakers pronounced the extract, one of them is still the closest and gets reported with its score.
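
Put differently: with identifySpeaker(), speakerId and score report the closest enrolled model and its LLR, and match only says whether that score cleared the threshold. A sketch under the same SpkRecResult field assumptions as above:

SimpleSpkDetSystem.SpkRecResult id = alizeSystem.identifySpeaker();
if (id.match) {
    // Score cleared the threshold: accept the identification.
    Log.d("ALIZE", "identified " + id.speakerId + " (llr=" + id.score + ")");
} else {
    // Below threshold: speakerId is merely the nearest model, not a match.
    Log.d("ALIZE", "no match; closest was " + id.speakerId + " (llr=" + id.score + ")");
}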

De: "Satwick Dash" notifications@github.com À: "ALIZE-Speaker-Recognition" android-alize@noreply.github.com Cc: "Jean-Francois Bonastre" jean-francois.bonastre@univ-avignon.fr, "Mention" mention@noreply.github.com Envoyé: Jeudi 30 Août 2018 13:47:57 Objet: Re: [ALIZE-Speaker-Recognition/android-alize] Inaccurate Recognition (#19)

Thanks [ https://github.com/jfb84 | @jfb84 ]

I'm expecting to be able to identify only 8-10 speakers. But currently, when I enroll even 2 speakers, it misclassifies between them.

Also, when I use the statement SimpleSpkDetSystem.SpkRecResult identificationResult = alizeSystem.identifySpeaker() , in the identificationResult variable, identificationResult.match outputs to false, yet still giving a speakerID and score.

When the identificationResult is true, it makes sense. What do the speakerID and score mean, when match is false ?

— You are receiving this because you were mentioned. Reply to this email directly, [ https://github.com/ALIZE-Speaker-Recognition/android-alize/issues/19#issuecomment-417290993 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/ATmd74BKoygWVRIVLAe3bDpofGVDzQmDks5uV9DtgaJpZM4TitUW | mute the thread ] .

--


Jean-Francois BONASTRE Directeur du LIA LIA/CERI Université d'Avignon Tel: +33/0 490843514 directeur-lia@univ-avignon.fr @jfbonastre


jfb84 commented 5 years ago

Could you give an example of the scores for the two speakers in this case?

De: "Satwick Dash" notifications@github.com À: "ALIZE-Speaker-Recognition" android-alize@noreply.github.com Cc: "Jean-Francois Bonastre" jean-francois.bonastre@univ-avignon.fr, "Mention" mention@noreply.github.com Envoyé: Jeudi 30 Août 2018 13:47:57 Objet: Re: [ALIZE-Speaker-Recognition/android-alize] Inaccurate Recognition (#19)

Thanks [ https://github.com/jfb84 | @jfb84 ]

I'm expecting to be able to identify only 8-10 speakers. But currently, when I enroll even 2 speakers, it misclassifies between them.

Also, when I use the statement SimpleSpkDetSystem.SpkRecResult identificationResult = alizeSystem.identifySpeaker() , in the identificationResult variable, identificationResult.match outputs to false, yet still giving a speakerID and score.

When the identificationResult is true, it makes sense. What do the speakerID and score mean, when match is false ?

— You are receiving this because you were mentioned. Reply to this email directly, [ https://github.com/ALIZE-Speaker-Recognition/android-alize/issues/19#issuecomment-417290993 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/ATmd74BKoygWVRIVLAe3bDpofGVDzQmDks5uV9DtgaJpZM4TitUW | mute the thread ] .

--


Jean-Francois BONASTRE Directeur du LIA LIA/CERI Université d'Avignon Tel: +33/0 490843514 directeur-lia@univ-avignon.fr @jfbonastre


satwickdash commented 5 years ago

I enrolled my voice and my colleague's voice on the same phrase, and during identification:

When my voice was used, the result was:
Match: false
Speaker ID: Prashant
Score: 1.7618825

When his voice was used:
Match: false
Speaker ID: Prashant
Score: 1.7112812

In both cases, only his name occurs in the result, with match set to false.

Edit: I have set the threshold to 70 in the configuration file. How does that affect these results?

AhmadKhattak commented 5 years ago

@satwickdash It seems you have found more success than I did; could you kindly guide me?

What world model file did you use? The one I posted, or did you generate a world model from a different data set than the sample provided with the tutorial? Your scores are better than mine. How did you enroll the speakers? Would you be willing to share the steps you followed?

Thank you.

AhmadKhattak commented 5 years ago

In my case, I enrolled the speakers using the same sample audio data set provided with the tutorials, e.g. Speaker 1 using xaod.wav (converted from .sph), etc. However, when trying to verify them with the same audio file used to generate their speaker model, it would identify the wrong speaker.

I did not attempt to use a big data set, because I wanted to reproduce in Android ALIZE the same results I had gotten from the tutorial.

The scores I was comparing against were in res/target-seg_gmm.res from the tutorial, with values given in the format below.

(These scores are not accurate, as I had played around with the configurations and the .sh files, commenting out some functions to test things; the accurate scores obtained with the default files were roughly in the range -1 to 1.)

M spk01 1 xaod 33.9173
M spk02 1 xaod 2.93694
M spk03 1 xaod 6.24663
M spk04 1 xaod 22.3445
M spk05 1 xaod 16.4111
M spk06 0 xaod -5.6665

This means the speaker was male (the M column), the speaker models (spk01 through spk06) matched against audio file xaod gave the scores in the last column (33.9173 down to -5.6665), and the third column is the decision: 1 for matched and 0 for not matched. In the tutorial I couldn't find where the threshold was set, even after searching the configuration files for a threshold parameter.

satwickdash commented 5 years ago

Hi @AhmadKhattak

I don't know if I have found any success. I used the world model file you posted above, and only in android-alize; I haven't yet tried running the LIA_SpkDet — GMM/UBM system tutorial.

The audio I'm recording is captured with the AudioRecord API; see this link: PCMRecorder. I have just removed the WAV headers.

I'm following the LIA_SpkDet — GMM/UBM tutorial now to see if it gives me more understanding. I have yet to understand what the score and speaker identification mean when identificationResult.match is false.
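
For anyone doing the same header removal: a canonical PCM WAV file carries a 44-byte header before the raw samples. A minimal sketch of stripping it (this assumes the plain canonical layout; real files can carry extra chunks before the data chunk):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class WavUtil {
    // Skip the canonical 44-byte RIFF/WAVE header and return the raw PCM payload.
    static byte[] stripWavHeader(InputStream wav) throws IOException {
        long skipped = 0;
        while (skipped < 44) {
            long n = wav.skip(44 - skipped);
            if (n <= 0) throw new IOException("truncated WAV header");
            skipped += n;
        }
        ByteArrayOutputStream pcm = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = wav.read(buf)) != -1) pcm.write(buf, 0, n);
        return pcm.toByteArray();
    }
}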

satwickdash commented 5 years ago

Update:

I tried removing the threshold parameter, which I had specified in the configuration file after reading this discussion, and in the results, identificationResult.match started coming out true.

To my understanding, the score just measures how likely the audio is to belong to the identified speaker, regardless of the threshold. The threshold simply specifies whether the score is good enough to count as a match, and choosing it is purely developer-dependent.

I hope I'm getting it correct. @jfb84

Please let me know if I'm being unclear in explaining my doubts.

Also, @AhmadKhattak, could you please explain how you tweaked the code to get scores larger than single digits?

AhmadKhattak commented 5 years ago

Hi @satwickdash, what I did was, in the tutorial's ComputeTest_GMM configuration file, comment out the labelFilesPath variable and change featureFileExtension to .tmp.prm. This basically means the feature files of the audio matched against the speaker models were not normalized.

The reason I did this is that no normalization is done in the Android-ALIZE code, so I wanted to check what results I got in the tutorial when skipping normalization, and then compare those with the results obtained in Android-ALIZE.

Also, when using the world.gmm file in Android-ALIZE, as in your case, I would also always get the first speaker identified when enrolling more than one speaker. And my scores would sometimes be negative.

So, in the end, what I was not able to understand is how Android-ALIZE calculates the scores, because if I can understand the scores, I can set a suitable threshold.

Any advice would be appreciated, @jfb84. I want to be able to record audio on my Android phone and pass it to Android-ALIZE, shipped with the pre-built world.gmm, to create a speaker model from my audio and subsequently verify me. However, I'm stuck at the scores part. Could you kindly point out the mistakes I'm making, or whether the process I'm following needs to be corrected?

Kind regards

YYLee92 commented 5 years ago

Hi @AhmadKhattak ,

For the Android-ALIZE project, do you have any update on the verification results and scores? I generated the world.gmm and speaker models with the LIA_SpkDet GMM/UBM system tutorial. The world.gmm was generated using the original UBM.lst data, but for the speaker models I used audios 1-4 provided with the Android-ALIZE project. I then tested my world.gmm and the speaker .gmm files generated in the tutorial with the Android-ALIZE tutorial project, and I cannot get accurate results when verifying the speaker models against the original audios 1-4 used to train them.

Regards, yy

donn-smith commented 5 years ago

I am using the Android-ALIZE project in an attempt at speaker verification. I used the very good tutorial to write my program, and I'm using the world.gmm file that came with the project. I recorded 30 seconds of my own voice on my Android phone when creating a new speaker in the system. I take a 10-second sample when attempting to verify (not identify; I'm more interested in verification) that I am the speaker; my scores are at best in the low 20s. I'm an American (Southern) male. My Asian female coworker, who is young enough to be my daughter, gets scores within half a point of mine. I've tried sample rates of 8000 and 44100, both 16-bit PCM. If I'm reading the comments here correctly, I need to produce my own world.gmm to get my scores above 70. Is that correct? Should I stick with a sample rate of 8000? Are there any other recommendations?

donn-smith commented 5 years ago

I used the world.gmm from https://github.com/ALIZE-Speaker-Recognition/android-alize/issues/22 and the scores went up. I got the recognition score for my own voice (30 seconds of training, 10-second sample) above 70. But a British speaker on the television got 68.

utlandr commented 5 years ago

@donn-smith Is your verification audio in the same format as the UBM audio? If you used 16-bit, 8000 Hz (single-channel) audio to train the world model, you should use the same format during verification/enrollment.

donn-smith commented 5 years ago

@utlandr I tried both, with similar results. Since I "stole" the world.gmm from @AhmadKhattak, I'm wondering if I should update the world.gmm file with my own training data. Or is that even possible? I have to admit that I'm more of a programmer than a speech expert.

jfb84 commented 5 years ago

You can always train or update a world model with your specific speaker enrollment data. It will increase the accuracy of the world model for those speakers... and decrease its generalization power if you have few speakers. But if you have about 20 speakers in your set, it is not a big problem. JF

De: "donn-smith" notifications@github.com À: "ALIZE-Speaker-Recognition" android-alize@noreply.github.com Cc: "Jean-Francois Bonastre" jean-francois.bonastre@univ-avignon.fr, "Mention" mention@noreply.github.com Envoyé: Mardi 11 Décembre 2018 09:14:54 Objet: Re: [ALIZE-Speaker-Recognition/android-alize] Inaccurate Recognition (#19)

[ https://github.com/utlandr | @utlandr ] I tried both with similar results. Since I "stole" the world.gmm from [ https://github.com/AhmadKhattak | @AhmadKhattak ] , I'm wondering if I should update the world.gmm file with my training data. Or is that even possible? I have to admit that I'm more of a programmer than a speech expert.

— You are receiving this because you were mentioned. Reply to this email directly, [ https://github.com/ALIZE-Speaker-Recognition/android-alize/issues/19#issuecomment-446110482 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/ATmd73eMpsPgIIZzLfBAwoUQQv5d7QJ1ks5u32l-gaJpZM4TitUW | mute the thread ] .

--


Jean-Francois BONASTRE Directeur du LIA LIA/CERI Université d'Avignon Tel: +33/0 490843514 directeur-lia@univ-avignon.fr @jfbonastre


donn-smith commented 5 years ago

Hmm... can I update the world.gmm on my Android phone? I wrote code, based on the tutorial, that registers a user on the phone; I'm hoping to use the same 30-second audio file to update the background model.

utlandr commented 5 years ago

@donn-smith Regarding the world model: I think you mentioned earlier that you tested with female voices? The world model you have, I believe, contains only male voices, all recorded over telephone conversations. A better world model for your specific use case is probably a good idea.

donn-smith commented 5 years ago

OK, so a new world model is required, and there is already an issue open here on how to do that, so we don't need to go into it. But my other question still has merit: can I update the world model on the Android device (my Nexus phone at the moment) to include data from the speaker once I've verified the speaker? And would that improve verification accuracy?

donn-smith commented 5 years ago

As a partial answer to my own question: the assets directory is part of your app's APK and is read-only. You would probably need to copy world.gmm out to a file (as is done for new speaker .gmm files) to be able to modify it.
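
A sketch of that copy step, using standard Android asset handling (the gmm/world.gmm asset path is just the one assumed earlier in this thread):

import android.content.Context;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ModelInstaller {
    // Copy the read-only world.gmm packaged in the APK's assets into the app's
    // writable files directory, where it can later be replaced or adapted.
    static void installWorldModel(Context ctx) throws IOException {
        try (InputStream in = ctx.getAssets().open("gmm/world.gmm");
             OutputStream out = new FileOutputStream(
                     ctx.getFilesDir().getPath() + "/world.gmm")) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        }
    }
}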

jfb84 commented 5 years ago

Hi. It is not such a simple question. Speaker identification is a scientific and industrial specialty, with a lot of knowledge and some tricks to know... (I am open to collaboration!)

But... if you are using only your phone, and you are OK with retraining the system when you change phone models, it is better to use audio recorded on your phone to train/adapt the world model. Physically training the world model on the phone is more difficult, as a good world model is usually trained on a large amount of data, and GMM training requires processing power and memory.

If your system may receive recordings from several devices, or if you want to keep the world and user models when you change phones, you want a universal model (UBM: universal background model). In that case, you need training data coming from different phones and environments.

Best JF

De: "donn-smith" notifications@github.com À: "ALIZE-Speaker-Recognition" android-alize@noreply.github.com Cc: "Jean-Francois Bonastre" jean-francois.bonastre@univ-avignon.fr, "Mention" mention@noreply.github.com Envoyé: Mercredi 12 Décembre 2018 09:43:04 Objet: Re: [ALIZE-Speaker-Recognition/android-alize] Inaccurate Recognition (#19)

OK, so a new world model is required. And there is already an issue open here on how to do that, so we don't need to go into that here. But my other question still has merit: can I update the world model on the Android device (my Nexus phone at the moment) to include data from the speaker once I've verified the speaker? And would that improve the accuracy in verification?

— You are receiving this because you were mentioned. Reply to this email directly, [ https://github.com/ALIZE-Speaker-Recognition/android-alize/issues/19#issuecomment-446507143 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/ATmd77AvmG1d7tURSJIKmxrY_wcRRjB_ks5u4MGYgaJpZM4TitUW | mute the thread ] .

--


Jean-Francois BONASTRE Directeur du LIA LIA/CERI Université d'Avignon Tel: +33/0 490843514 directeur-lia@univ-avignon.fr @jfbonastre


donn-smith commented 5 years ago

Perhaps it is best if I state my use case: basically, I want to verify that the speaker on a cell phone or other Android device is the owner of the device. Do I need to load the background model for this?

YYLee92 commented 5 years ago

Hi all,

Is it necessary to update or train our own world.gmm model in order to get accurate speaker verification? I have trained speaker models with my own audio recordings for about 5 different speakers, using the world.gmm provided in this project, and in verification tests against the speaker models I trained, I am not able to get correct results.

Regards, yiyang

alprsntrkk commented 5 years ago

I tried many combinations of sounds to train speaker models with PCM files, but I couldn't get correct results when matching my test sound against the speaker's model and the other models. Where is the problem? Should I record longer audio? Should I train the models with more sound? And my last question: does this library support only PCM files?

chamecall commented 4 years ago

Did anybody get accurate results?

chamecall commented 4 years ago

@AhmadKhattak, how did you solve the problem ultimately?

haobregonz10 commented 4 years ago

Was the problem resolved? Any other alternative to ALIZE for voice user identification?

angryalok commented 9 months ago


Did you find any solution?