ALIZE-Speaker-Recognition / android-alize

ALIZE for the Android platform.
GNU Lesser General Public License v3.0

Generating world.gmm #15

Open satyamer14i opened 6 years ago

satyamer14i commented 6 years ago

How can you generate gmm/world.gmm?

AhmadKhattak commented 6 years ago

I am also trying to figure out how to do so. I've tried generating the UBM GMM files through other means, such as Python (Link1), and have used world.gmm files from other projects (Link2), but the results were not good.

From Link1, I used the gender models, but ALIZE did not accept them, giving an Out of Bound Memory Exception error.

From Link2, I used the world.gmm file, but speaker verification failed.

If you are able to find out how to generate the world.gmm files and perform speaker verification, kindly share the solution here. Thanks!

jfb84 commented 6 years ago

Hello, As far as I know, the UBM/world model should be trained using the SpkDet program TrainWorld. You need computational power for training the world model, so install ALIZE and SpkDet on a Linux computer, add your data, and train a model with TrainWorld.

Best JF
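The workflow JF describes can be outlined roughly as follows. The repository URLs and the TrainWorld invocation are the ones used later in this thread; everything else is a sketch, not exact build instructions, so check each repository's README:

```
# 1) Build the ALIZE core library
git clone https://github.com/ALIZE-Speaker-Recognition/alize-core
# 2) Build LIA_RAL on top of it (provides the SpkDet tools, incl. TrainWorld)
git clone https://github.com/ALIZE-Speaker-Recognition/LIA_RAL
#    (build each per its README: configure / make)
# 3) Extract features from your training data with SPro, then:
bin/TrainWorld --config cfg/TrainWorld.cfg
```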


--


Jean-Francois BONASTRE Directeur du LIA LIA/CERI Université d'Avignon Tel: +33/0 490843514 directeur-lia@univ-avignon.fr @jfbonastre


AhmadKhattak commented 6 years ago

Hi,

First of all, Thanks for your reply !

I tried using the LIA_RAL library, but there were issues with compiling it and SPro (I'm in the process of trying to resolve them). Also, what computational power would be required? I'm using a MacBook Pro 2012, 2.3 GHz Intel Core i7 with 16 GB RAM.

Just to make sure I understand: when generating the UBM/world model, should the training data contain the target speakers' audio as well, or just audio from a number of different speakers?

jfb84 commented 6 years ago

The power is enough. It depends on the size of your training set.

For a UBM, you need as much data as possible, well matched to your real environment. 20 to 30 speakers per gender is a minimum. You can have recordings of different durations; as many different recording sessions as possible is better. 40 minutes of speech in total is a minimum; 100 hours is better...

The best is to have something well balanced (it will help you move from UBM/GMM to i-vectors...), for example: 10 recordings per speaker, 50 speakers (per gender), 30 s per recording.
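The balanced design above works out to a concrete amount of audio; a quick back-of-the-envelope check (the numbers are exactly the ones JF gives, nothing else is assumed):

```shell
# 10 recordings/speaker x 50 speakers x 30 s/recording, per gender
recordings_per_speaker=10
speakers=50
seconds_per_recording=30
total_s=$((recordings_per_speaker * speakers * seconds_per_recording))
echo "total speech per gender: ${total_s} s = $((total_s / 3600)) h $(((total_s % 3600) / 60)) min"
```

That is about 4 h 10 min per gender: comfortably above the 40-minute minimum, though well short of the 100-hour ideal.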

Best JF

jfb84 commented 6 years ago

You can have your "client" data in the UBM training. It is not good if you want to (scientifically) evaluate performance, but it is fine if you have little data and want to improve the results.



satyamer14i commented 6 years ago

Hello jfb84, could you please tell me what steps I need to follow in order to generate the UBM?

I'm collecting audio recordings in .3gp format; however, I could convert them to .wav.

So after gathering the data, how do I call TrainWorld, and how do I specify the data it needs to use? I have researched ALIZE a lot, but nowhere have I found the steps needed to generate the UBM/GMM. Please help!

AhmadKhattak commented 6 years ago

Hi @jfb84 ,

I am trying to use TrainWorld from the LIA_RAL library (https://github.com/ALIZE-Speaker-Recognition/LIA_RAL/tree/master/LIA_SpkDet/TrainWorld) to generate a world model. What I've done is store the audio files in .wav format (16-bit PCM) in the same folder as TrainWorld.exe and TrainWorld.cfg (from the LIA_RAL/bin and LIA_RAL/LIA_SpkDet/TrainWorld/cfg directories respectively), after following the steps to compile and make the LIA_RAL library. I've also copied the seg_app.lst file (from the LIA_RAL/LIA_SpkDet/TrainWorld/test directory) to the same folder.

Now, I've seen that the feature-files path in the .cfg file is seg_app.lst, and it points to test1 and test2 (I edited it to point only to test1), so I then copied the test1.lbl and test1.prm files from the LIA_RAL/LIA_SpkDet/TrainWorld/test directory.

My question is: what is the significance of the test1.lbl and test1.prm files, and if I want to generate a world model using my own audio files (.wav, 16-bit PCM), how would I go about editing/making the test1.lbl and test1.prm files?

Thanks !

Hi @satyamer14i, for reference, I've been trying to generate the UBM model using the following steps:

  1. Downloaded and Compiled the Alize Core Library. https://github.com/ALIZE-Speaker-Recognition/alize-core

  2. Downloaded and Compiled the LIA_RAL library. https://github.com/ALIZE-Speaker-Recognition/LIA_RAL

  3. Through trial and error, attempted to run TrainWorld in the following manner,

    a. Copied the audio files in .wav format to the LIA_RAL/LIA_SpkDet/TrainWorld/cfg folder.

    b. Copied the TrainWorld.exe from the LIA_RAL/bin folder to LIA_RAL/LIA_SpkDet/TrainWorld/cfg folder

    c. Copied the test1.lbl, test1.prm, seg_app.lst from the LIA_RAL/LIA_SpkDet/TrainWorld/test folder to the cfg folder. (edited the seg_app.lst to only contain the word test1)

    d. Ran the following command on Terminal. TrainWorld --config TrainWorld.cfg

*Notes: In the TrainWorld.cfg file you can see the following line: inputFeatureFilename seg_app.lst

And after editing, the seg_app.lst file contains only: test1

which points to the test1.prm file. I checked: when test1.prm was not in the folder, it gave a FileNotFoundException ./test1.prm error,

and when test1.lbl was not in the folder, it gave a FileNotFoundException ./xxx error, so I put the test1.lbl file back and the error went away.

However, at the end the terminal gets stuck at:

Compute global mean and conv

Also, importantly, even when I removed the audio files from the folder the command still ran, which means it only requires the .prm and .lbl files. Hence my question: how do I generate or edit .prm and .lbl files to reflect my own audio files?

Another point: I used the LIA_RAL/LIA_SpkDet/TrainWorld/cfg folder out of convenience, not because it was required.

Thanks !
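The step this comment circles around (pointing inputFeatureFilename at your own features instead of the test1 sample) can be sketched as a tiny shell loop. The demo directory and file names below are made up; the .prm files would in reality be SPro output:

```shell
# Sketch: build the list that inputFeatureFilename points to.
# Per the errors observed above, TrainWorld resolves each basename in
# the list to <basename>.prm (and a matching .lbl, unless labels are
# disabled in the config).
mkdir -p demo && cd demo
touch spk01_a.prm spk01_b.prm spk02_a.prm   # stand-ins for real SPro output
for f in *.prm; do
  basename "$f" .prm                        # strip the extension
done > seg_app.lst
cat seg_app.lst
```

`cat` should print the three basenames, one per line; that file then replaces the tutorial's seg_app.lst.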

jfb84 commented 6 years ago

.prm files are the feature files output by SPro. .lbl files are time-label files that say where the segments of interest are (usually segments with speech, detected by EnergyDetector). If you don't want to use labels, you can use the addDefaultLabel option in the TrainWorld config (take care: in this case you should NOT have .lbl files). Jf
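To make that concrete: as far as I understand the LIA_RAL label format, a .lbl file is plain text with one segment of interest per line (start time, end time, label name). The times and the "speech" label in this hypothetical test1.lbl are invented for illustration:

```
0.00  2.47  speech
3.10  5.82  speech
```

With addDefaultLabel set in TrainWorld.cfg instead, no .lbl files are needed (or allowed), as described above.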


jfb84 commented 6 years ago

It would be good to use the dev-alize mailing list for this kind of question, as it is outside the Android part.


AhmadKhattak commented 6 years ago

Hi @jfb84, I've tried sending an email to the dev-alize mailing list, but I'm getting a "Delivery not Authorized, Message Refused" error. Are there any other actions I need to take before sending an email to dev-alize@listes.univ-avignon.fr?

I've managed to generate the world.gmm model using the Tutorial for LIA_SpkDet — GMM/UBM System from http://alize.univ-avignon.fr/ .

Subsequently, I tried to use the same audio files from which the world model was generated in Android ALIZE to create speaker models and then verify them against audio (using the world.gmm file mentioned). However, even if I give audio of speaker B to verify against the model of speaker A, it says they match.

Then I tried speaker identification, but it does not identify the speaker from an audio file.

Example:

From the 01_GMM-UBM_system_with_ALIZE3.0 (downloaded from http://alize.univ-avignon.fr/)

world.gmm model was generated using the /data/sph/ files

Subsequently,

xaaf.pcm (Speaker A) and xaao.pcm (Speaker B) from /data/pcm/ were used to create Speaker Models in Android Alize.

Tried verifying xaao.pcm (Speaker B) against the model of xaaf.pcm (Speaker A), and it reported a match.

At this point I am unable to understand how to use Android ALIZE to get good results. Any guidance would be much appreciated.

Also, while using the LIA_RAL library for TrainWorld, I encountered an error when using the .prm files (features extracted from the .wav files using sfbcep, following the instructions in 01_GMM-UBM_system_with_ALIZE3.0/01_RUN_feature_extraction.sh and 01_GMM-UBM_system_with_ALIZE3.0/02a_RUN_spro_front-end.sh):

(SegTools) The label format is LIARAL
[ InvalidDataException 0x7facb5d05378 ]
message = "Wrong header"
source file = FeatureFileReaderSPro3.cpp
line number = 105
fileName = ./male1.norm.prm

Thanks !

jfb84 commented 6 years ago

Hi, Yes, you have to subscribe. The procedure is on the ALIZE website, but it should be an email to subscribe.dev-alize@... Best, Jf


AhmadKhattak commented 6 years ago

Thanks @jfb84 . I'll post the problem on the list now.

channae commented 6 years ago

@AhmadKhattak Appreciate if you could share your findings on how you managed to improve speaker identification.

AhmadKhattak commented 6 years ago

@channae I'm still in the process of getting speaker verification working with Android ALIZE. What I'm now trying to do is generate the world model file and the speaker models using the LIA_RAL library, and then use Android ALIZE to perform the speaker verification/identification.

channae commented 6 years ago

@AhmadKhattak Awesome, I'm actually heading in the same direction. I can find plenty of training data (100 GB here and 21 GB here on openslr.org, synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions, etc.). I'm planning to generate the world model as you mentioned, using the LIA_RAL library. Please do update this thread on your progress. Thanks.

mitchsan commented 6 years ago

Hi all, in case anyone gets stuck on the SPro step, here's what I had to do to get it to recognize the SPHERE format (which the tutorial's data files come in). The error messages I got were along the lines of: sfbcep error -- unknown input signal format SPHERE with --format must be one of PCM16, ALAW, ULAW, WAVE or SPHERE.

1) Download SPHERE and run aclocal && automake --add-missing && autoconf && ./configure && make. 2) Download SPro as before and run aclocal && automake --add-missing && autoconf. Then, on the configure step: ./configure --with-sphere=/path/to/sphere/dir, followed by make. 3) In the tutorial's bin directory, symlink in the sfbcep binary from SPro.

Hopefully this helps someone working on similar things in the future.
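Spelled out as a command sketch (the paths are placeholders; this just restates mitchsan's three steps):

```
# 1) SPHERE
aclocal && automake --add-missing && autoconf
./configure && make
# 2) SPro, linked against SPHERE
aclocal && automake --add-missing && autoconf
./configure --with-sphere=/path/to/sphere/dir
make
# 3) symlink sfbcep from SPro into the tutorial's bin directory
ln -s /path/to/spro/sfbcep /path/to/tutorial/bin/sfbcep
```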

AhmadKhattak commented 6 years ago

Hi @mitchsan, thank you! Could you mention the link from which you downloaded the SPHERE library? In my case I changed the input format to PCM and converted the SPHERE files to PCM using the h_strip and w_decode tools.

mitchsan commented 6 years ago

Version 2.6 worked for me, from here: https://github.com/imanel/nist-sphere There's also what looks like a tarball for 2.5 here: http://www.speech.cs.cmu.edu/comp.speech/Section1/AudioSoftware/nist.html I think I also found a 2.7 that didn't build properly for me, here: https://www.nist.gov/itl/iad/mig/tools (under Corpus Building Tools)

YYLee92 commented 5 years ago

Hi all,

I am generating the world.gmm and the speaker models from 01_GMM-UBM_system_with_ALIZE3.0. After running 03_RUN_gmm-ubm.sh, I went to /log to look at TrainTarget.cfg and saw this:

TrainTarget - Load world model [world]
Use [world] for initializing EM
Train model [spk01]
Warning File[xaod], Truncate Segment begin:30990 length:5 file size:30994 new length:4
MAP Algo MAPOccDep Mean adaption, param[14] Nb training iterations[1] Normalise model off
Bagged segments, Initial frames[11575] Selected frames[11575] % selected[100]
ML (partial) estimate it[0] (take care, it corresponds to the previous it, 0 means init likelihood) = -67.1866
MAP Algo MAPOccDep Mean adaption, param[14] Nb training iterations[1] Normalise model off
Save client model [spk01]

Can I know whether my speaker model was trained correctly? And will the "Warning File[xaod], Truncate Segment begin:30990 length:5 file size:30994 new length:4" affect my speaker verification results?

Regards, YY

YYLee92 commented 5 years ago

Hi All,

I am working on generating the world.gmm and speaker .gmm files from the ALIZE LIA_SpkDet — GMM/UBM System tutorial (http://alize.univ-avignon.fr/doc/01_GMM-UBM_system_with_ALIZE3.0.tar.gz), and testing the world.gmm and speaker .gmm files I created in this Android-ALIZE-Tutorial (https://github.com/ALIZE-Speaker-Recognition/Android-ALIZE-Tutorial). The speaker verification results I get are not accurate.

I am using my own recorded .wav audio file, converted to .pcm format, to train the speaker model. Info on the audio file used to train the speaker model:
- Sample rate: 8000 Hz
- Bits per sample: 16
- Audio channels: 1 (mono)
- Duration: 5 minutes

Steps I took to create the world.gmm and the speaker .gmm files in the ALIZE LIA_SpkDet — GMM/UBM System tutorial:

1) Added my own .pcm files to the /data/pcm folder.
2) Added a new data-list file named data_pcm.lst for the audio files, and listed my .pcm file names inside it.
3) Removed the original .sph files under /sph (used to train the 40 speaker models provided by the tutorial), as well as their entries in data.lst and in all.lst under /lst.
4) Went to the /ndx folder to edit trainModel.ndx, which is used to train the speaker models.
5) Replaced the contents of trainModel.ndx with my own speakers; for example:
   spk01 fileName1
   spk02 fileName2
   ...
6) For impostor.lst and UBM.lst in /lst, I made no changes. I only replaced the tutorial's original 40 speakers with the speaker data I want to model, so my UBM is still trained with the data provided by the tutorial.
7) Ran the tutorial's first script (01_RUN_feature_extraction.sh) in the terminal to generate the feature files in ./data/prm.
8) After successfully generating the feature files, ran the second script (02a_RUN_spro_front-end.sh) to get normalized feature files in ./data/prm.
9) Then ran the first two commands of the third script (03_RUN_gmm-ubm.sh), since I wanted to generate world.gmm with bin/TrainWorld and the speaker .gmm files with bin/TrainTarget. The part of 03_RUN_gmm-ubm.sh I used is:

  1. UBM training:
     echo "Train Universal Background Model by EM algorithm"
     bin/TrainWorld --config cfg/TrainWorld.cfg &> log/TrainWorld.log
     echo " done, see log/TrainWorld.log for details"

  2. Speaker GMM model adaptation:
     echo "Train Speaker dependent GMMs"
     bin/TrainTarget --config cfg/TrainTarget.cfg &> log/TrainTarget.cfg
     echo " done, see log/TrainTarget.cfg for details"

10) After successfully generating world.gmm and all the speaker .gmm files, I copied them into the Android-ALIZE-Tutorial to run the verification test. I verified each speaker model against the very audio used to train it, and the results do not match: a trained speaker model does not match its own audio, but it does match audio used to train other speakers.

I would like to ask whether the way I create the world.gmm and the speaker .gmm files is correct. If it is wrong, can anyone kindly advise on the correct way to do this? Thanks in advance.

Regards, YiYang
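For what it's worth, my understanding of the trainModel.ndx layout mentioned in step 5 is: one target model per line, with the model name first and then the feature-file basename(s) used to train it. The names below are invented placeholders:

```
spk01 spk01_sess1 spk01_sess2
spk02 spk02_sess1
```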

MSAlghamdi commented 5 years ago

YYLee92,

If you're going to use SPro, you need to have your data in the SPH format, all with the .sph extension. Instead of using w_decode to convert SPHERE files into raw PCM, make sure SPro has been linked to the SPHERE library by compiling SPro with ./configure --with-sphere[=path] (assuming you've already compiled nist-sphere).

If you try that, please let me know if it makes any difference.

YYLee92 commented 5 years ago

Hi @MSAlghamdi ,

Thank you for replying. I have tried converting my own audio file to .sph format using sox (http://sox.sourceforge.net/) and then trained the world.gmm and the individual speaker models, but the results did not seem to change. Can you advise further on this matter? Thanks in advance.

Regards, YiYang

MSAlghamdi commented 5 years ago

It could be the "Warning File[xaod], Truncate Segment begin:30990 length:5 file size:30994 new length:4" affecting your results. Have you figured out what it is about?

Truncating segments might affect the model. As you know, a segment contains a sequence of feature vectors. If you don't have enough segments for a number of utterances, your model won't be accurate, and consequently neither will the final results.

Please let me know if that's the case. This is very important for me.

jfb84 commented 5 years ago

Hi, This warning has no importance if it concerns only one frame (size 5 becomes 4) at the last position in the file. It is usually a bug (with a patch somewhere). Jf


jfb84 commented 5 years ago

But I agree with you: if you use the debug or verboseLevel 2 options (the second is better), you will get details on the number of frames — important to check. Also use ReadFeatFile to verify that your prm files are OK.

Jf
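For readers following the thread, the checks jfb84 describes can be sketched as a TrainWorld configuration. This is an illustrative fragment only — the parameter names follow the LIA_RAL example configurations, but the paths, feature format, and mixture size are assumptions you must adapt to your own data:

```
# cfg/TrainWorld.cfg -- illustrative fragment, not a complete working config.
# Paths, feature format, and mixture size below are assumptions.

distribType                GD          # diagonal-covariance Gaussians
mixtureDistribCount        512         # number of Gaussians in the UBM
loadFeatureFileFormat      SPRO4       # assumes features extracted with SPro
loadFeatureFileExtension   .prm
featureFilesPath           ./prm/
labelFilesPath             ./lbl/
mixtureFilesPath           ./gmm/
labelSelectedFrames        speech      # keep only frames labeled as speech
nbTrainIt                  20          # number of EM iterations
baggedFrameProbability     0.1         # random frame subsampling per iteration
outputWorldFilename        world       # produces gmm/world.gmm
```

Training would then be launched with something like `TrainWorld --config cfg/TrainWorld.cfg --inputFeatureFilename data.lst --verboseLevel 2` (where `data.lst` is a hypothetical list of feature files); `verboseLevel 2` prints per-file frame counts, which is exactly the sanity check jfb84 recommends, and `ReadFeatFile` can be run on individual prm files beforehand to confirm they parse correctly.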

outemzabetsamy commented 5 years ago

Hi everyone, I am new to this project, and it is very important for me if it works. I haven't tried it yet. I would like to know whether the speaker recognition works on Android, and I would be very grateful if you could give me more information and advice. I am developing an application that will have voice authentication (just to lock/unlock my application with voice). Please reply to me.
