ALIZE-Speaker-Recognition / LIA_RAL

A high-level toolkit for speaker recognition, built on top of ALIZE-Core.
http://alize.univ-avignon.fr
GNU Lesser General Public License v3.0

Simple test failure (working but totally wrong result) - need help #34

Open zsogitbe opened 4 years ago

zsogitbe commented 4 years ago

All data and config: test_project1-G3-lean.zip

I have made a very simple test with 3 speakers in which I make a UBM with all of the speech recordings from the 3 speakers and I adapt (train) a GMM model for each speaker with 3 (GD) distributions only (mixtureDistribCount=3). Then I test an input speech sample (one of the inputs) against the 3 speaker models and the UBM. The inputs are 2 wavs from Jennifer Lawrence, 2 from Natalie Portman, and 3 from Will Smith. The input for the final identification/test is 'test_project1/audio/JenniferLawrence/voice1.wav' (the first audio from Jennifer Lawrence), and the ALIZE identification result is FALSE (it cannot recognize the input), with 'Will Smith' as the best match, which is of course completely wrong. The score is calculated with a simple LLK and results in a negative value (-15.17):

test_project1/audio/JenniferLawrence/voice1.wav --> test_project1/prm//200421_100649_4c6f.init.prm
Writing to: 200421_100649_4c6f
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Writing to: 200421_100649_4c6f
featureCount = 1809
spkCount = 3
UBMLoaded = 1
Identification result: FALSE, score: -15.1764, best matched uId: will_smith
Ready!

All data is included in the zip file. May I please ask one of you to run this simple test and let me know your result? Please let me also know if you find the reason for the wrong results.
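
For readers less familiar with the pipeline described in this report, the following is a minimal, generic GMM-UBM sketch in Python (illustrative only, not ALIZE/LIA_RAL code; MFCC-style feature extraction from the wav files is assumed to happen elsewhere): train a UBM on pooled features, MAP-adapt the means toward each speaker, and score a test utterance with an average per-frame log-likelihood ratio against the UBM.

```python
# Generic GMM-UBM sketch (not ALIZE code). Assumes feature matrices of shape
# (num_frames, num_coefficients), e.g. MFCCs extracted elsewhere.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(pooled_features, n_components=3, seed=0):
    """Train the universal background model on features pooled from all speakers."""
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=seed).fit(pooled_features)

def map_adapt_means(ubm, speaker_features, relevance_factor=16.0):
    """Mean-only MAP adaptation of the UBM toward one speaker's data."""
    post = ubm.predict_proba(speaker_features)      # (T, C) responsibilities
    n_c = post.sum(axis=0)                          # soft frame count per component
    ex_c = post.T @ speaker_features                # (C, D) first-order statistics
    alpha = n_c / (n_c + relevance_factor)          # adaptation coefficients
    new_means = (alpha[:, None] * (ex_c / np.maximum(n_c[:, None], 1e-10))
                 + (1.0 - alpha)[:, None] * ubm.means_)
    spk = GaussianMixture(n_components=ubm.n_components, covariance_type="diag")
    spk.weights_ = ubm.weights_                     # weights and covariances are
    spk.covariances_ = ubm.covariances_             # kept from the UBM
    spk.means_ = new_means
    spk.precisions_cholesky_ = 1.0 / np.sqrt(ubm.covariances_)
    return spk

def llr_score(test_features, speaker_model, ubm):
    """Average per-frame log-likelihood ratio; higher means a better match."""
    return float(np.mean(speaker_model.score_samples(test_features)
                         - ubm.score_samples(test_features)))
```

In the setup described above, `pooled_features` would hold the frames of all seven recordings, each speaker model would be adapted on that speaker's own recordings, and identification would pick the speaker whose model gives the largest score for the test file.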

jfb84 commented 4 years ago

Hi Zoltan,

I don't have the time to go deeply into your question, but I suspect that you have a UBM training problem. Building a cross-gender UBM with 3 speakers and 3 files is tricky, and the config files are not designed for that. (BTW: what is the number of Gaussian components in your UBM?) Best, JF
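
One way to look at this empirically, under the same assumptions as the sketch above (generic Python, not ALIZE code; `pooled_features` stands for the frames of all seven recordings), is to fit UBMs of increasing size and check how many frames each Gaussian effectively owns: with only a few minutes of cross-gender speech, larger UBMs leave many components almost unpopulated.

```python
# Occupancy check for a UBM trained on very little data (illustrative only).
import numpy as np
from sklearn.mixture import GaussianMixture

def component_occupancy(pooled_features, component_counts=(3, 8, 32, 128)):
    """Print how many frames each UBM component effectively receives."""
    for c in component_counts:
        ubm = GaussianMixture(n_components=c, covariance_type="diag",
                              random_state=0).fit(pooled_features)
        # Soft frame counts per component: very small counts mean a component
        # is fitting a handful of frames rather than a broad acoustic class,
        # a sign the UBM is too large for the amount of training data.
        counts = ubm.predict_proba(pooled_features).sum(axis=0)
        print(f"{c:4d} components: min / median soft count = "
              f"{counts.min():8.1f} / {np.median(counts):8.1f}")
```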

zsogitbe commented 4 years ago

> Hi Zoltan, I don't have the time to go deeply into your question, but I suspect that you have a UBM training problem. Building a cross-gender UBM with 3 speakers and 3 files is tricky, and the config files are not designed for that. (BTW: what is the number of Gaussian components in your UBM?) Best, JF

Hi Jean-Francois,

Thank you very much for your answer! Yes, I have built a UBM with the 3 speakers and the 7 audio recordings in the attached zip file, with 3 GDs (the config file is attached, you can see the params). I deliberately use 3 GDs. This is a very simple test which should find the right person with even very simple software. That ALIZE cannot do this indicates an error, hopefully in my use of ALIZE. I have tried to solve this problem for a long time, but without success. I hope that someone will have time to run this very simple test (all files are attached) and let me know whether the error is in my use of ALIZE or somewhere else. It should take 30 minutes for someone who knows ALIZE well and has the compiled binaries.

Best regards, Zoltan
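
As a rough check of the claim that even a very simple system should separate three speakers, one can reuse `train_ubm`, `map_adapt_means`, and `llr_score` from the sketch earlier in this thread on synthetic two-dimensional "features" (again illustrative Python, not ALIZE code and not the attached data): with three partially overlapping speaker clusters, the matching speaker is expected to come out on top.

```python
# Tiny end-to-end identification sanity test on synthetic data (illustrative).
import numpy as np

rng = np.random.default_rng(0)
speakers = {
    "jennifer": rng.normal(loc=(0.0, 0.0), scale=1.0, size=(600, 2)),
    "natalie":  rng.normal(loc=(1.5, 0.0), scale=1.0, size=(600, 2)),
    "will":     rng.normal(loc=(0.0, 1.5), scale=1.0, size=(600, 2)),
}

ubm = train_ubm(np.vstack(list(speakers.values())), n_components=3)
models = {name: map_adapt_means(ubm, feats) for name, feats in speakers.items()}

# A fresh "utterance" drawn from the first speaker's distribution.
test = rng.normal(loc=(0.0, 0.0), scale=1.0, size=(300, 2))
scores = {name: llr_score(test, model, ubm) for name, model in models.items()}
print(max(scores, key=scores.get), scores)   # expected: "jennifer" ranked first
```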

zsogitbe commented 4 years ago

I was able to make something that approximates what I wanted to achieve. The problems were a combination of the values of several variables in the config and of the example source code in SimpleSpkDetSystem.

Thank you! Great work!