Open zsogitbe opened 4 years ago
Hi Zoltan,
I don't have the time to go deeply into your question, but I suspect that you have a UBM training problem. Building a cross-gender UBM with 3 speakers and 3 files is tricky, and the config files are not designed for that. (BTW: what is the number of Gaussian components in your UBM?) Best, JF
From: "Zoltan Somogyi" notifications@github.com
To: "ALIZE-Speaker-Recognition" LIA_RAL@noreply.github.com
Cc: "Subscribed" subscribed@noreply.github.com
Sent: Tuesday, 21 April 2020 12:38:02
Subject: [ALIZE-Speaker-Recognition/LIA_RAL] Simple test failure (working but totally wrong result) - need help (#34)
All data and config: test_project1-G3-lean.zip (https://github.com/ALIZE-Speaker-Recognition/LIA_RAL/files/4509265/test_project1-G3-lean.zip)
I have made a very simple test with 3 speakers: I build a UBM from all of the speech recordings of the 3 speakers, then adapt (train) a GMM model for each speaker with only 3 (GD) distributions (mixtureDistribCount=3). I then test one input utterance (one of the training inputs) against the 3 speaker models and the UBM.

The inputs are 2 WAV files from Jennifer Lawrence, 2 from Natalie Portman and 3 from Will Smith. The input for the final identification test is 'test_project1/audio/JenniferLawrence/voice1.wav' (the first recording from Jennifer Lawrence). The Alize identification result is FALSE (the input is not recognized), with 'Will Smith' as the best match, which is of course completely wrong. The score is calculated with simple LLK and comes out negative (-15.17):

test_project1/audio/JenniferLawrence/voice1.wav --> test_project1/prm//200421_100649_4c6f.init.prm
Writing to: 200421_100649_4c6f
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Total Number of frames in threads: 1809
Writing to: 200421_100649_4c6f
featureCount = 1809
spkCount = 3
UBMLoaded = 1
Identification result: FALSE, score: -15.1764, best matched uId: will_smith
Ready!
All data is included in the zip file. May I please ask one of you to run this simple test and let me know your result? Please also let me know if you find the reason for the wrong results.
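For readers following the scoring step described above, here is a minimal, self-contained sketch of how a GMM-UBM score is commonly formed: the average per-frame log-likelihood under a speaker model minus the same quantity under the UBM. It is plain Python and does not use the ALIZE API. The function names, the toy 2-D models and the toy frames are all hypothetical, and whether the -15.1764 in the log is a raw log-likelihood or a ratio of this kind is not determined here. The sketch only shows that an average log-likelihood is typically negative on its own, so the sign alone does not indicate a failure.

```python
import math

def frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = sum(math.log(2.0 * math.pi * v) for v in var)
        maha = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        comp_logs.append(math.log(w) - 0.5 * (log_norm + maha))
    # log-sum-exp over the components, for numerical stability
    top = max(comp_logs)
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))

def mean_loglik(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one model."""
    return sum(frame_loglik(x, *gmm) for x in frames) / len(frames)

def llr_score(frames, speaker_gmm, ubm):
    """Log-likelihood ratio: speaker model versus the background model."""
    return mean_loglik(frames, speaker_gmm) - mean_loglik(frames, ubm)

# Toy models (hypothetical values): (weights, means, variances)
ubm   = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
spk_a = ([0.5, 0.5], [[0.5, 0.5], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
spk_b = ([0.5, 0.5], [[0.0, 0.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]])

# Toy test utterance, close to speaker A's first component
frames = [[0.6, 0.4], [0.5, 0.7], [0.3, 0.5]]

raw_a = mean_loglik(frames, spk_a)        # negative, although A is the best match
score_a = llr_score(frames, spk_a, ubm)   # positive: A explains the frames better than the UBM
score_b = llr_score(frames, spk_b, ubm)
best = max([("spk_a", score_a), ("spk_b", score_b)], key=lambda item: item[1])[0]
```

In a setup like this, the accept/reject decision would compare the best ratio against a threshold, so both the threshold and the quality of the UBM affect the TRUE/FALSE outcome.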
--
Jean-Francois BONASTRE
Director of the LIA
LIA/CERI, Université d'Avignon
Tel: +33/0 490843514
directeur-lia@univ-avignon.fr
@jfbonastre
Hi Jean-Francois,
Thank you very much for your answer! Yes, I have built a UBM with the 3 speakers and the 7 audio recordings in the attached zip file, using 3 GDs (the config file is attached, so you can see the parameters). I deliberately use 3 GDs. This is a very simple test in which even very simple software should find the right person. That Alize cannot do this indicates an error, hopefully in my use of Alize. I have been trying to solve this problem for a long time without success. I hope that someone will have time to run this very simple test (all files are attached) and let me know whether the error is in my use of Alize or somewhere else. It should take about 30 minutes for someone who knows Alize well and has the compiled binaries.
Best regards, Zoltan
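As background to the "adapt (train) a GMM model for each speaker" step discussed in this thread, the sketch below shows the textbook MAP adaptation of UBM means with a relevance factor. It is an illustration in plain Python, not ALIZE code. Whether the attached config performs exactly this mean-only update is an assumption, and the function names, the relevance value of 16 and the toy 1-D data are hypothetical. The point is that with very few frames an adapted mean stays close to the UBM mean, which is one reason why tiny data sets are hard to work with.

```python
import math

def posteriors(x, weights, means, variances):
    """Posterior probability of each diagonal-covariance component given x."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = sum(math.log(2.0 * math.pi * v) for v in var)
        maha = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        logs.append(math.log(w) - 0.5 * (log_norm + maha))
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation: new_mean = (sum_t p_t * x_t + r * ubm_mean) / (n + r)."""
    k, dim = len(weights), len(means[0])
    counts = [0.0] * k                            # soft frame count per component
    first_order = [[0.0] * dim for _ in range(k)]  # posterior-weighted sum of frames
    for x in frames:
        for c, p in enumerate(posteriors(x, weights, means, variances)):
            counts[c] += p
            for d in range(dim):
                first_order[c][d] += p * x[d]
    return [
        [(first_order[c][d] + relevance * means[c][d]) / (counts[c] + relevance)
         for d in range(dim)]
        for c in range(k)
    ]

# Toy 1-D UBM with a single component at 0.0 (hypothetical values)
ubm_weights, ubm_means, ubm_vars = [1.0], [[0.0]], [[1.0]]

# The speaker's frames all sit at 2.0; only the amount of data differs
few_frames = map_adapt_means([[2.0]] * 2, ubm_weights, ubm_means, ubm_vars)
many_frames = map_adapt_means([[2.0]] * 160, ubm_weights, ubm_means, ubm_vars)
```

With 2 frames the adapted mean moves only a small part of the way from the UBM mean toward the data, while with 160 frames it lands close to the data mean.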
I was able to make something that approximates what I wanted to achieve. The problem was a combination of the values of several variables in the config and the example source code in SimpleSpkDetSystem.
Thank you! Great work!