Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Speaker Recognition in JS with audio from MIC is not working #955

Closed josephsctan closed 3 years ago

josephsctan commented 3 years ago

**Describe the bug**
Speaker Identification in JS does not work with microphone input.

**To Reproduce**
Steps to reproduce the behavior:

  1. Create a profile and enroll speakers (see here).

  2. Add this code:

// this will work
async IdentifySpeakers(profile, speech_config, url) {
    const audio_config = await this.GetAudioConfigFromURL(url);
    const model = window.SpeechSDK.SpeakerIdentificationModel.fromProfiles([profile]);
    const recognizer = new window.SpeechSDK.SpeakerRecognizer(speech_config, audio_config);

    const result = await new Promise((resolve, reject) => {
        recognizer.recognizeOnceAsync(model, result => { resolve(result); }, error => { reject(error); });
    });

    console.log("The most similar voice profile is: " + result.profileId + " with similarity score: " + result.score + ".\n");
}

// this will not work
async IdentifySpeakerFromMic(profile, speech_config) {
    const audio_config = window.SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
    const model = window.SpeechSDK.SpeakerIdentificationModel.fromProfiles([profile]);
    const recognizer = new window.SpeechSDK.SpeakerRecognizer(speech_config, audio_config);

    let attempt = 1;
    while (attempt++ < 10) {
        console.log("Speaker Identification Attempt: " + attempt);
        const result = await new Promise((resolve, reject) => {
            recognizer.recognizeOnceAsync(model,
                result => { resolve(result); }, error => { reject(error); });

            // ----------------------------------------------------------------
            // NOTE: recognizer.recognizeOnceAsync throws this error:
            // "SyntaxError: Unexpected token N in JSON at position 0"
            // ----------------------------------------------------------------

        });
        console.log("Voice profile: " + result.profileId + " . Similarity score: " + result.score + ".\n");
    }

}
  3. Recognize speakers using IdentifySpeakers(): this works.

  4. Recognize speakers using IdentifySpeakerFromMic(): this does not.
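For anyone reproducing this, the loop in step 2 stops at the first rejected promise, so only one failure is ever logged. The sketch below wraps the same `recognizeOnceAsync(model, onSuccess, onError)` call shape used above and records the outcome of each attempt. `recognizeOnce` and `identifyWithRetries` are hypothetical helper names, not part of the Speech SDK, and this does not make microphone input work; it only makes each failure visible.

```javascript
// Wrap the callback-style recognizeOnceAsync(model, onSuccess, onError)
// call from the snippets above in a Promise.
function recognizeOnce(recognizer, model) {
    return new Promise((resolve, reject) => {
        recognizer.recognizeOnceAsync(model, resolve, reject);
    });
}

// Hypothetical helper: run up to maxAttempts recognitions and collect the
// outcome of each one instead of letting the first rejection end the loop.
async function identifyWithRetries(recognizer, model, maxAttempts = 3) {
    const outcomes = [];
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            const result = await recognizeOnce(recognizer, model);
            outcomes.push({ attempt, ok: true, profileId: result.profileId, score: result.score });
        } catch (error) {
            // Keep the error text so it can be logged or inspected later.
            outcomes.push({ attempt, ok: false, error: String(error) });
        }
    }
    return outcomes;
}
```

Any object exposing a compatible `recognizeOnceAsync` method can be passed as `recognizer`, which also makes the loop easy to exercise with a stub.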

**Expected behavior**
The ability to recognize speakers from microphone input, similar to how it can be done in C#:

public static async Task SpeakerIdentification(SpeechConfig config, List<VoiceProfile> voiceProfiles, Dictionary<string, string> profileMapping)
{
    var speakerRecognizer = new SpeakerRecognizer(config, AudioConfig.FromDefaultMicrophoneInput());
    var model = SpeakerIdentificationModel.FromProfiles(voiceProfiles);

    Console.WriteLine("Speak some text to identify who it is from your list of enrolled speakers.");
    var result = await speakerRecognizer.RecognizeOnceAsync(model);
    Console.WriteLine($"The most similar voice profile is {profileMapping[result.ProfileId]} with similarity score {result.Score}");
}


**Version of the Cognitive Services Speech SDK**
1.15.1 (js)

**Platform, Operating System, and Programming Language**
 - OS: Windows 10
 - Hardware: x64
 - Programming language: JavaScript
 - Browser: Chrome

**Additional context**
The above throws this error: 
  "SyntaxError: Unexpected token N in JSON at position 0"
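That message is what `JSON.parse` reports in Chrome when its input begins with the letter `N`, which suggests that some text that is not JSON was handed to a JSON parser. The issue does not show what that text was. As a rough illustration, the sketch below uses a hypothetical `parseBody` helper (not part of the Speech SDK) that keeps the raw text when parsing fails, so the underlying message is not hidden behind the `SyntaxError`.

```javascript
// Hypothetical helper: parse a body as JSON, but return the raw text when it
// is not valid JSON instead of surfacing only a SyntaxError.
function parseBody(text) {
    try {
        return { ok: true, value: JSON.parse(text) };
    } catch (error) {
        if (error instanceof SyntaxError) {
            return { ok: false, raw: text };
        }
        throw error;
    }
}

// Both input strings are made up for illustration; the issue does not show
// what the service actually returned.
console.log(parseBody('{"score": 0.69}'));
console.log(parseBody("Not JSON"));
```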

Here are more logs from the browser console 

Identifying /assets/dev/unknown_jt2.wav...
SpeakerId.js:166 The most similar voice profile is: 7ddcef4e-cc00-452a-9823-be58c067e80c with similarity score: 0.6940312.
SpeakerId.js:110 Identifying /assets/dev/unknown_jill.wav...
SpeakerId.js:166 The most similar voice profile is: 00000000-0000-0000-0000-000000000000 with similarity score: 0.
SpeakerId.js:146 Speaker Identification Attempt: 2
SpeakerId.js:127 Error: SyntaxError: Unexpected token N in JSON at position 0

amitkumarshukla commented 3 years ago

@glharper Could you please look at this?

glharper commented 3 years ago

@josephsctan Thanks for writing up this issue and using Speech SDK. Microphone input for Speaker Identification/Verification is currently not supported for JavaScript. We've added this to our backlog, and we will prioritize this against all future work.
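Since the file-based path in the original report does work, one possible interim approach (not something suggested by the maintainers in this thread) is to capture microphone samples yourself, package them as a WAV file, and pass that through the file-based route. The sketch below covers only the packaging step: `encodeWav` is a hypothetical helper, and the 16 kHz, 16-bit, mono PCM format is an assumption about what the service accepts.

```javascript
// Hypothetical helper: pack mono float samples in [-1, 1] into a 16-bit PCM
// WAV container. The 16 kHz default sample rate is an assumption.
function encodeWav(samples, sampleRate = 16000) {
    const bytesPerSample = 2;
    const dataSize = samples.length * bytesPerSample;
    const buffer = new ArrayBuffer(44 + dataSize);
    const view = new DataView(buffer);
    const writeAscii = (offset, text) => {
        for (let i = 0; i < text.length; i++) {
            view.setUint8(offset + i, text.charCodeAt(i));
        }
    };

    writeAscii(0, "RIFF");
    view.setUint32(4, 36 + dataSize, true);                // size of the rest of the file
    writeAscii(8, "WAVE");
    writeAscii(12, "fmt ");
    view.setUint32(16, 16, true);                          // fmt chunk size
    view.setUint16(20, 1, true);                           // audio format: PCM
    view.setUint16(22, 1, true);                           // channels: mono
    view.setUint32(24, sampleRate, true);
    view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
    view.setUint16(32, bytesPerSample, true);              // block align
    view.setUint16(34, 16, true);                          // bits per sample
    writeAscii(36, "data");
    view.setUint32(40, dataSize, true);

    for (let i = 0; i < samples.length; i++) {
        // Clamp to [-1, 1], then scale to the signed 16-bit range.
        const s = Math.max(-1, Math.min(1, samples[i]));
        view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
    }
    return buffer;
}
```

In a browser, the returned buffer could be wrapped in a `File` or `Blob` and handed to whatever file-based audio config already works for you, such as the one produced by `GetAudioConfigFromURL` in the original report. Whether the service accepts such recordings for identification has not been verified here.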