MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

How to set maximum speaker for real time diarization? #121815

Closed: u7630991 closed this issue 7 months ago

u7630991 commented 7 months ago


How to set maximum speaker for real time diarization?

Document Details

Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.

PesalaPavan commented 7 months ago

@u7630991 Thanks for your feedback! We will investigate and update as appropriate.

AjayBathini-MSFT commented 7 months ago

@u7630991

To set the maximum number of speakers for real-time diarization in the Microsoft Speech service, you can configure the SpeechConfig object in the Speech SDK. Specifically, set its DiarizationNumberOfSpeakers property to the maximum number of speakers that you want to support.

Here is an example of how to set the maximum number of speakers to 2 in C#:

    // Create a SpeechConfig object with the maximum number of speakers set to 2
    var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    config.SpeechRecognitionLanguage = "en-US";
    config.EnableDiarization = true;
    config.DiarizationNumberOfSpeakers = 2;

    // Create a SpeechRecognizer object with the SpeechConfig
    using (var recognizer = new SpeechRecognizer(config))
    {
        // Start recognition and wait for a result
        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine(result.Text);
    }

In this example, the DiarizationNumberOfSpeakers property of the SpeechConfig object is set to 2, which means that real-time diarization will attempt to identify at most 2 speakers in the audio stream.

u7630991 commented 7 months ago

@AjayBathini-MSFT Thanks for your reply, but I still get more than 2 speakers from real-time diarization. I have set up the transcriber in Python as below:

    import os
    import azure.cognitiveservices.speech as speechsdk

    self.speech_key = os.getenv("AZURE_SPEECH_KEY")
    self.speech_region = "australiaeast"
    self.speech_config = speechsdk.SpeechConfig(subscription=self.speech_key, region=self.speech_region)
    self.speech_config.speech_recognition_language = "en-US"

    # Note: these PascalCase names do not appear to be defined attributes of the
    # Python SpeechConfig class, so the assignments succeed silently without error
    self.speech_config.EnableDiarization = True
    self.speech_config.DiarizationNumberOfSpeakers = 2

    # Set up the stream for audio input
    self.stream = speechsdk.audio.PushAudioInputStream()
    self.audio_config = speechsdk.audio.AudioConfig(stream=self.stream)

    # Create a conversation transcriber using the audio config
    self.transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=self.speech_config, audio_config=self.audio_config)
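One likely reason the setting above has no effect: in Python, assigning to an attribute that a class never defined usually succeeds silently, so a line like `self.speech_config.EnableDiarization = True` can run without any error while the SDK never sees the value. A minimal sketch of the pitfall, using a plain stand-in class (`FakeSpeechConfig` is hypothetical, not the real `SpeechConfig`):

```python
class FakeSpeechConfig:
    """Stand-in for a config object that only defines known settings."""

    def __init__(self):
        self.speech_recognition_language = "en-US"

    def known_settings(self):
        # Only the attributes this class deliberately defined in __init__
        return {"speech_recognition_language"}


config = FakeSpeechConfig()

# This assignment raises no error even though the class never defined the name...
config.EnableDiarization = True

# ...so the wrong name goes unnoticed: the object simply grows a new, unused
# attribute instead of changing any real setting.
print("EnableDiarization" in config.known_settings())  # False: not a real setting
print(config.EnableDiarization)                        # True: stored, but ignored
```

If the real SDK object behaves like this stand-in, a safer pattern is to stick to documented snake_case attributes and the SpeechConfig `set_property` / `set_property_by_name` methods, and to treat any bare PascalCase attribute assignment as a red flag.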

AjayBathini-MSFT commented 7 months ago

@u7630991 I'd recommend working more closely with our support team via an [Azure support request](https://docs.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request). We'll follow up there.