Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

Speechsdk audio output device change failed #36337

Closed pigz2538 closed 4 months ago

pigz2538 commented 4 months ago

Describe the bug I want to change the audio output device used by the Speech SDK. I read the documentation and found that audio.AudioOutputConfig accepts a device_name parameter (the device ID).

[image]

https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.audio.audiooutputconfig?view=azure-python

Then I obtained the device_id of the audio output device using the C++ code from the page below and passed it to AudioOutputConfig, but it did not work.

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-select-audio-input-devices

Also, this code returns only one device:

[image]

(I changed the endpoint type in the C++ code from eCapture to eRender so that it outputs the ID of the render device. I also tried using the device name, but it still doesn't work.)

I want to know if I understand the document correctly and if the steps are correct. If the steps are indeed correct, then perhaps it is a bug? Thank you very much in any case.

To Reproduce Steps to reproduce the behavior:

Core code:

...

self.speech_config = speechsdk.SpeechConfig(subscription=self.speech_key, region=self.speech_region)
self.speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm)
self.audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=False, device_name="device_id")
# "device_id" is a placeholder; the actual output device ID starts like 8fea1c32-2386.....
self.speech_synthesizer = speechsdk.SpeechSynthesizer(self.speech_config, audio_config=self.audio_config)
speechresult = self.speech_synthesizer.speak_ssml_async(sentence).get()

...

Code used to get the device_id, from https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-select-audio-input-devices

...

hr = pEnumerator->EnumAudioEndpoints(eRender, DEVICE_STATE_ACTIVE, &pCollection);
// Changed from eCapture to eRender to enumerate output (render) devices
...

Expected behavior The audio should be played on the specified output device.

Additional context I want to know if I understand the document correctly and if the steps are correct. If the steps are indeed correct, then perhaps it is a bug? Thank you very much in any case.

pigz2538 commented 4 months ago

I found more device IDs in the registry (regedit) under: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Render [image]
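
One thing that may be worth checking, as an assumption rather than something confirmed in this thread: the subkeys under MMDevices\Audio\Render are bare GUIDs, while Windows endpoint ID strings, as returned by IMMDevice::GetId, normally carry a prefix, for example {0.0.1.00000000}.{5f23ab69-6181-4f4a-81a4-45414013aac8} for a capture device in the linked how-to page. The sketch below assumes render endpoints use the prefix {0.0.0.00000000} and that device_name expects this full string. The helper names render_endpoint_id and build_synthesizer are hypothetical and only for illustration.

```python
import re

# Matches a GUID in 8-4-4-4-12 hex form, with or without surrounding braces.
_GUID_RE = re.compile(
    r"^\{?([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\}?$"
)


def render_endpoint_id(guid: str) -> str:
    """Build a full endpoint ID string from a bare GUID.

    Assumption: render (output) endpoints use the "{0.0.0.00000000}." prefix.
    """
    match = _GUID_RE.match(guid.strip())
    if match is None:
        raise ValueError(f"not a GUID: {guid!r}")
    return "{0.0.0.00000000}.{" + match.group(1) + "}"


def build_synthesizer(speech_key: str, speech_region: str, endpoint_id: str):
    """Create a SpeechSynthesizer that plays on the given output device."""
    # Imported here so the helper above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=speech_key, region=speech_region
    )
    audio_config = speechsdk.audio.AudioOutputConfig(
        use_default_speaker=False, device_name=endpoint_id
    )
    return speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
```

With this, build_synthesizer(key, region, render_endpoint_id(guid)) would take a GUID copied from the registry key above. If the C++ enumeration already prints the prefixed string, that value can be passed to device_name unchanged.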