Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.

Confusing naming and not working #45340

Open 1cuu7 opened 1 month ago

1cuu7 commented 1 month ago

Type of issue

Code doesn't work

Description

I've tried to set the audio output speaker using this. I first ran the following to get the device name (endpoint.FriendlyName) of my Bluetooth speaker, "Headphones (V8)":

    var enumerator = new MMDeviceEnumerator();
    foreach (var endpoint in enumerator.EnumerateAudioEndPoints(DataFlow.Capture, DeviceState.Active))
    {
        Console.WriteLine("{0} ({1})", endpoint.FriendlyName, endpoint.ID);
    }

The Bluetooth speaker is the device it defaults to, and it works when I don't specify a config. However, if I run this code to configure it manually:

    _audioOutputConfig = AudioConfig.FromSpeakerOutput("Headphones(V8)");
    _speechSynthesizer = new SpeechSynthesizer(speechConfig, _audioOutputConfig);

No speech is produced from the speaker. Is this a bug? I've entered the device name enumerated above, and I've also tried the device ID (endpoint.ID). Nothing works; there is just silence when the speech synthesizer responds. The player instance that plays the ping sound comes through fine at the same time, so I know the speaker is functioning and connected, but the FromSpeakerOutput call does not appear to work correctly.

It also strikes me as odd that the documentation then provides a linked note: "Specifies the device name. To retrieve platform-specific audio device names, see How to: Select an audio input device with the Speech SDK." Why does it refer to inputs when this is about outputs? My assumption is that enumerating devices returns both inputs and outputs, so the link is valid. Even so, it adds to the already slightly confusing intent of this API, because the nomenclature and the example both vaguely suggest that this has to do with input (e.g. "FromSpeaker" and the inputs example).

Why is it talking about an input device? And given the naming of the other functions, why is it called 'FromSpeakerOutput' when the documentation describes it as: "Creates an AudioConfig object that produces speech to to the specified speaker. Added in 1.14.0"? That doesn't sound right. Surely it should be "sends the audio/speech to a specified speaker", not "produces to". And why is it not called ToSpeakerOutput? Also, the Python SDK has an AudioOutputConfig, which is missing here, and I wondered why it isn't replicated in C# .NET.

Page URL

https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.audio.audioconfig.fromspeakeroutput?view=azure-dotnet#microsoft-cognitiveservices-speech-audio-audioconfig-fromspeakeroutput(system-string)

Content source URL

https://github.com/Azure/azure-docs-sdk-dotnet/blob/master/xml/Microsoft.CognitiveServices.Speech.Audio/AudioConfig.xml

Document Version Independent Id

7c9abf92-cb7c-5e11-2604-9b9b2cd4d464

Article author

@azure-sdk


github-actions[bot] commented 1 month ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @robch.