MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Useful Sample #69360

Status: Closed (warrenkc closed 3 years ago)

warrenkc commented 3 years ago

Thank you very much for your detailed documentation. Could you add an example that uses the system audio as the input? This is sometimes called loopback or stereo mix, and it is the sound that the system outputs. With it I could capture audio and transcribe it in real time, for example to transcribe Chinese audio from a Zoom meeting as it plays.

Thank you.


shashishailaj commented 3 years ago

@warrenkc Thank you for your feedback. We will investigate and update the thread.

ram-msft commented 3 years ago

@warrenkc Thanks for the feedback. Here are samples that work with speech audio, though not in real time. These are the Intelligent Kiosk samples for testing the Cognitive Services speech-to-text service: https://github.com/microsoft/Cognitive-Samples-IntelligentKiosk/blob/master/Documentation/SpeechToTextExplorer.md

warrenkc commented 3 years ago

@ram-msft Thank you, but I am not sure this is what I am looking for. I would like to use the system audio (loopback capture) as the input for speech translation. I know how to use the microphone, but how can I use the system audio (loopback capture) for speech to text or speech translation? Many users do not have loopback available as an input device on their system.

If this is in the samples you provided, can you post the link to the code file? Thank you very much sir.

ram-msft commented 3 years ago

@warrenkc Thanks for the details. The examples below show how to pass an audio file to the batch transcription API, but this is not real-time. https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription#sample-code https://medium.com/@abhishekcskumar/logic-apps-large-audio-speech-to-text-batch-transcription-d71e93bbaeec https://github.com/PanosPeriorellis/Speech_Service-BatchTranscriptionAPI/blob/master/CrisClient/Program.cs

We also have a recommended approach for converting audio to the different supported formats (SoX is one option).
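
As a rough illustration of the SoX option, the Python sketch below only builds a SoX command line. It assumes the target is a 16 kHz, 16-bit, mono WAV file, which is a guess at a commonly accepted format rather than something stated in this thread; check the batch transcription documentation for the formats that are actually supported. The helper name `sox_convert_command` and the file names are hypothetical.

```python
import shlex


def sox_convert_command(src, dst, rate_hz=16000, channels=1, bits=16):
    """Build a SoX argument list that resamples `src` into `dst`.

    The defaults (16 kHz, mono, 16-bit) are an assumption, not a value
    taken from the thread.
    """
    return [
        "sox", src,
        "-r", str(rate_hz),   # output sample rate in Hz
        "-c", str(channels),  # output channel count
        "-b", str(bits),      # output bits per sample
        dst,
    ]


# Hypothetical file names, used only for illustration.
cmd = sox_convert_command("meeting.flac", "meeting.wav")
print(shlex.join(cmd))  # sox meeting.flac -r 16000 -c 1 -b 16 meeting.wav
```

If SoX is installed, the list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion.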

warrenkc commented 3 years ago

Thank you. However, I want to transcribe the speech from the real-time system audio (loopback capture). Example: Zoom meeting in Spanish. I need to be able to use the audio in real-time from the Zoom meeting in Spanish. This would be the audio coming out of the system speakers in real-time. Not afterwards.

ram-msft commented 3 years ago

@warrenkc Thanks for the details. One option is to "virtually" connect the speaker to a microphone, using a tool such as the VB-Audio Virtual Apps (vb-audio.com). After that, select the microphone that is linked to the speaker as the default microphone.

You can then run a SpeechRecognizer with input from a microphone, following the function RecognitionWithLanguageAndDetailedOutputAsync in the sample.
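
The steps above can be sketched in Python with the Speech SDK (the `azure-cognitiveservices-speech` package). This is a minimal sketch, not the sample function named above: it assumes the virtual cable output is already set as the default recording device, and the names `TranscriptCollector` and `transcribe_default_microphone`, the `zh-CN` default, and the 30-second run time are all hypothetical choices made for illustration.

```python
import time


class TranscriptCollector:
    """Collects the final text of each phrase from 'recognized' events."""

    def __init__(self):
        self.lines = []

    def on_recognized(self, evt):
        # evt.result.text holds the final text of one recognized phrase.
        text = evt.result.text
        if text:
            self.lines.append(text)
            print(text)


def transcribe_default_microphone(key, region, language="zh-CN", seconds=30):
    """Run continuous recognition on the default recording device.

    Assumes the virtual cable has been selected as the default microphone,
    so that system audio arrives as microphone input.
    """
    # Imported here so the collector above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language
    speech_config.output_format = speechsdk.OutputFormat.Detailed

    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    collector = TranscriptCollector()
    recognizer.recognized.connect(collector.on_recognized)

    recognizer.start_continuous_recognition()
    try:
        time.sleep(seconds)  # keep listening while the meeting audio plays
    finally:
        recognizer.stop_continuous_recognition()
    return collector.lines
```

Calling `transcribe_default_microphone(key, region)` with your own Speech resource key and region prints each phrase as it is finalized. For a Spanish meeting, pass a Spanish locale such as `language="es-ES"`.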

We have forwarded the request for real-time support of system audio to the product team. You can also raise a user voice request here so the community can vote and provide feedback. The product team reviews this feedback when planning features for future releases. We will now proceed to close this thread. If you have further questions regarding this matter, please tag me in your reply. We will gladly continue the discussion and reopen the issue.