Closed: yashugupta786 closed this issue 3 years ago
If you haven't noticed, Teams Client has the transcription feature built in now. :-)
We do not have a Python API surface for our Conversational Transcription Service yet, or the Speaker ID API (but we are working on them). You could do your experiments in C# though.
Keep in mind that the conversation transcription APIs are in preview, and will undergo some refactoring as we take customer feedback into the design.
If your question was actually about how to do the audio input when you don't have a file or a microphone as input: you would usually get the audio from the call/meeting as a stream, and then put that stream into a push or pull audio stream class when passing it to the Speech SDK.
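As a minimal Python sketch of that pump pattern, assuming a file-like source of raw PCM audio. The `PushSink` class here is a hypothetical stand-in that only mirrors the `write()`/`close()` interface of the Speech SDK's real `speechsdk.audio.PushAudioInputStream`:

```python
import io


class PushSink:
    """Hypothetical stand-in for the Speech SDK's PushAudioInputStream;
    it only mirrors the write()/close() interface used by the pump below."""

    def __init__(self):
        self.buffer = bytearray()
        self.closed = False

    def write(self, chunk: bytes) -> None:
        self.buffer.extend(chunk)

    def close(self) -> None:
        self.closed = True


def pump(source, sink, chunk_size: int = 4096) -> int:
    """Read raw audio from a file-like source and push it into the sink
    chunk by chunk; returns the total number of bytes pushed."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    sink.close()  # signals end-of-stream to the recognizer
    return total
```

With the real SDK you would create a `speechsdk.audio.PushAudioInputStream()`, wrap it in `speechsdk.audio.AudioConfig(stream=...)`, hand that to the recognizer, and run the same pump loop against your call/meeting audio source.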
Thanks for the response, Brian. Teams has the transcription functionality, but I am looking for translation functionality, so that we can transcribe and translate conference meetings into different languages. On Teams, only transcription is available, and only for English (speech-to-text in English only). So, as I understand you, we cannot fetch the audio of multiple speakers in a meeting from Teams and pass it to Azure speech translation?
Is there any other workaround in Python? I have observed that when using the Python SpeechRecognition library I am able to capture the audio of all speakers/users, but the accuracy is very bad. If there is any solution in Python for capturing the audio of all users/speakers in a meeting using the Azure service, that would be great.
@yashugupta786 Sorry about the lack of response - is this issue still valid for you? Unfortunately there is probably not much to add to what has been written so far.
In general, if you have direct access to an audio stream from a source other than the microphone or a file, the recommended approach currently is to use a push audio stream to feed audio data from the source stream to the Speech SDK. If there are several such source streams that you want processed simultaneously, then you need to mix their audio together before passing it to the Speech SDK.
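As a sketch of that mixing step, here is a naive mixer for equal-length 16-bit signed mono PCM buffers, using only the standard library. It sums corresponding samples and clamps to the int16 range; a real application would also need to handle differing buffer lengths, sample rates, and channel counts:

```python
import array


def mix_pcm16(*buffers: bytes) -> bytes:
    """Mix equal-length 16-bit signed PCM buffers by summing corresponding
    samples and clamping to the int16 range. Assumes the platform's native
    byte order matches the audio data (little-endian on typical hardware)."""
    if not buffers:
        return b""
    tracks = [array.array("h", b) for b in buffers]
    mixed = array.array("h", (
        max(-32768, min(32767, sum(t[i] for t in tracks)))
        for i in range(len(tracks[0]))
    ))
    return mixed.tobytes()
```

The mixed bytes can then be written into the push audio stream exactly as a single-source stream would be.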
Closed as answered. Please create a new issue if you need further support on any specific topic.
I have created a small application for continuous speech-to-text transcription and translation. For a single user, getting input from the microphone works fine. But when we have multiple speakers (a Skype meeting, Teams conference call, Zoom meeting, or any other audio source), how do we fetch the audio for all speakers and pass it to the Azure speech-to-text service? As of now the only options are the microphone and an audio file.
How can this be achieved in Python, so that multiple speakers' voices can be fed to Azure Speech services for transcription or translation?