Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

speak_text_async() is not async #2383

Open FDUS105301 opened 6 months ago

FDUS105301 commented 6 months ago

I have this line of code

result = speech_synthesizer.speak_text_async(text).get()

which is supposed to be async. I have created a callback handler to intercept the streamed data in chunks, like so:

`class MyPushAudioOutputStream(speechsdk.audio.PushAudioOutputStreamCallback):
    def __init__(self, frame_rate, sample_width, channels):
        super().__init__()
        self.frame_rate = frame_rate
        self.sample_width = sample_width
        self.channels = channels
        self.audio_data = bytearray()
        self.chunk_count = 0

    def write(self, audio_buffer):
        audio_data = bytes(audio_buffer)
        self.audio_data.extend(audio_data)

        # Add WAV headers to the audio data
        wav_data_with_header = self.add_wav_header(audio_data)
        encoded_audio_data = base64.b64encode(wav_data_with_header).decode('utf-8')
        socketio.emit('audio_stream', {'audio_data': encoded_audio_data})

        return len(audio_data)`

I am able to get the data as it comes in, but emitting it through the socket does not work until synthesis (speak_text_async) is complete; the call blocks the socket's emit. However, the function's docstring says "Performs synthesis on plain text in a non-blocking (asynchronous) mode.", which is clearly not the case here.

Here is my main code

`@socketio.on('synthesize_text')
def handle_synthesize_text(data):
    start_time = time.time()
    text = data['text']
    print('Speaking!!!')

    # Define audio parameters
    frame_rate = 16000  # Example frame rate, set this to your desired frame rate
    sample_width = 2    # 2 bytes for 16-bit audio
    channels = 1        # Mono audio

    # Create a push audio output stream with the callback
    push_stream_callback = MyPushAudioOutputStream(frame_rate, sample_width, channels)
    audio_output_stream = speechsdk.audio.PushAudioOutputStream(push_stream_callback)
    audio_output_config = speechsdk.audio.AudioOutputConfig(stream=audio_output_stream)

    # Create a speech synthesizer with the push audio output stream
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output_config)

    # Emit the initial WAV header to the client
    num_channels = channels
    sample_rate = frame_rate
    byte_rate = sample_rate * num_channels * sample_width
    block_align = num_channels * sample_width
    bits_per_sample = sample_width * 8
    subchunk2_size = 0
    chunk_size = 36 + subchunk2_size

    wav_header = struct.pack('<4sI4s4sIHHIIHH4sI',
                             b'RIFF',
                             chunk_size,
                             b'WAVE',
                             b'fmt ',
                             16,
                             1,
                             num_channels,
                             sample_rate,
                             byte_rate,
                             block_align,
                             bits_per_sample,
                             b'data',
                             subchunk2_size)
    encoded_wav_header = base64.b64encode(wav_header).decode('utf-8')
    socketio.emit('audio_stream', {'audio_data': encoded_wav_header})

    result = speech_synthesizer.speak_text_async(text).get()
    print('Done Speaking!!!')
    end_time = time.time()

    # Print synthesis time for debugging
    print(f"Synthesis time: {end_time - start_time} seconds")

    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Synthesis completed.")
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            # Emit detailed error information to the client
            socketio.emit('synthesis_error', {'message': 'Speech synthesis canceled due to an error.',
                                              'details': cancellation_details.error_details})
        else:
            socketio.emit('synthesis_error', {'message': 'Speech synthesis was canceled.'})
    else:
        socketio.emit('synthesis_error', {'message': 'Failed to synthesize speech for unknown reasons.'})`

I am not able to stream to my socket in realtime because of this error.

Version of the Cognitive Services Speech SDK: 1.37.0

Which version of the SDK are you using: Latest.

Platform, Operating System, and Programming Language

Additional information: I'm using sockets in async mode and monkey patching with gevent. The code is hosted on an Azure Web App (Linux), Basic B1 tier.

pankopon commented 5 months ago

@yulin-li Can you please check.

hmthang96 commented 5 months ago

Here is my solution to avoid blocking the flow when using the .get() method from the Azure Python SDK:

def your_method_is_blocked_util_get_result(your params, ...):
    ....
    result = speech_synthesizer.speak_text_async(text).get()
    ....

async def wrap_async_method(your params, ...):
    result_method = asyncio.to_thread(your_method_is_blocked_util_get_result, your params)
    await result_method

In your main flow:

await wrap_async_method(param....)

FDUS105301 commented 5 months ago

Let me try this solution. Am I correct in assuming that this is a workaround more than a full-on solution?

yulin-li commented 5 months ago

It's true that the speak_text_async() API is not an asyncio API in Python. In fact, the Speech SDK doesn't currently support asyncio.
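To illustrate the distinction, here is a minimal sketch that uses the standard library's concurrent.futures as a stand-in (it does not call the Speech SDK, and `slow_synthesis` and `measure` are hypothetical names). Starting the work returns a future almost immediately, but waiting on that future right away blocks the caller until the work finishes, much like calling .get() directly on the result of speak_text_async().

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_synthesis(text: str) -> str:
    """Hypothetical stand-in for synthesis; sleeps instead of calling a service."""
    time.sleep(0.3)
    return f"audio for {text!r}"


def measure() -> tuple[float, float, str]:
    """Time the call that starts the work and the wait for its result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        t0 = time.perf_counter()
        future: Future = pool.submit(slow_synthesis, "hello")  # returns right away
        started_in = time.perf_counter() - t0

        t1 = time.perf_counter()
        result = future.result()  # blocks the caller until the work is done
        waited_for = time.perf_counter() - t1
    return started_in, waited_for, result


if __name__ == "__main__":
    started_in, waited_for, result = measure()
    print(f"start: {started_in:.3f}s, wait: {waited_for:.3f}s, result: {result}")
```

In this sketch the caller could do other work between `submit()` and `result()`; chaining the two back to back gives up that opportunity.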

hmthang96 commented 5 months ago

> Let me try this solution. Am I correct in assuming that this is a workaround more than a full-on solution?

Yes. The Azure async method is not an asyncio API, so you should move the blocking call to another thread while your main flow keeps running on the main thread.
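A runnable sketch of that idea, assuming Python 3.9+ for asyncio.to_thread. `blocking_synthesis` is a hypothetical stand-in for a function that ends in speak_text_async(text).get(); it only sleeps here. The `ticker` coroutine is there purely to show that the event loop keeps running while the blocking call sits in a worker thread. Note that this covers the asyncio case only; the original poster's app uses gevent, where this may not apply directly.

```python
import asyncio
import time


def blocking_synthesis(text: str) -> str:
    """Hypothetical stand-in for a blocking .get() call; just sleeps."""
    time.sleep(0.3)
    return f"done: {text}"


async def ticker(ticks: list) -> None:
    """Record a timestamp every 50 ms to show the loop stays responsive."""
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def main() -> tuple[str, int]:
    ticks: list = []
    tick_task = asyncio.create_task(ticker(ticks))
    # Run the blocking call in a worker thread; the loop keeps scheduling ticker().
    result = await asyncio.to_thread(blocking_synthesis, "hello")
    tick_task.cancel()
    try:
        await tick_task
    except asyncio.CancelledError:
        pass  # expected: we cancelled the ticker ourselves
    return result, len(ticks)


if __name__ == "__main__":
    result, tick_count = asyncio.run(main())
    print(result, tick_count)
```

If `blocking_synthesis` were called directly inside `main()` instead of through asyncio.to_thread, the ticker would not get a chance to run until the call returned.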