Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

The bug of Timeout while synthesizing #2270

Closed Nietzche001 closed 8 months ago

Nietzche001 commented 8 months ago


# Loop through the proverbs text dir
def process_files_in_dir(directory, accent: str, out_dir: str):
    for filename in os.listdir(directory):
        filepath = os.path.join(directory, filename)
        voice = locale.get_random_voice(accent)
        print(f"the voice used is: {voice}")
        tts_txt_to_mp3(filepath, voice, out_dir)

# Read text from input file

with open(input_file, "r", encoding="utf-8") as file:
    text = file.read()
# Synthesize speech
result = speech_synthesizer.speak_text_async(text).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized for text completed.")
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")

Describe the bug

I was doing TTS using the Azure Speech API. The text files are not large: each is generally about 500 words, which converts to around 2 minutes of audio. I have 10~15 text files in total. During the conversion, I got 3~4 errors, all of the same type:

Speech synthesis canceled: CancellationReason.Error
Error details: Timeout while synthesizing. Current RTF: 2.00423 (threshold 2), frame interval 3471ms (threshold 3000ms). USP state: ReceivingData. Received audio size: 15552 bytes.

It seems the timeout is caused by the RTF and frame interval thresholds. I searched for a solution, but there seems to be no option to adjust these two thresholds. Everything in my code is very basic and just uses the Azure demo code. I did not use concurrent mode; the TTS conversion ran serially, converting each text file one after another.
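A common mitigation for intermittent cancellations like the one described above is to retry the call with a growing delay. The thread does not confirm that this resolves the timeout, so the following is only a minimal sketch. The helper name `synthesize_with_retry` and its parameters are hypothetical; the synthesis call and the success check are passed in as callables so the helper makes no assumptions about the SDK.

```python
import time


def synthesize_with_retry(synthesize, is_success, max_attempts=3,
                          base_delay=2.0, sleep=time.sleep):
    """Call synthesize() until is_success(result) is true or attempts run out.

    Hypothetical helper: waits base_delay * 2**attempt seconds between
    attempts and returns the last result, successful or not.
    """
    result = None
    for attempt in range(max_attempts):
        result = synthesize()
        if is_success(result):
            return result
        # Back off before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return result
```

With the code from this issue, the call might look like `synthesize_with_retry(lambda: speech_synthesizer.speak_text_async(text).get(), lambda r: r.reason == speechsdk.ResultReason.SynthesizingAudioCompleted)`.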

dargilco commented 8 months ago

@Nietzche001 thank you for reporting this! Please enable Speech SDK logs, do another run that shows the error, and attach the log to this GitHub issue.

@yulin-li @yinhew can you please follow up on this? Thanks!
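For reference, Speech SDK diagnostic logging in Python is typically enabled by setting the `Speech_LogFilename` property on the speech config before creating the synthesizer. The sketch below assumes that approach. The helper name `enable_sdk_file_logging` and the default log path are hypothetical, and the property ID is passed in as a parameter only so the snippet stays importable without the SDK installed.

```python
def enable_sdk_file_logging(speech_config, log_property_id,
                            log_path="speech_sdk.log"):
    """Ask the Speech SDK to write its diagnostic log to log_path.

    Hypothetical helper: speech_config is expected to be a SpeechConfig and
    log_property_id is expected to be speechsdk.PropertyId.Speech_LogFilename.
    """
    speech_config.set_property(log_property_id, log_path)
    return log_path
```

Assuming `import azure.cognitiveservices.speech as speechsdk`, usage would be `enable_sdk_file_logging(speech_config, speechsdk.PropertyId.Speech_LogFilename)`, called before the `SpeechSynthesizer` is constructed.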

Nietzche001 commented 8 months ago

I enabled Speech SDK logging with speech_config.enable_audio_logging(). The error message is:

Error details: Connection was closed by the remote host. Error code: 4429. Error details: The request is throttled because you have exceeded the concurrent request limit allowed for your sub USP state: TurnStarted. Received audio size: 0 bytes.

pankopon commented 8 months ago

@Nietzche001 This is because of the quotas and limits on the Free (F0) tier: https://learn.microsoft.com/azure/ai-services/speech-service/speech-services-quotas-and-limits#text-to-speech-quotas-and-limits-per-resource

Maximum number of transactions per time period for prebuilt neural voices and custom neural voices: 20 transactions per 60 seconds. This limit isn't adjustable.
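One way to stay under a limit of 20 transactions per 60 seconds is to pace requests on the client with a sliding-window limiter. This is a minimal sketch, not something the SDK provides: the class name `SlidingWindowLimiter` and its parameters are hypothetical, and it only limits the request rate, not the number of concurrent requests that the 4429 message refers to.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any window of `period` seconds."""

    def __init__(self, max_calls=20, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have fallen out of the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self._sleep(self.period - (now - self._stamps[0]))
```

In the loop from this issue, `limiter.wait()` would be called once before each `tts_txt_to_mp3(filepath, voice, out_dir)` call, with a single `limiter = SlidingWindowLimiter()` created beforehand.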

pankopon commented 8 months ago

Closed because, based on the log, the issues are due to documented limitations on free subscriptions. Please open a new issue if more support is needed.

Raciel-c commented 8 months ago

I'm not making 20 requests per minute, so why do I keep getting 4429 errors? Error code: 4429. Error details: The request is throttled because you have exceeded the concurrent request limit allowed for your sub USP state: TurnStarted. Received audio size: 0 bytes.

MichaelHazut commented 4 months ago

I don't have 20 frequencies per minute, why do I keep reporting 4429 errors, Error code: 4429. Error details: The request is throttled because you have exceeded the concurrent request limit allowed for your sub USP state: TurnStarted. Received audio size: 0 bytes.

I'm now running into this problem. Did you manage to find a solution?