Nietzche001 commented 8 months ago

IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:

Why am I filing this as a bug? I got the same error as in this post: https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2079, which has no resolution; it was suggested there to file a bug if the error reappears, so I did. The problem occurs quite often. I am on the F0 pricing tier and am not sure whether this only happens to free users. In any case, it is very troublesome and should be fixed.
Speech SDK log taken from a run that exhibits the reported issue. Error details: Timeout while synthesizing. Current RTF: 0.542194 (threshold 2), frame interval 10034ms (threshold 3000ms). USP state: ReceivingData. Received audio size: 1550880 bytes. Did you set the speech resource key and region values?
A stripped down, simplified version of your source code that exhibits the issue.

I am providing 2 functional prototypes as a sample.

# tts with different accents
def tts_txt_to_mp3(input_file: str, accent_voice: str, output_dir: str = "./mp3_proverbs/"):
    # Azure TTS configuration
    region = "eastasia"  # Replace with your Azure region
    subscription_key = os.environ.get("AZURE_SUBSCRIPTION_KEY")  # Replace with your Azure subscription key
    audio_config = speechsdk.audio.AudioOutputConfig(filename=out_path_file)

    # Configurations
    speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)
    speech_config.speech_synthesis_voice_name = accent_voice
    speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceResponse_RequestSentenceBoundary, value='true')
    speech_config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Audio16Khz64KBitRateMonoMp3
    )
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

    # Read text from input file
    with open(input_file, "r", encoding="utf-8") as file:
        text = file.read()

    # Synthesize speech
    result = speech_synthesizer.speak_text_async(text).get()
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text completed.")
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")

# loop through the proverbs text dir
def process_files_in_dir(directory, accent: str, out_dir: str):
    for filename in os.listdir(directory):
        filepath = os.path.join(directory, filename)
        voice = locale.get_random_voice(accent)
        print(f"the voice used is: {voice}")
        tts_txt_to_mp3(filepath, voice, out_dir)

Additional information as shown below

Describe the bug

I was doing TTS using the Azure Speech API. Each text file is not large, generally around 500 words, which converts to roughly 2 minutes of audio. I have 10~15 text files in total. During the conversion I got 3~4 errors, all of the same type:

Speech synthesis canceled: CancellationReason.Error. Error details: Timeout while synthesizing. Current RTF: 2.00423 (threshold 2), frame interval 3471ms (threshold 3000ms). USP state: ReceivingData. Received audio size: 15552 bytes.

It seems the timeout is due to the RTF and frame interval thresholds. I searched for a solution, but there seems to be no option to adjust these 2 thresholds. My code is very basic and just uses the Azure demo code. I did not use concurrency: the TTS conversion ran serially, converting each text file one after another.
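To see which of the two thresholds a given failure actually crossed, the error string can be parsed. The following is a minimal sketch that assumes the message keeps the exact wording quoted in this issue; `TimeoutDetails` and `parse_timeout_details` are hypothetical helper names, not part of the Speech SDK, and RTF is presumed here to mean real-time factor.

```python
import re
from typing import NamedTuple, Optional


class TimeoutDetails(NamedTuple):
    """Numbers extracted from a 'Timeout while synthesizing' message (hypothetical helper)."""
    rtf: float
    rtf_threshold: float
    frame_interval_ms: int
    frame_interval_threshold_ms: int

    @property
    def rtf_exceeded(self) -> bool:
        return self.rtf > self.rtf_threshold

    @property
    def frame_interval_exceeded(self) -> bool:
        return self.frame_interval_ms > self.frame_interval_threshold_ms


# Assumption: the message format matches the strings quoted in this issue.
_PATTERN = re.compile(
    r"Current RTF: (?P<rtf>[\d.]+) \(threshold (?P<rtf_thr>[\d.]+)\), "
    r"frame interval (?P<fi>\d+)ms \(threshold (?P<fi_thr>\d+)ms\)"
)


def parse_timeout_details(error_details: str) -> Optional[TimeoutDetails]:
    """Return the parsed values, or None if the message has a different shape."""
    match = _PATTERN.search(error_details)
    if match is None:
        return None
    return TimeoutDetails(
        rtf=float(match["rtf"]),
        rtf_threshold=float(match["rtf_thr"]),
        frame_interval_ms=int(match["fi"]),
        frame_interval_threshold_ms=int(match["fi_thr"]),
    )
```

Applied to the two messages in this report, the first (RTF 0.542194, frame interval 10034ms) exceeds only the frame interval threshold, while the second (RTF 2.00423, frame interval 3471ms) exceeds both.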

dargilco commented 8 months ago

@Nietzche001 thank you for reporting this! Please enable Speech SDK logs, do another run that shows the error, and attach the log to this GitHub issue.

@yulin-li @yinhew can you please follow up on this? Thanks!

Nietzche001 commented 8 months ago

I enabled Speech SDK logging with speech_config.enable_audio_logging(). The error message is:

Error details: Connection was closed by the remote host. Error code: 4429. Error details: The request is throttled because you have exceeded the concurrent request limit allowed for your sub USP state: TurnStarted. Received audio size: 0 bytes.

pankopon commented 8 months ago

@Nietzche001 This is because of quotas and limits on Free (F0) tier https://learn.microsoft.com/azure/ai-services/speech-service/speech-services-quotas-and-limits#text-to-speech-quotas-and-limits-per-resource

Maximum number of transactions per time period for prebuilt neural voices and custom neural voices: 20 transactions per 60 seconds. This limit isn't adjustable.
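A client can stay under a limit of this shape by pacing its own requests. Below is a minimal sketch of a sliding-window limiter using only the standard library; `SlidingWindowLimiter` is a hypothetical helper, and treating one synthesis call as one transaction is an assumption, not something the quoted documentation states. It does not address the separate "concurrent request limit" named in the 4429 message.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls acquisitions within any period_s window (hypothetical helper)."""

    def __init__(
        self,
        max_calls: int = 20,
        period_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period_s = period_s
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Forget calls that have left the window.
        while self._stamps and now - self._stamps[0] >= self._period_s:
            self._stamps.popleft()
        if len(self._stamps) < self._max_calls:
            return 0.0
        return self._period_s - (now - self._stamps[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            self._sleep(delay)
            delay = self.wait_time()
        self._stamps.append(self._clock())
```

In the reporter's loop, calling `limiter.acquire()` before each `tts_txt_to_mp3(...)` call would be one place to use it.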

pankopon commented 8 months ago

Closed since, based on the log, the issues are due to documented limitations on free subscriptions. Please open a new issue if more support is needed.

Raciel-c commented 8 months ago

I am not making 20 requests per minute, so why do I keep getting 4429 errors? Error code: 4429. Error details: The request is throttled because you have exceeded the concurrent request limit allowed for your sub USP state: TurnStarted. Received audio size: 0 bytes.

MichaelHazut commented 4 months ago

I am not making 20 requests per minute, so why do I keep getting 4429 errors? Error code: 4429. Error details: The request is throttled because you have exceeded the concurrent request limit allowed for your sub USP state: TurnStarted. Received audio size: 0 bytes.

I'm now running into this problem. Did you manage to find a solution?
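No fix for the 4429 errors was confirmed in this thread. A common general-purpose way to cope with throttling is to retry with exponential backoff, sketched below with the standard library only. `ThrottledError` and `retry_with_backoff` are hypothetical names: the caller is assumed to raise `ThrottledError` itself, for example when `cancellation_details.error_details` contains "Error code: 4429". Whether retrying is enough on the F0 tier is not established here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical exception the caller raises when it detects a throttling error."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 2.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Run operation, retrying on ThrottledError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # Give up and surface the last throttling error.
            # Delay doubles each attempt, capped at max_delay_s.
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Jitter scales the delay to between 50% and 100% of its value.
            sleep(delay * (0.5 + 0.5 * jitter()))
            attempt += 1
```

The jitter spreads retries out so that several clients sharing one resource key do not all retry at the same moment.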

Azure-Samples / cognitive-services-speech-sdk

The bug of Timeout while synthesizing #2270
