azure-cognitiveservices-speech==1.32.1 does not work

albin3 commented 1 year ago

Describe the bug

The SDK cannot be used at version azure-cognitiveservices-speech==1.32.1. Constructing SpeechConfig fails with SPXERR_INVALID_ARG.

I was following https://learn.microsoft.com/zh-cn/azure/ai-services/speech-service/how-to-speech-synthesis?tabs=browserjs%2Cterminal&pivots=programming-language-python

To Reproduce

Steps to reproduce the behavior:

Docker on OS X, Python 3.8
pip install azure-cognitiveservices-speech
Run the following script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import azure.cognitiveservices.speech as speechsdk

voice_list = '''
zh-CN-YunjianNeural
zh-CN-XiaoxuanNeural
zh-CN-XiaohanNeural
zh-CN-YunzeNeural
'''.split('\n')

voice_list = list(filter(
    lambda x: x != '',
    list(map(lambda y: y.strip(), voice_list))
))

# This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

# The language of the voice that speaks.
speech_config.speech_synthesis_voice_name='en-US-JennyNeural'

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# Get text from the console and synthesize to the default speaker.
print("Enter some text that you want to speak >")
text = input()

speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized for text [{}]".format(text))
elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = speech_synthesis_result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")

Expected behavior

A clear and concise description of what you expected to happen.

Version of the Cognitive Services Speech SDK

1.32.1 (azure-cognitiveservices-speech, installed with pip)
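
Because the pip command in the reproduction steps is not pinned, it may help to confirm which version is actually installed inside the container. The following is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is hypothetical and not part of the SDK.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this interpreter's environment.
        return None


if __name__ == "__main__":
    # Prints the installed version, or None if the package is missing.
    print(installed_version("azure-cognitiveservices-speech"))
```

Run it with the same interpreter that runs the failing script (here /usr/local/bin/python), since a different interpreter may see a different site-packages directory.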

Platform, Operating System, and Programming Language

OS: OS X, 2.4 GHz quad-core Intel Core i5
Hardware: x86_64, Docker (Ubuntu) on OS X (MacBook Pro, 13-inch, 2019, four Thunderbolt 3 ports)
Programming language: Python 3.8

Additional context

admin@fcc6395c0a87:~/video-maker$ /usr/local/bin/python /home/admin/video-maker/src/azure_speech.py
Traceback (most recent call last):
  File "/home/admin/video-maker/src/azure_speech.py", line 20, in <module>
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
  File "/home/admin/.local/lib/python3.8/site-packages/azure/cognitiveservices/speech/speech.py", line 71, in __init__
    _call_hr_fn(fn=_sdk_lib.speech_config_from_subscription, *[ctypes.byref(handle), c_subscription, c_region])
  File "/home/admin/.local/lib/python3.8/site-packages/azure/cognitiveservices/speech/interop.py", line 62, in _call_hr_fn
    _raise_if_failed(hr)
  File "/home/admin/.local/lib/python3.8/site-packages/azure/cognitiveservices/speech/interop.py", line 55, in _raise_if_failed
    __try_get_error(_spx_handle(hr))
  File "/home/admin/.local/lib/python3.8/site-packages/azure/cognitiveservices/speech/interop.py", line 50, in __try_get_error
    raise RuntimeError(message)
RuntimeError: Exception with error code: 
[CALL STACK BEGIN]

/home/admin/.local/lib/python3.8/site-packages/azure/cognitiveservices/speech/libMicrosoft.CognitiveServices.Speech.core.so(+0x1e28f1) [0x7efeffb9c8f1]
/home/admin/.local/lib/python3.8/site-packages/azure/cognitiveservices/speech/libMicrosoft.CognitiveServices.Speech.core.so(+0x165259) [0x7efeffb1f259]
/home/admin/.local/lib/python3.8/site-packages/azure/cognitiveservices/speech/libMicrosoft.CognitiveServices.Speech.core.so(speech_config_from_subscription_internal+0xe4) [0x7efeffa938a0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6d1d) [0x7eff0062dd1d]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6289) [0x7eff0062d289]
/usr/local/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(_ctypes_callproc+0x777) [0x7eff004d6f27]
/usr/local/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x8fb4) [0x7eff004cbfb4]
/usr/local/lib/libpython3.8.so.1.0(PyObject_Call+0x8e) [0x7eff00c8dd6e]
/usr/local/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x210a) [0x7eff00cea94a]
/usr/local/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8a2) [0x7eff00ce81d2]
/usr/local/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x18e) [0x7eff00c8cafe]
/usr/local/lib/libpython3.8.so.1.0(PyVectorcall_Call+0x5d) [0x7eff00c8da9d]
/usr/local/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x210a) [0x7eff00cea94a]
/usr/local/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x2f9) [0x7eff00ce7c29]
/usr/local/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x18e) [0x7eff00c8cafe]
/usr/local/lib/libpython3.8.so.1.0(_PyObject_FastCallDict+0x21d) [0x7eff00c8bf2d]
/usr/local/lib/libpython3.8.so.1.0(_PyObject_Call_Prepend+0x68) [0x7eff00c8d4f8]
[CALL STACK END]

Exception with an error code: 0x5 (SPXERR_INVALID_ARG)

albin3 commented 1 year ago

Any response?

BrianMouncer commented 1 year ago

Did you set the environment variables "SPEECH_KEY" and "SPEECH_REGION" with your keys and region values?
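
One quick way to rule this out is to check the variables from inside the container before creating the SpeechConfig. os.environ.get returns None for an unset variable, and passing None (or an empty string) as the key or region may be what triggers SPXERR_INVALID_ARG here; that is an assumption, not a confirmed diagnosis. The helper name missing_env_vars is hypothetical.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if environ is None else environ
    # Treat whitespace-only values as missing, since they are not usable credentials.
    return [name for name in names if not (env.get(name) or "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(["SPEECH_KEY", "SPEECH_REGION"])
    if missing:
        print("Missing or empty: " + ", ".join(missing))
    else:
        print("SPEECH_KEY and SPEECH_REGION are both set.")
```

Note that variables exported in the host shell are not visible inside a Docker container unless they are passed in explicitly, for example with docker run -e.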

You mentioned that you are using an Ubuntu Docker image on OS X, but most Docker containers are not configured with speaker output or microphone input. Can you try modifying the AudioOutputConfig to write to a file on disk rather than the "default speaker"? Or just use this premade sample, speech_synthesis_to_wave_file():

https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/04e4fbdee065978ea0bff09b6de26fe7b80cf2a7/samples/python/console#readme
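
As a rough sketch of the file-based variant (not the linked sample itself), the script from the report could be adapted as follows. The function name synthesize_to_wav and the default path output.wav are hypothetical; the SDK calls are the same ones used in the original script, with AudioOutputConfig given a filename instead of use_default_speaker=True.

```python
import os


def synthesize_to_wav(text, wav_path="output.wav"):
    """Synthesize text to a WAV file; return True if synthesis completed."""
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    if not key or not region:
        # Fail early with a readable message instead of a native SDK error.
        raise RuntimeError("SPEECH_KEY and SPEECH_REGION must be set and non-empty")

    # Imported here so the check above runs before the native library is loaded.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

    # Write audio to a file, which avoids needing an audio device in the container.
    audio_config = speechsdk.audio.AudioOutputConfig(filename=wav_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )

    result = synthesizer.speak_text_async(text).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted
```

If this raises the RuntimeError, the problem is the environment rather than the audio device; if it still fails inside SpeechConfig, the cancellation or exception details would be the next thing to share.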

albin3 commented 1 year ago

Hi @BrianMouncer, I have set the environment variables like this: