Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Keyword recognition in Python: Advanced models throw SPXERR_INVALID_ARG #2571

Closed. S1lverhand closed this issue 2 weeks ago.

S1lverhand commented 2 months ago

I am doing keyword recognition in Python 3.9 with azure-cognitiveservices-speech==1.40.0, using PyCharm 2024.1.1 (Professional Edition) on a Windows 11 Pro machine. The following code works as expected for basic models, but throws SPXERR_INVALID_ARG for advanced models (lowfa, midfa and highfa). All models were trained on the same day, 30th August.

My code:

import azure.cognitiveservices.speech as speechsdk

key = "MY_KEY"  # private; replace with your Speech resource key
region = 'westeurope'

speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "speech_sdk.log")

recognizer_input_stream = speechsdk.audio.PushAudioInputStream()
audio_config = speechsdk.audio.AudioConfig(stream=recognizer_input_stream)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# keyword_model = "basic.table"  # good
keyword_model = "advanced_midfa.table"  # throws SPXERR_INVALID_ARG

kw_model = speechsdk.KeywordRecognitionModel(keyword_model)
recognizer.start_keyword_recognition(kw_model)
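Side note on the snippet above: it creates a PushAudioInputStream but never writes audio into it, so even a working model would have nothing to spot. The sketch below shows one way to feed it from a WAV file. It assumes the push stream's default format (16 kHz, 16-bit, mono PCM), and `iter_pcm_chunks` is a hypothetical helper name; only the standard-library `wave` module is used, so it does not touch the SDK itself.

```python
import wave
from typing import Iterator

# Assumed default PushAudioInputStream format: 16 kHz, 16-bit (2-byte) samples, mono PCM.
EXPECTED_FORMAT = (16000, 2, 1)  # (frame rate, sample width in bytes, channels)


def iter_pcm_chunks(wav_path: str, frames_per_chunk: int = 1600) -> Iterator[bytes]:
    """Yield raw PCM chunks from a WAV file (hypothetical helper).

    1600 frames at 16 kHz is 100 ms of audio per chunk. Raises ValueError if the
    file does not match the assumed stream format, because the push stream would
    otherwise misinterpret the samples.
    """
    with wave.open(wav_path, "rb") as wav:
        fmt = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
        if fmt != EXPECTED_FORMAT:
            raise ValueError(f"unsupported WAV format {fmt}; expected 16 kHz, 16-bit, mono")
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:  # end of file
                break
            yield chunk
```

After start_keyword_recognition(kw_model) returns, each chunk would be passed to recognizer_input_stream.write(chunk), followed by recognizer_input_stream.close() to signal the end of the audio.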

Stacktrace:

D:\test_project\.venv\Scripts\python.exe D:\test_project\test.py 
Traceback (most recent call last):
  File "D:\test_project\test.py", line 18, in <module>
    recognizer.start_keyword_recognition(kw_model)
  File "D:\test_project\.venv\lib\site-packages\azure\cognitiveservices\speech\speech.py", line 821, in start_keyword_recognition
    return self.start_keyword_recognition_async(model).get()
  File "D:\test_project\.venv\lib\site-packages\azure\cognitiveservices\speech\speech.py", line 576, in get
    result_handle = self.__get_function(self._handle)
  File "D:\test_project\.venv\lib\site-packages\azure\cognitiveservices\speech\speech.py", line 1111, in resolve_future
    _call_hr_fn(fn=_sdk_lib.recognizer_start_keyword_recognition_async_wait_for, *[handle, max_uint32])
  File "D:\test_project\.venv\lib\site-packages\azure\cognitiveservices\speech\interop.py", line 62, in _call_hr_fn
    _raise_if_failed(hr)
  File "D:\test_project\.venv\lib\site-packages\azure\cognitiveservices\speech\interop.py", line 55, in _raise_if_failed
    __try_get_error(_spx_handle(hr))
  File "D:\test_project\.venv\lib\site-packages\azure\cognitiveservices\speech\interop.py", line 50, in __try_get_error
    raise RuntimeError(message)
RuntimeError: Exception with error code: 
[CALL STACK BEGIN]

    > keyword_spotter_initialize
    - keyword_spotter_initialize
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_string_to_wstring
    - pal_get_value
    - pal_get_value
    - pal_get_value
    - pal_get_value
    - pal_get_value

[CALL STACK END]

Exception with an error code: 0x5 (SPXERR_INVALID_ARG)

I have attached the log file: speech_sdk.log

Using a dedicated KeywordRecognizer, as in the official example, also fails for advanced models: it prints CANCELED: CancellationReason.Error. (Incidentally, that example raises NotImplementedError for save_to_wav_file_async at line 764.)

pankopon commented 1 month ago

Hi, this is because of a missing library file in the Speech SDK Python packages. We will fix it in the next Speech SDK 1.41.0 release due in October. Before that, please try the following as a workaround:

  1. Check where azure-cognitiveservices-speech is installed, like

    C:\>python
    Python 3.12.4 (tags/v3.12.4:8e8a4ba, Jun  6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import azure.cognitiveservices.speech as speechsdk
    >>> print(speechsdk.__file__)
    C:\Apps\WPy64-31241\python-3.12.4.amd64\Lib\site-packages\azure\cognitiveservices\speech\__init__.py

    In this example, C:\Apps\WPy64-31241\python-3.12.4.amd64\Lib\site-packages\azure\cognitiveservices\speech\ is the location of the installed module.

  2. Download the Speech SDK NuGet package of the same version from https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech/1.40.0

  3. Unzip the downloaded microsoft.cognitiveservices.speech.1.40.0.nupkg (it's a zip compressed archive)

  4. Go to the extracted runtimes\win-x64\native folder and copy Microsoft.CognitiveServices.Speech.extension.kws.ort.dll to the Python module location shown above.

Then the advanced keyword models will work.
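The four workaround steps can be scripted roughly as follows. This is a sketch under assumptions, not an official tool: the helper names are hypothetical, the download URL follows the NuGet v2 package-download pattern (an assumption; downloading manually from the page in step 2 works just as well), and writing into site-packages may need the same permissions as pip.

```python
import importlib.util
import shutil
import urllib.request
import zipfile
from pathlib import Path

# DLL named in step 4 and its path inside the .nupkg (zip entries use "/").
DLL_NAME = "Microsoft.CognitiveServices.Speech.extension.kws.ort.dll"
DLL_MEMBER = f"runtimes/win-x64/native/{DLL_NAME}"


def module_dir(module_name: str) -> Path:
    """Step 1: folder of an installed package, like printing speechsdk.__file__."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(module_name)
    return Path(spec.origin).parent


def download_nupkg(version: str, dest: Path) -> Path:
    """Step 2: fetch the NuGet package (assumed v2 download URL pattern)."""
    url = f"https://www.nuget.org/api/v2/package/Microsoft.CognitiveServices.Speech/{version}"
    target = dest / f"microsoft.cognitiveservices.speech.{version}.nupkg"
    urllib.request.urlretrieve(url, target)
    return target


def copy_kws_dll(nupkg: Path, target_dir: Path) -> Path:
    """Steps 3-4: read the DLL straight out of the .nupkg (a zip) into target_dir."""
    out = target_dir / DLL_NAME
    with zipfile.ZipFile(nupkg) as archive, archive.open(DLL_MEMBER) as src, open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out
```

Typical use would be copy_kws_dll(download_nupkg("1.40.0", some_temp_dir), module_dir("azure.cognitiveservices.speech")), with the version matching the installed Python package as step 2 requires.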

github-actions[bot] commented 4 weeks ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

pankopon commented 2 weeks ago

Fixed in the Speech SDK 1.41.1 release.
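Since the fix shipped in 1.41.1, a quick check of the installed version tells whether the manual copy is still worth trying. The sketch below treats every release before 1.41.1 as possibly affected. That is an assumption: the thread only confirms 1.40.0 as broken. The helper names are hypothetical and the version parsing is deliberately simplistic.

```python
from importlib import metadata

# Per this thread, the missing kws.ort DLL was fixed in Speech SDK 1.41.1.
FIXED_IN = (1, 41, 1)


def parse_version(text: str) -> tuple:
    """Turn '1.40.0' into (1, 40, 0); keeps only leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:  # pad e.g. '1.41' to (1, 41, 0)
        parts.append(0)
    return tuple(parts)


def may_need_kws_workaround(version: str) -> bool:
    """True for releases older than the one the thread reports as fixed."""
    return parse_version(version) < FIXED_IN


def installed_speech_sdk_version(dist: str = "azure-cognitiveservices-speech") -> str:
    """Version string of the installed PyPI distribution."""
    return metadata.version(dist)
```

If may_need_kws_workaround(installed_speech_sdk_version()) returns True, upgrading with pip install --upgrade "azure-cognitiveservices-speech>=1.41.1" is simpler than copying the DLL by hand.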