Uberi / speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.
https://pypi.python.org/pypi/SpeechRecognition/
BSD 3-Clause "New" or "Revised" License
8.2k stars 2.39k forks

Speech_recognition broken for new Bing API #385

Open brysonpayne opened 5 years ago

brysonpayne commented 5 years ago

Steps to reproduce

  1. Register for a Bing Speech API key
  2. Try to use sr.recognize_bing()

Expected behaviour

Speech recognition (speech to text)

Actual behaviour

Error: I tried two different keys on multiple machines. The Bing API has been updated and appears to no longer authenticate correctly with Speech_recognition 3.8.x. It worked a few months ago; the only thing I changed was getting a new API key.

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/speech_recognition/__init__.py", line 935, in recognize_bing
    credential_response = urlopen(credential_request, timeout=60)  # credential response can take longer, use longer timeout instead of default one
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Access Denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bpayne/Downloads/voice_control_mac_2018-11.py", line 22, in <module>
    MY_API_KEY,"en-US")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/speech_recognition/__init__.py", line 937, in recognize_bing
    raise RequestError("credential request failed: {}".format(e.reason))
speech_recognition.RequestError: credential request failed: Access Denied

System information

My system is Mac OS X 10.13.6 High Sierra, and Win 10 Pro/Enterprise - tested on 3 machines.

My Python version is 3.6.3 to 3.6.7, tested across 3 machines/versions. (You can check this by running python -V.)

My Pip version is 18.1

My SpeechRecognition library version is 3.8.1

My PyAudio library version is 0.2.11

jhoelzl commented 5 years ago

Same happens to me.

brysonpayne commented 5 years ago

I found the issue: as MS phases out the Bing Speech API in favor of Cognitive Speech Services, it's changing its servers. The problem was in the authentication and recognition URLs in the Bing section (lines 1010 and 1040) of __init__.py. An updated copy of __init__.py is attached; just unzip it and replace the file in your C:\Users\{your_user_name}\AppData\Local\Programs\Python\Python36\Lib\site-packages\speech_recognition folder on a PC or /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/speech_recognition folder on Mac. Hope it's of help. Bryson

Attachment: init.zip

brysonpayne commented 5 years ago

I issued a pull request for __init__.py with the updated URLs for the newer Speech APIs. As soon as it's accepted into the main branch, this issue can be closed. Glad to contribute to a great library like Speech_Recognition.

040840308 commented 5 years ago

Yes, you are right. However, I found that directly using your modified code may still fail. From the code, I see that you are in West US, but I'm in East US, so it failed for me too. I changed your code to East US. Success! Other people should pay attention to this.

Thanks @brysonpayne

lastcoolnameleft commented 5 years ago

I've created a PR to add the new Azure Speech API (now that the Bing API is not working).

https://github.com/Uberi/speech_recognition/pull/389

If the Bing API definitely doesn't work, I can delete that code as well in the PR.

unchris commented 5 years ago

For anyone else who went down the rabbit hole to solve a bug in the second lab in Microsoft's AI Program (second lab of the first course), it still refers to the Bing Speech API and links to this speech_recognition library.

It looks like once #389 is merged, theoretically the line in MS's Jupyter notebook can be changed from transcription = r.recognize_bing(audio, key=speechKey) to transcription = r.recognize_azure(audio, key=speechKey)
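Until a release containing #389 is installed, a script could fall back to whichever method the installed version actually provides. A minimal sketch, assuming only that the recognizer object exposes recognize_azure or recognize_bing as methods; pick_transcriber is a hypothetical helper name, not part of the library:

```python
def pick_transcriber(recognizer):
    """Return the recognizer's Azure method if present, else the Bing one.

    `pick_transcriber` is a hypothetical helper, not part of speech_recognition.
    """
    # Prefer the newer method name; fall back to the older one.
    for name in ("recognize_azure", "recognize_bing"):
        method = getattr(recognizer, name, None)
        if callable(method):
            return method
    raise AttributeError("recognizer has neither recognize_azure nor recognize_bing")
```

With this, the notebook line might become transcription = pick_transcriber(r)(audio, key=speechKey), assuming both methods accept the audio and a key= argument as in the lines quoted above.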

Tylersuard commented 5 years ago

You, sir, are the best kind of human. I came here for just such a thing.

bbrewington commented 4 years ago

I realized the answer to the below note was that this was resolved in Oct 2018 via https://github.com/Uberi/speech_recognition/commit/036a53c442b325e847df94854ae0eeafb7a6ed13 - when is this scheduled to be live on PyPI?

When I try using recognize_azure, I'm getting the following error: AttributeError: 'Recognizer' object has no attribute 'recognize_azure'

I noticed when I run the following, the other methods (e.g. recognize_google) show up, but not recognize_azure:

import speech_recognition as sr
import inspect
r = sr.Recognizer()
print(inspect.getmembers(r, predicate=inspect.ismethod))

Output:

[
('__enter__', <bound method AudioSource.__enter__ of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('__exit__', <bound method AudioSource.__exit__ of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('__init__', <bound method Recognizer.__init__ of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('adjust_for_ambient_noise', <bound method Recognizer.adjust_for_ambient_noise of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('listen', <bound method Recognizer.listen of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('listen_in_background', <bound method Recognizer.listen_in_background of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('recognize_api', <bound method recognize_api of <class 'speech_recognition.Recognizer'>>), 
('recognize_bing', <bound method Recognizer.recognize_bing of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('recognize_google', <bound method Recognizer.recognize_google of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('recognize_google_cloud', <bound method Recognizer.recognize_google_cloud of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('recognize_houndify', <bound method Recognizer.recognize_houndify of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('recognize_ibm', <bound method Recognizer.recognize_ibm of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('recognize_sphinx', <bound method Recognizer.recognize_sphinx of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('recognize_wit', <bound method Recognizer.recognize_wit of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('record', <bound method Recognizer.record of <speech_recognition.Recognizer object at 0x10e65a5f8>>), 
('snowboy_wait_for_hot_word', <bound method Recognizer.snowboy_wait_for_hot_word of <speech_recognition.Recognizer object at 0x10e65a5f8>>)
]

carloscubur commented 4 years ago

This worked for me!

CJSparrow commented 4 years ago

2020 same error

chrisspen commented 4 years ago

For anyone interested, I fixed this in my branch. As this repo seems to be abandoned, and my past PRs have been ignored, it's unlikely it will ever be fixed here. That said, I'm not using Bing anymore myself (as it's enormously expensive), but its accuracy is fantastic.

rbrisita commented 3 years ago

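The version should be bumped, a new release tagged, and the package published to PyPI.

For now, anyone finding this can install from VCS or a tarball/zip:

  • python -m pip install git+https://github.com/Uberi/speech_recognition.git
  • python -m pip install https://github.com/Uberi/speech_recognition/archive/master.zip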

xiuwei1026 commented 3 years ago

The version should be bumped, tag a new release, and published to PyPI.

For now anyone finding this, install with VCS or tarball/zip:

  • python -m pip install git+https://github.com/Uberi/speech_recognition.git
  • python -m pip install https://github.com/Uberi/speech_recognition/archive/master.zip

@rbrisita, hi, I followed your comments to reinstall the library but I still have the problem with azure. Any idea? Thanks.

r.recognize_azure
Traceback (most recent call last):
  File "", line 1, in
    r.recognize_azure
AttributeError: 'Recognizer' object has no attribute 'recognize_azure'

rbrisita commented 3 years ago

It seems that Python is still accessing an old version. Make sure you are working in the correct environment. Please review venv then try one of the install methods I mentioned before.
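One way to check which copy Python is actually importing is to print the interpreter path, the module's file location, and whether the method exists on the class. A rough sketch; describe_install is a hypothetical helper name, and the call at the bottom assumes the package imports as speech_recognition and exposes a Recognizer class, as shown earlier in this thread:

```python
import importlib
import sys


def describe_install(module_name, class_name, method_name):
    """Report where a module is loaded from and whether Class.method exists.

    Hypothetical diagnostic helper; it is not part of any library.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed in this environment at all.
        return {"python": sys.executable, "module_file": None,
                "version": None, "has_method": False}
    cls = getattr(module, class_name, None)
    return {
        "python": sys.executable,                          # interpreter in use
        "module_file": getattr(module, "__file__", None),  # path of the imported copy
        "version": getattr(module, "__version__", None),   # None if not defined
        "has_method": cls is not None and hasattr(cls, method_name),
    }


if __name__ == "__main__":
    print(describe_install("speech_recognition", "Recognizer", "recognize_azure"))
```

If module_file points outside the active virtual environment, or has_method is False, the old copy is probably still the one being imported.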

sandeshchand commented 2 years ago

I think the endpoint should be configurable, so it is better to let the user provide both the key and the endpoint URL. The author hard-coded a fixed endpoint (https://api.cognitive.microsoft.com/sts/v1.0/issuetoken), which is not the same for every user. My problem was solved with a small modification:

def recognize_bing(self, audio_data, key, endpoint_url, language="en-US", show_all=False):
    """
    Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Microsoft Bing Speech API.

    The Microsoft Bing Speech API key is specified by ``key``. Unfortunately, these are not available without `signing up for an account <https://azure.microsoft.com/en-ca/pricing/details/cognitive-services/speech-api/>`__ with Microsoft Azure.

    To get the API key, go to the `Microsoft Azure Portal Resources <https://portal.azure.com/>`__ page, go to "All Resources" > "Add" > "See All" > Search "Bing Speech API > "Create", and fill in the form to make a "Bing Speech API" resource. On the resulting page (which is also accessible from the "All Resources" page in the Azure Portal), go to the "Show Access Keys" page, which will have two API keys, either of which can be used for the `key` parameter. Microsoft Bing Speech API keys are 32-character lowercase hexadecimal strings.

    The recognition language is determined by ``language``, a BCP-47 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoicerecognition#recognition-language>`__ under "Interactive and dictation mode".

    Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoicerecognition#sample-responses>`__ as a JSON dictionary.

    Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
    """
    assert isinstance(audio_data, AudioData), "Data must be audio data"
    assert isinstance(key, str), "``key`` must be a string"
    assert isinstance(language, str), "``language`` must be a string"

    access_token, expire_time = getattr(self, "bing_cached_access_token", None), getattr(self, "bing_cached_access_token_expiry", None)
    allow_caching = True
    try:
        from time import monotonic  # we need monotonic time to avoid being affected by system clock changes, but this is only available in Python 3.3+
    except ImportError:
        try:
            from monotonic import monotonic  # use time.monotonic backport for Python 2 if available (from https://pypi.python.org/pypi/monotonic)
        except (ImportError, RuntimeError):
            expire_time = None  # monotonic time not available, don't cache access tokens
            allow_caching = False  # don't allow caching, since monotonic time isn't available
    if expire_time is None or monotonic() > expire_time:  # caching not enabled, first credential request, or the access token from the previous one expired
        # get an access token using OAuth
        # the token endpoint differs per user, so it is passed in by the caller, for example:
        #   "https://westeurope.api.cognitive.microsoft.com/sts/v1.0/issuetoken"
        # (previously hard-coded as "https://api.cognitive.microsoft.com/sts/v1.0/issueToken")
        credential_url = endpoint_url
        credential_request = Request(credential_url, data=b"", headers={
            "Content-type": "application/x-www-form-urlencoded",
            "Content-Length": "0",
            "Ocp-Apim-Subscription-Key": key,
        })

        if allow_caching:
            start_time = monotonic()

        try:
            credential_response = urlopen(credential_request, timeout=60)  # credential response can take longer, use longer timeout instead of default one
        except HTTPError as e:
            raise RequestError("credential request failed: {}".format(e.reason))
        except URLError as e:
            raise RequestError("credential connection failed: {}".format(e.reason))
        access_token = credential_response.read().decode("utf-8")

        if allow_caching:
            # save the token for the duration it is valid for
            self.bing_cached_access_token = access_token
            self.bing_cached_access_token_expiry = start_time + 600  # according to https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoicerecognition, the token expires in exactly 10 minutes

    wav_data = audio_data.get_wav_data(
        convert_rate=16000,  # audio samples must be 8kHz or 16 kHz
        convert_width=2  # audio samples should be 16-bit
    )

    url = "https://speech.platform.bing.com/speech/recognition/interactive/cognitiveservices/v1?{}".format(urlencode({
        "language": language,
        "locale": language,
        "requestid": uuid.uuid4(),
    }))

    if sys.version_info >= (3, 6):  # chunked-transfer requests are only supported in the standard library as of Python 3.6+, use it if possible
        request = Request(url, data=io.BytesIO(wav_data), headers={
            "Authorization": "Bearer {}".format(access_token),
            "Content-type": "audio/wav; codec=\"audio/pcm\"; samplerate=16000",
            "Transfer-Encoding": "chunked",
        })
    else:  # fall back on manually formatting the POST body as a chunked request
        ascii_hex_data_length = "{:X}".format(len(wav_data)).encode("utf-8")
        chunked_transfer_encoding_data = ascii_hex_data_length + b"\r\n" + wav_data + b"\r\n0\r\n\r\n"
        request = Request(url, data=chunked_transfer_encoding_data, headers={
            "Authorization": "Bearer {}".format(access_token),
            "Content-type": "audio/wav; codec=\"audio/pcm\"; samplerate=16000",
            "Transfer-Encoding": "chunked",
        })

    try:
        response = urlopen(request, timeout=self.operation_timeout)
    except HTTPError as e:
        raise RequestError("recognition request failed: {}".format(e.reason))
    except URLError as e:
        raise RequestError("recognition connection failed: {}".format(e.reason))
    response_text = response.read().decode("utf-8")
    result = json.loads(response_text)

    # return results
    if show_all: return result
    if "RecognitionStatus" not in result or result["RecognitionStatus"] != "Success" or "DisplayText" not in result: raise UnknownValueError()
    return result["DisplayText"]

StephenArg commented 1 year ago

Continuing an old thread, but this is unfortunately still an issue when using the SpeechRecognition package with recognize_bing(). Following @brysonpayne's fix above, I went into the __init__.py file and changed the URL to the one provided there, except with eastus.api instead of westus.api. If Azure requires a location-specific URL to accompany the API key, the library should accept the location as an argument to recognize_bing so it can be interpolated into the string.
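The suggestion above could look roughly like this: take the region as a parameter and interpolate it into the token URL. This is a sketch, not the library's code; build_token_request and its region parameter are hypothetical, and the URL pattern is inferred from the westeurope, westus, and eastus endpoints mentioned in this thread. Nothing is sent over the network; the function only builds the request object.

```python
from urllib.request import Request


def build_token_request(key, region="westus"):
    """Build (but do not send) a token request for a region-specific endpoint.

    Hypothetical helper: the URL pattern is assumed from endpoints quoted in
    this thread, e.g. https://westeurope.api.cognitive.microsoft.com/sts/v1.0/issuetoken
    """
    if not isinstance(key, str) or not isinstance(region, str):
        raise TypeError("key and region must be strings")
    url = "https://{}.api.cognitive.microsoft.com/sts/v1.0/issuetoken".format(region)
    # Same headers as the credential request in the recognize_bing code above.
    return Request(url, data=b"", headers={
        "Content-type": "application/x-www-form-urlencoded",
        "Content-Length": "0",
        "Ocp-Apim-Subscription-Key": key,
    })
```

A caller in East US would then use build_token_request(MY_API_KEY, region="eastus") and pass the result to urlopen, as the existing credential request does.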