Docs to Transcribe Streaming Audio from Microphone and Performing Speech Recognition for Speech v2 API

rabiaedayilmaz commented 3 months ago

I searched all over the internet, but all I could find were people with the same problem. Speech v2 was released recently, and there are sample codes for various tasks. The most relevant sample is streaming speech recognition on a local file.

Whenever I try to implement this for microphone input, as we did in speech_v1p1beta1, an error occurs. The latest error I am stuck on is: Google Speech Error: 400 Audio chunk can be of a a maximum of 25600 bytes. Received audio of 253952 bytes instead.

I assume it occurs because I cannot set a chunk size for the incoming microphone audio and split it accordingly.
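
To illustrate the assumption above, here is a minimal sketch of splitting an oversized buffer into pieces that respect the 25600-byte limit quoted in the error message. The names split_into_chunks and MAX_CHUNK_BYTES are hypothetical and not part of the Speech v2 client library; this only shows the slicing arithmetic, not a full microphone pipeline.

```python
# Limit taken from the error message: "maximum of 25600 bytes".
MAX_CHUNK_BYTES = 25600


def split_into_chunks(audio: bytes, max_bytes: int = MAX_CHUNK_BYTES) -> list[bytes]:
    """Split raw audio bytes into consecutive pieces of at most max_bytes each.

    Hypothetical helper for illustration only.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return [audio[i:i + max_bytes] for i in range(0, len(audio), max_bytes)]


# A buffer the size reported in the error above (253952 bytes of silence).
chunks = split_into_chunks(bytes(253952))
print(len(chunks), len(chunks[-1]))  # 10 23552
```

With this slicing, the 253952-byte buffer becomes nine full 25600-byte chunks plus one 23552-byte remainder, each of which could then be sent as its own audio request.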

The docs need sample code for streaming audio from a microphone and performing speech recognition with the Speech v2 API.

SchulerSimon commented 3 months ago

I have the same issue with speech-to-text-v2. I'll try to provide a bit more context:

I have multiple IoT devices in different locations. Some work, some don't. I have no idea why, or what the difference is. Software and hardware are the same on all devices.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 173, in error_remapped_callable
    return _StreamingResponseIterator(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 95, in __init__
    self._stored_first_result = next(self._wrapped)
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 540, in __next__
    return self._next()
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 966, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INVALID_ARGUMENT
        details = "Audio chunk can be of a a maximum of 25600 bytes. Received audio of 98964 bytes instead."
        debug_error_string = "UNKNOWN:Error received from peer ipv6:<REDACTED> {created_time:"2024-04-03T13:04:32.515940442+02:00", grpc_status:3, grpc_message:"Audio chunk can be of a a maximum of 25600 bytes. Received audio of 98964 bytes instead."}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "speech_2_text.py", line 153, in run
    self.responses = self.client.streaming_recognize(
  File "/usr/local/lib/python3.10/dist-packages/google/cloud/speech_v2/services/speech/client.py", line 1884, in streaming_recognize
    response = rpc(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/retry.py", line 372, in retry_wrapped_func
    return retry_target(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/retry.py", line 207, in retry_target
    result = target()
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 177, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 Audio chunk can be of a a maximum of 25600 bytes. Received audio of 98964 bytes instead.

Note: I removed the IPv6 address from the error message.

pip3 freeze | grep google:

google-api-core==2.15.0
google-auth==2.25.2
google-cloud-speech==2.25.1
google-cloud-texttospeech==2.15.0
googleapis-common-protos==1.62.0

I had the same problem with google-cloud-speech==2.23.0 as well.

As in the examples, I feed the audio data via:

def generator(self):
        """acts as a blocking generator for buffered audio_data
        when no data is there, the generator blocks till there is new data

        this generator uses queue.Queue, thus it is thread-safe

        Yields:
            bytes: the buffered audio
        """
        while not self.closed:
            # use blocking get
            chunk = self._buff.get()
            # return when stop signal detected (None)
            if chunk is None:
                return
            data = [chunk]

            # consume the rest of the queue
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            # yield result
            yield b"".join(data)

SchulerSimon commented 3 months ago

The documentation here states that 25 KB is the maximum.

I attempted a fix:

            # yield result 
            bytes_chunk = b"".join(data)
            for chunk in [bytes_chunk[x:x+25600] for x in range(0, len(bytes_chunk), 25600)]:
                yield chunk
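
For reference, here is a self-contained sketch of the same idea as a standalone function, so it can be run without the surrounding class. It assumes the queue holds bytes objects and uses None as the stop signal, as in the generator above. The function name capped_audio_generator is hypothetical. One deliberate difference, labeled as an assumption: when the stop signal arrives while draining, this version still yields the audio collected so far instead of dropping it.

```python
import queue

# Limit taken from the error message: "maximum of 25600 bytes".
MAX_CHUNK_BYTES = 25600


def capped_audio_generator(buff: queue.Queue, max_bytes: int = MAX_CHUNK_BYTES):
    """Yield buffered audio in pieces of at most max_bytes.

    Blocks until data is available; stops when None is read from the queue.
    Hypothetical helper for illustration only.
    """
    while True:
        chunk = buff.get()  # blocking get
        if chunk is None:
            return
        data = [chunk]
        stop = False

        # Drain whatever else is already queued, without blocking.
        while True:
            try:
                chunk = buff.get(block=False)
            except queue.Empty:
                break
            if chunk is None:
                stop = True
                break
            data.append(chunk)

        # Slice the joined buffer so no yielded piece exceeds the limit.
        joined = b"".join(data)
        for start in range(0, len(joined), max_bytes):
            yield joined[start:start + max_bytes]

        if stop:
            return


# Usage: two 30000-byte buffers followed by the stop signal.
q = queue.Queue()
q.put(b"a" * 30000)
q.put(b"b" * 30000)
q.put(None)
sizes = [len(piece) for piece in capped_audio_generator(q)]
print(sizes)  # [25600, 25600, 8800]
```

This only addresses the chunk-size limit. It says nothing about the cause of the CANCELLED error below.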

This gets rid of that exact error, but then we just get another error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 173, in error_remapped_callable
    return _StreamingResponseIterator(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 95, in __init__
    self._stored_first_result = next(self._wrapped)
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 540, in __next__
    return self._next()
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 966, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.CANCELLED
        details = "The operation was cancelled."
        debug_error_string = "UNKNOWN:Error received from peer ipv6:<REDACTED> {created_time:"2024-04-04T10:01:14.580325845+02:00", grpc_status:1, grpc_message:"The operation was cancelled."}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "speech_2_text.py", line 155, in run
    self.responses = self.client.streaming_recognize(
  File "/usr/local/lib/python3.10/dist-packages/google/cloud/speech_v2/services/speech/client.py", line 1884, in streaming_recognize
    response = rpc(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/retry.py", line 372, in retry_wrapped_func
    return retry_target(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/retry.py", line 207, in retry_target
    result = target()
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 177, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.Cancelled: 499 The operation was cancelled.

Note: I removed the IPv6 address from the error message.

GoogleCloudPlatform / python-docs-samples

Docs to Transcribe Streaming Audio from Microphone and Performing Speech Recognition for Speech v2 API #11389