GoogleCloudPlatform / python-docs-samples

Code samples used on cloud.google.com
Apache License 2.0

Getting streaming too fast and too slow for same sample rate during testing #738

Closed chan71 closed 5 years ago

chan71 commented 7 years ago

In which file did you encounter the issue?

transcribe_streaming.py

Did you change the file? If so, how?

Yes, to use arecord instead of pyaudio/portaudio. You can find the modified file (transcribe_streaming_arecord.py) attached to the 7th comment of #728.

Describe the issue

When the script is run, it regularly throws the following error. We are testing this with the microphone of a NAO robot running NAOqi OS (a distribution based on Gentoo).

(env) nao [err 1] ~/googlespeech $ stdout: Broken pipe
python transcribe_streaming_arecord.py
Recording raw data 'stdin' : Signed 16 bit Little Endian, Rate 14000 Hz, Mono
/var/persistent/home/nao/googlespeech/env/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:334: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  SNIMissingWarning
/var/persistent/home/nao/googlespeech/env/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:132: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecurePlatformWarning
I would like to make a cash deposit
Traceback (most recent call last):
  File "transcribe_streaming_arecord_20161222.py", line 224, in <module>
    main()
  File "transcribe_streaming_arecord_20161222.py", line 215, in main
    listen_print_loop(recognize_stream)
  File "transcribe_streaming_arecord_20161222.py", line 164, in listen_print_loop
    raise RuntimeError('Server error: ' + resp.error.message)
RuntimeError: Server error: Audio data is being streamed too fast. Please stream audio data approximately at real time.

Mic is identified properly as seen in the following command.

(env) nao [0] ~/googlespeech $ arecord --list-devices
**** List of CAPTURE Hardware Devices ****
card 0: MID [HDA Intel MID], device 0: AD198x Analog [AD198x Analog]
 Subdevices: 0/2
 Subdevice #0: subdevice #0
 Subdevice #1: subdevice #1

And sound driver supports sampling rates exceeding 48k.

(env) nao [err 1] ~/googlespeech $ pactl list short sinks
0       alsa_output.0.output-speakers   module-alsa-card.c      s16le 2ch 48000Hz       SUSPENDED

We have also observed that the Google server complains about streaming being too fast or too slow even for the same rate (e.g. 16k) at different times. The only difference between these tests was the network latency, which kept changing and was high in general. Can the above error be caused by unstable network bandwidth? Is there any solution or workaround for using streaming under bad network conditions? Can any other factors cause this error?

jerjou commented 7 years ago

From your code, it looks like you're reading in all the audio to transcribe first, and only sending the data to the API after you're finished. Thus, the API is getting the entire audio chunk at once, instead of in real time. You'll want to do something more like this:

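import contextlib
import subprocess
import threading

from six.moves import queue


@contextlib.contextmanager
def record_audio(rate, chunk):
    buff = queue.Queue()

    reccmd = ["arecord", "-f", "S16_LE", "-r", str(rate), "-t", "raw"]
    p = subprocess.Popen(reccmd, stdout=subprocess.PIPE)

    def fill_buffer():
        # Keep reading until arecord exits and the pipe returns EOF.
        for data in iter(lambda: p.stdout.read(1024), b''):
            buff.put(data)

    t = threading.Thread(target=fill_buffer)
    t.start()

    # _audio_data_generator is defined in transcribe_streaming.py
    yield _audio_data_generator(buff)

    p.kill()
    t.join()
    # Signal the _audio_data_generator to finish
    buff.put(None)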

i.e., spin up a thread that fills the buffer as the data comes in. Though I would still recommend using the pyalsaaudio package instead of shelling out to arecord, if possible :-)

chan71 commented 7 years ago

@jerjou thanks for the suggestion. We tried out this approach over the last couple of days. We could not get the "streaming too fast" issue sorted out on the NAO robot, though. The same code works fine on my Ubuntu 16.04 machine without giving this error.

The difference between Ubuntu 16.04 and NAO OS (based on Gentoo distribution) is that

Are there any specific issues with Python 2.7.3?

We changed the buffer size from 1024 to 1600 and 3200, with a sleep ranging from 0.01 to 0.1 s, to see whether we could get rid of this error. The error subsided with buffer = 3200 and sleep = 0.1. We arrived at buffer size = 3200 and sleep = 0.1 s as follows.

The audio recording parameters in arecord are: rate = 16000 samples per second, sample depth = 16 bits.

As we have to send samples every 100 ms to match the optimum processing of the Google speech service, 16000 / 10 = 1600 samples should be sent every 100 ms. As each sample contains 16 bits, that is (16000 * 16) / 10 = 25600 bits = 3200 bytes.

So we set the buffer to 3200 bytes and read it every 100 ms to send to google speech.
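The arithmetic above can be checked with a short sketch. The helper name `chunk_size_bytes` and its parameters are hypothetical and not part of the sample; it assumes raw, uncompressed PCM audio such as arecord's S16_LE output.

```python
def chunk_size_bytes(rate_hz, bits_per_sample, channels, chunk_ms):
    """Return the number of bytes of raw PCM audio in one chunk."""
    bytes_per_sample = bits_per_sample // 8
    samples_per_chunk = rate_hz * chunk_ms // 1000
    return samples_per_chunk * bytes_per_sample * channels


# 16 kHz, 16-bit, mono, 100 ms per chunk
print(chunk_size_bytes(16000, 16, 1, 100))
```

For 16 kHz, 16-bit mono audio this gives 3200 bytes per 100 ms. If the device actually records at a different rate, the chunk size for real-time pacing changes with it.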

The findings are as follows.

[Image of the findings not reproduced]

From the above observations, we could get rid of the error with buffer size 3200 and sleep 0.1, but it took 5-9 seconds to get the transcript. Is there any reason for that much delay?

See record_audio() method and ReadAudioThread class.

def record_audio(rate, chunk):
    """Opens a recording stream in a context manager."""
    # Create a thread-safe buffer of audio data
    buff = queue.Queue()

    print "[record_audio] about to start recording" 
    reccmd = ["arecord", "-f", "S16_LE", "-r", str(RATE), "-t", "raw"]
    p = subprocess.Popen(reccmd, stdout=subprocess.PIPE)
    print "[record_audio] recording in progress"

    t = ReadAudioThread(buff, p)
    t.start()

    yield _audio_data_generator(buff, p)

    # Signal the _audio_data_generator to finish
    buff.put(None)
    p.kill()

class ReadAudioThread (threading.Thread):
    def __init__(self, buff, p):
        threading.Thread.__init__(self)
        self.p = p
        self.buff = buff

    def run(self):
        print("[ReadAudioThread] inside read audio thread")
        while True:
            data = self.p.stdout.read(3200)
            self.buff.put(data)
            sleep(0.1)

Full file attached: trans_streaming_1.txt

chan71 commented 7 years ago

@jerjou is there any update on this? This is a blocking issue for our development. Your input is much appreciated.

jerjou commented 7 years ago

I'm able to reproduce the issue by introducing network latency on my test machine. Does this happen when the network connection is reliable as well, or is it always patchy on the Nao?

I also notice from the output in your initial comment that arecord is actually recording at 14000 Hz. Is it possible the Nao sound card doesn't support 16000? Did you adjust the RATE constant in the script to compensate?
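One way to catch such a mismatch is to compare the rate that arecord reports in its status line with the rate the script declares to the API. This is a minimal sketch: the helper `parse_arecord_rate` is hypothetical, the value of `RATE` is assumed, and the banner string is copied from the output in the initial comment.

```python
import re

RATE = 16000  # rate the script declares to the API (assumed value)


def parse_arecord_rate(banner):
    """Extract the sample rate in Hz from arecord's status line, or None."""
    match = re.search(r"Rate (\d+) Hz", banner)
    return int(match.group(1)) if match else None


banner = ("Recording raw data 'stdin' : Signed 16 bit Little Endian, "
          "Rate 14000 Hz, Mono")
actual = parse_arecord_rate(banner)
if actual is not None and actual != RATE:
    print("arecord reports %d Hz but RATE is %d" % (actual, RATE))
```

In a real script the banner would have to be captured from the arecord subprocess rather than hard-coded as it is here.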

In general I'd advise against adding sleeps. The error you're getting indicates that the rate at which you're getting data from your sound card is different from the rate at which the API is getting it.

jerjou commented 7 years ago

Oops - clicked 'Comment' before I was done with the thought.

So, the sample was written in such a way that, if you sleep, the audio data will continue to buffer, and will all be sent at once in the next request. If the rate at which the microphone generates data and the rate at which the API expects data match up, everything should work out fine.
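The buffering behaviour described here can be sketched as a generator that blocks for the first chunk and then drains whatever else has accumulated in the queue. This is an illustration of the idea under stated assumptions, not necessarily the sample's exact code; the name `drain_generator` is hypothetical, and `None` is assumed to be the end-of-stream marker, as in the snippets above.

```python
import queue  # on Python 2: from six.moves import queue


def drain_generator(buff):
    """Yield everything currently buffered as one bytes object per request.

    A None item in the queue signals the end of the stream.
    """
    stop = False
    while not stop:
        # Block until at least one chunk is available.
        data = [buff.get()]
        # Then grab whatever else accumulated (e.g. during a sleep).
        while True:
            try:
                data.append(buff.get(block=False))
            except queue.Empty:
                break
        if None in data:
            stop = True
            data = [d for d in data if d is not None]
        if data:
            yield b"".join(data)


buff = queue.Queue()
for chunk in (b"aa", b"bb", b"cc", None):
    buff.put(chunk)
print(list(drain_generator(buff)))
```

Because all three chunks were queued before the generator ran, they come out as a single payload. A sleeping reader or a stalled network has the same effect of batching audio into fewer, larger requests.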

Honestly, I'm not sure why you're not still getting "too slow" errors from the API, if arecord is still recording at 14kHz. I suspect that somehow, the subprocess is getting extra data in stdout, so it is able to read the requested number of bytes immediately... and then the buffer size + sleep artificially cap the data rate to 16kHz.

I hypothesize that the reason you're getting the delay in transcript is because the audio is being interpreted at a different sample rate than it's being recorded in. For example, have you ever tried playing an audio file at a different sample rate than it's been recorded? It's still interpretable, but it's distorted and sounds weird :-)

Anyway, just some guesses. Again, I'd recommend using pyalsaaudio instead of shelling out to arecord, which introduces extra complexity that might be a contributing factor.

chan71 commented 7 years ago

@jerjou thanks for the reply.

NAO does support 16 kHz. What I added to the initial comment was the output of one test we did after changing the rate to 14 kHz.

I accept that the network is a bit slow and unstable where NAO is tested. I will try using pyalsaaudio and let you know the results.

I've seen that users have to put 100 ms of audio every 100 ms in the streaming channel. If a 100 ms audio packet gets delayed to reach GCP at some point, can it cause an issue? Looks like this is the issue you have recreated by adding network latency. If this is the issue, how can we make sure that every 100 ms packet reaches GCP at a frequency of 100 ms? Isn't this too much to expect from a slow network/bad connection?

jerjou commented 7 years ago

(FYI I agree with you, and am investigating things on the server end - might be a bug on our side. Will update when I find out more)

chan71 commented 7 years ago

I have modified the code to use pyalsaaudio as used here. We will try it out on NAO Robot tomorrow and let you know how it goes. Attached is the modified code. transcribe_streaming_alsa.py.txt

chan71 commented 7 years ago

@jerjou do you have any update on the server-side issues related to this? I tried the pyalsaaudio sample in the opennao VM, but it failed to emerge the pyalsaaudio module because the downloaded file size did not match the recorded size. I will try again today.

Resolving pypi.python.org... 151.101.192.223, 151.101.128.223, 151.101.64.223, ...
Connecting to pypi.python.org|151.101.192.223|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://pypi.python.org/packages/source/p/pyalsaaudio/pyalsaaudio-0.6.tar.gz [following]
--2017-01-09 11:42:35--  https://pypi.python.org/packages/source/p/pyalsaaudio/pyalsaaudio-0.6.tar.gz
Connecting to pypi.python.org|151.101.192.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75154 (73K) [application/octet-stream]
Saving to: `/usr/portage/distfiles/pyalsaaudio-0.6.tar.gz'

100%[======================================>] 75,154       115K/s   in 0.6s

2017-01-09 11:42:37 (115 KB/s) - `/usr/portage/distfiles/pyalsaaudio-0.6.tar.gz' saved [75154/75154]

!!! Fetched file: pyalsaaudio-0.6.tar.gz VERIFY FAILED!
!!! Reason: Filesize does not match recorded size
!!! Got:      75154
!!! Expected: 75155
Refetching... File renamed to '/usr/portage/distfiles/pyalsaaudio-0.6.tar.gz._checksum_failure_.T171b7'

jerjou commented 7 years ago

Turns out I was wrong about the bug I thought I saw. Still investigating..

Instead of installing the python packages globally on your system, I'd recommend installing it in a virtualenv - that way you can be certain you've got all the right versions, without conflicting with any packages already installed on the system. Then you should just be able to pip install pyalsaaudio.

chan71 commented 7 years ago

@jerjou Yes, I have installed the packages inside a virtual environment, as the README suggests. However, it threw that checksum error. I'm working on getting it corrected. I just downloaded the same package without an error on one of my development machines, which has the opennao VM. I hope to push it to NAO from a fresh installation of the opennao VM.

Is there any update from the back end guys? I checked with google support (esupport@google.com) on the same and got the following response pointing to this forum.

Subject: [#11708281] [Trial] Pre Trial Customer Inquiry [ ref:_00D00VNwG._5006013fvci:ref ]

Thank you for your message. Possible that this issue something that we need to fix on our end however this feature is still on beta status and not subject for SLA. However I would like to suggest that you file a new issue on github for the sample code that you are following.

You may check out this link for the related issue and you may file a new one to address your concern since this is being monitored by the engineers who develop the code.

jerjou commented 7 years ago

Yeah - they're looking into it; but keep in mind they're juggling other priorities (and it wasn't the obvious bug). I'll update here when I hear more.

jerjou commented 7 years ago

Okay - they pushed a fix. Try again, and let me know if you're still hitting this.

chan71 commented 7 years ago

Thanks a lot for following up on this. I will check and confirm. As I have migrated to Australia, it will take some time to confirm, though. In the meantime, it would be really good if anyone else could confirm whether this is fixed.