From the looks of your code, you're reading in all the audio to transcribe first, and only sending the data to the API after you're finished. Thus, the API gets the entire audio chunk at once instead of in real time. You'll want to do something more like this:
def record_audio(rate, chunk):
    # Thread-safe buffer that a background thread fills as audio comes in.
    buff = queue.Queue()
    reccmd = ["arecord", "-f", "S16_LE", "-r", str(rate), "-t", "raw"]
    p = subprocess.Popen(reccmd, stdout=subprocess.PIPE)

    def _fill_buffer():
        # Keep reading fixed-size chunks from arecord's stdout into the queue.
        while True:
            data = p.stdout.read(chunk)
            if not data:
                break
            buff.put(data)

    t = threading.Thread(target=_fill_buffer)
    t.start()
    yield _audio_data_generator(buff)
    p.kill()
    t.join()
    # Signal the _audio_data_generator to finish
    buff.put(None)
I.e., spin up a thread that fills the buffer as the audio comes in.
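For reference, the _audio_data_generator used above just drains the queue and yields whatever has accumulated, stopping on a None sentinel. A minimal sketch of what it might look like (details may differ from the actual sample):

    def _audio_data_generator(buff):
        """Yields the data currently in the buffer, until a None sentinel."""
        stop = False
        while not stop:
            # Block until there's at least one chunk of data.
            data = [buff.get()]
            # Grab whatever else is already buffered, without blocking.
            while True:
                try:
                    data.append(buff.get(block=False))
                except queue.Empty:
                    break
            # A None in the buffer signals that recording has finished.
            if None in data:
                stop = True
                data.remove(None)
            yield b''.join(data)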
Though I would still recommend using the pyalsaaudio package instead of shelling out to arecord, if possible :-)
@jerjou thanks for the suggestion. We tried out this approach during the last couple of days. However, we could not get the streaming-too-fast issue sorted out on the NAO robot. The same code works fine on my Ubuntu 16.04 machine without giving this error.
One difference between Ubuntu 16.04 and NAO OS (which is based on the Gentoo distribution) is that NAO OS uses Python 2.7.3. Are there any specific issues with Python 2.7.3?
We changed the buffer size from 1024 to 1600 and 3200 bytes, with a sleep ranging from 0.01 to 0.1 s, to see whether we could get rid of this error. The error subsided with buffer = 3200 and sleep = 0.1. How we arrived at buffer size = 3200 bytes and sleep = 0.1 s is as follows.
The audio recording parameters in arecord are: rate = 16000 samples per second, sample depth = 16 bits.
As we have to send samples every 100 ms to match the optimum processing rate of the Google Speech service, 16000 / 10 = 1600 samples should be sent every 100 ms. As each sample is 16 bits, that is (16000 * 16) / 10 = 25600 bits = 3200 bytes.
So we set the buffer to 3200 bytes and read it every 100 ms to send to Google Speech.
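In code form, the same arithmetic (the variable names here are only illustrative):

    RATE = 16000       # samples per second
    SAMPLE_WIDTH = 2   # bytes per sample (16-bit signed little-endian)
    CHUNK_MS = 100     # desired send interval in milliseconds

    chunk_bytes = RATE * SAMPLE_WIDTH * CHUNK_MS / 1000
    print chunk_bytes  # 3200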
The findings are as follows.
From the above observations, we could get rid of the error with buffer size 3200 and sleep 0.1, but it took 5-9 seconds to get the transcript. Is there any reason for that much delay?
See the record_audio() method and the ReadAudioThread class.
def record_audio(rate, chunk):
    """Opens a recording stream in a context manager."""
    # Create a thread-safe buffer of audio data
    buff = queue.Queue()
    print "[record_audio] about to start recording"
    reccmd = ["arecord", "-f", "S16_LE", "-r", str(RATE), "-t", "raw"]
    p = subprocess.Popen(reccmd, stdout=subprocess.PIPE)
    print "[record_audio] recording in progress"
    t = ReadAudioThread(buff, p)
    t.start()

    yield _audio_data_generator(buff, p)

    # Signal the _audio_data_generator to finish
    buff.put(None)
    p.kill()


class ReadAudioThread(threading.Thread):
    def __init__(self, buff, p):
        threading.Thread.__init__(self)
        self.p = p
        self.buff = buff

    def run(self):
        print "[ReadAudioThread] inside read audio thread"
        while True:
            # 3200 bytes = 100 ms of 16-bit, 16 kHz mono audio.
            data = self.p.stdout.read(3200)
            self.buff.put(data)
            sleep(0.1)
Full file attached: trans_streaming_1.txt
@jerjou is there any update on this? This is a blocker issue for us in our development. Your input is much appreciated.
I'm able to reproduce the issue by introducing network latency on my test machine. Does this happen when the network connection is reliable as well, or is it always patchy on the Nao?
I also notice from the output in your initial comment that asound is actually recording at 14000 Hz. It's possible the Nao sound card doesn't support 16000? Did you adjust the RATE constant in the script to compensate?
In general I'd advise against adding sleeps - the error you're getting indicates that the rate at which you're getting data from your sound card is different from the rate at which the API is receiving it.
Oops - clicked 'Comment' before I was done with the thought.
So, the sample was written such that, if you sleep, the audio data will continue to buffer and just be sent all at once in the next request. If the rate at which the microphone generates data and the rate at which the API expects data match up, everything should work out fine.
Honestly, I'm not sure why you're not still getting "too slow" errors from the API if asound is still recording at 14 kHz. I suspect that somehow the subprocess is getting extra data in stdout, so it is able to read the requested number of bytes immediately... and then the buffer size + sleep artificially cap the data rate to 16 kHz.
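To put numbers on that capping effect (a rough back-of-the-envelope check, not part of the sample):

    bytes_per_read = 3200   # what ReadAudioThread asks for on each read
    read_interval = 0.1     # seconds slept between reads
    sample_width = 2        # bytes per 16-bit sample

    # Rate implied by the read loop, independent of what the mic actually produces:
    effective_rate = bytes_per_read / sample_width / read_interval
    print effective_rate    # 16000.0 samples per second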
I hypothesize that the reason you're getting the delay in the transcript is that the audio is being interpreted at a different sample rate than it was recorded at. For example, have you ever tried playing an audio file at a different sample rate than it was recorded at? It's still interpretable, but it's distorted and sounds weird :-)
Anyway, just some guesses. Again, I'd recommend using pyalsaaudio instead of shelling out to asound, which introduces extra complexity that might be a contributing factor.
@jerjou thanks for the reply.
NAO does support 16 kHz. What I added to the initial comment was one test we did with the rate changed to 14 kHz.
I accept that the network is a bit slow/unstable where NAO is tested. I will try using pyalsaaudio and let you know the results.
I've seen that users have to put 100 ms of audio into the streaming channel every 100 ms. If a 100 ms audio packet is delayed in reaching GCP at some point, can that cause an issue? It looks like this is the issue you have recreated by adding network latency. If so, how can we make sure that every 100 ms packet reaches GCP at a frequency of 100 ms? Isn't this too much to expect from a slow network/bad connection? One way we are considering enforcing the cadence on the client side is sketched below.
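A pacing generator along these lines (purely an illustrative sketch, not from the sample) would make sure we never send audio faster than real time:

    import time

    def paced_chunks(buff, interval=0.1):
        """Yield at most one queued chunk per `interval` seconds."""
        next_deadline = time.time()
        while True:
            chunk = buff.get()
            if chunk is None:   # sentinel: recording finished
                return
            # Sleep only if we're ahead of schedule; never try to "catch up"
            # by sending a burst of buffered audio at once.
            delay = next_deadline - time.time()
            if delay > 0:
                time.sleep(delay)
            next_deadline += interval
            yield chunk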
(FYI I agree with you, and am investigating things on the server end - might be a bug on our side. Will update when I find out more)
I have modified the code to use pyalsaaudio as used here. We will try it out on the NAO robot tomorrow and let you know how it goes. Attached is the modified code.
transcribe_streaming_alsa.py.txt
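The core of the capture loop with pyalsaaudio looks roughly like this (a minimal sketch against the 0.x setter-style API; the parameter values mirror the ones discussed above):

    import alsaaudio

    def record_audio_alsa(rate=16000, period=1600):
        # Open the default capture device in blocking mode.
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
        inp.setchannels(1)
        inp.setrate(rate)
        inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)
        # 1600 frames at 16 kHz is 100 ms of audio per read.
        inp.setperiodsize(period)

        while True:
            # read() blocks until a full period is available and returns
            # (number_of_frames, data_bytes).
            length, data = inp.read()
            if length > 0:
                yield data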
@jerjou do you have any update on server side issues related to this?
I tried the pyalsaaudio sample in the OpenNAO VM, but it failed to emerge the pyalsaaudio module because the downloaded file size did not match the checksum. I will try again today.
Resolving pypi.python.org... 151.101.192.223, 151.101.128.223, 151.101.64.223, ...
Connecting to pypi.python.org|151.101.192.223|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://pypi.python.org/packages/source/p/pyalsaaudio/pyalsaaudio-0.6.tar.gz [following]
--2017-01-09 11:42:35-- https://pypi.python.org/packages/source/p/pyalsaaudio/pyalsaaudio-0.6.tar.gz
Connecting to pypi.python.org|151.101.192.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75154 (73K) [application/octet-stream]
Saving to: `/usr/portage/distfiles/pyalsaaudio-0.6.tar.gz'
100%[======================================>] 75,154 115K/s in 0.6s
2017-01-09 11:42:37 (115 KB/s) - `/usr/portage/distfiles/pyalsaaudio-0.6.tar.gz' saved [75154/75154]
!!! Fetched file: pyalsaaudio-0.6.tar.gz VERIFY FAILED!
!!! Reason: Filesize does not match recorded size
!!! Got: 75154
!!! Expected: 75155
Refetching... File renamed to '/usr/portage/distfiles/pyalsaaudio-0.6.tar.gz._checksum_failure_.T171b7'
Turns out I was wrong about the bug I thought I saw. Still investigating...
Instead of installing the Python packages globally on your system, I'd recommend installing them in a virtualenv - that way you can be certain you've got all the right versions, without conflicting with any packages already installed on the system. Then you should just be able to pip install pyalsaaudio.
@jerjou Yes, I have installed the packages inside a virtual environment as the README file suggests. However, it threw that checksum issue, which I'm working on getting corrected. I just downloaded the same package without an error on one of my development machines that runs the OpenNAO VM, and I hope to push it to NAO from a fresh installation of the OpenNAO VM.
Is there any update from the backend team? I checked with Google support (esupport@google.com) on the same issue and got the following response pointing to this forum.
Subject: [#11708281] [Trial] Pre Trial Customer Inquiry [ ref:_00D00VNwG._5006013fvci:ref ]
Thank you for your message. Possible that this issue something that we need to fix on our end however this feature is still on beta status and not subject for SLA. However I would like to suggest that you file a new issue on github for the sample code that you are following.
You may check out this link for the related issue and you may file a new one to address your concern since this is being monitored by the engineers who develop the code.
Yeah - they're looking into it; but keep in mind they're juggling other priorities (and it wasn't the obvious bug). I'll update here when I hear more.
Okay - they pushed a fix. Try again, and let me know if you're still hitting this.
Thanks a lot for following this up. I will check and confirm. As I have migrated to Australia, it will take some time to confirm though. In the meantime, if anyone else can confirm whether this is fixed, that would be really good.
In which file did you encounter the issue?
transcribe_streaming.py
Did you change the file? If so, how?
Yes, to use arecord instead of pyaudio/portaudio. You can find the modified file (transcribe_streaming_arecord.py) attached to the 7th comment of #728.
Describe the issue
When the script is run, it throws the following error on a regular basis. We are testing this with the NAO robot's mic and NAOqi OS (a distribution based on Gentoo).
The mic is identified properly, as seen in the output of the following command.
And the sound driver supports sampling rates exceeding 48 kHz.
We have also observed that the Google server has complained about streaming too fast or too slow even for the same rate (e.g. 16k) at different times. The only difference between these tests was that the network latency kept changing and was high in general. Can the above error be caused by unstable network bandwidth? Is there any solution or workaround for using streaming under bad network conditions? Can any other factors cause this error?