Only 16kHz sample rate is supported

alexa-pi / AlexaPi

Alexa client for all your devices! # No active development. PRs welcome # consider https://github.com/respeaker/avs instead

MIT License


Only 16kHz sample rate is supported #161

Open JasperHorn opened 7 years ago

JasperHorn commented 7 years ago

My microphone, which only supports sampling rates from 44.1kHz to 48kHz, cannot be used with AlexaPi.

The Alexa web service only accepts 16kHz, and webrtcvad supports 8kHz, 16kHz and 32kHz, so supporting any other sampling rate would require a conversion.

Currently, AlexaPi simply assumes that the microphone supports 16kHz sampling. If it doesn't, the audio is labeled with the wrong sampling rate, no voice is recognized, and the whole thing fails silently.
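The silent failure could be turned into a loud one. Below is a minimal sketch. It assumes the rate the capture device actually delivered is available as an integer; how to obtain that depends on the capture library and is not shown. check_capture_rate is a hypothetical helper, not part of AlexaPi.

```python
# Hypothetical guard: fail loudly instead of silently when the capture
# device does not deliver the 16 kHz that the Alexa web service expects.
ALEXA_RATE = 16000  # the only rate the Alexa web service accepts (per this issue)


def check_capture_rate(actual_rate: int, required_rate: int = ALEXA_RATE) -> None:
    """Raise ValueError if the device rate differs from the rate sent to Alexa."""
    if actual_rate != required_rate:
        # Audio captured at actual_rate but labeled as required_rate plays back
        # slowed down by this factor (about 2.8x for 44.1 kHz labeled as 16 kHz).
        stretch = actual_rate / required_rate
        raise ValueError(
            f"capture device runs at {actual_rate} Hz, but audio is sent as "
            f"{required_rate} Hz (slowed down {stretch:.1f}x); "
            "resample the audio or use a plughw: device"
        )
```

For example, check_capture_rate(16000) returns quietly, while check_capture_rate(44100) raises instead of letting mislabeled audio reach Alexa.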

renekliment commented 7 years ago

If you have some code to contribute, a PR would be appreciated. Unfortunately, it seems that not many people are experiencing this issue, and probably no one is working on it. Thank you.

JasperHorn commented 7 years ago

I managed to get it working for my device (at least for the Alexa service, not the trigger word, but I'm not sure if that is related to the conversion at all). I just haven't gotten around to making it configurable instead of hacking it in with a constant. When I do, I'll make a PR.

renekliment commented 7 years ago

Great! When you do, please base it on the latest dev (the code has moved into the alexapi.capture module, etc.).

renekliment commented 7 years ago

@JasperHorn Any progress on this? We might actually need resampling for #180.

JasperHorn commented 7 years ago

Not yet. I'll see if I can get to it this week. If I don't (or you want it sooner) I can provide a fragment of code that should make it easy to get this done (it's really just one line of code).

renekliment commented 7 years ago

@JasperHorn Please provide the code, so I can play around with it :-)

JasperHorn commented 7 years ago

Here's my full diff: samplerate.diff.txt. The part with the keyword detection can be ignored, since I tried a number of different things there but never got it working.

It's surprisingly simple actually:

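 # ratecv(fragment, width, nchannels, inrate, outrate, state): width=2 bytes for 16-bit mono samples
 fragment, ratecv_state = audioop.ratecv(data, 2, 1, input_rate, output_rate, ratecv_state)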

ratecv_state tracks the converter's internal state between calls and should be initialized to None; fragment will contain the resampled audio data.

audioop is from the standard library, so you don't need to install anything, just import it.
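Note that audioop was deprecated in Python 3.11 and removed in Python 3.13 (PEP 594), so the one-liner above only works on older interpreters. As a stand-in, here is a minimal sketch of a chunk-by-chunk resampler with the same call pattern: pass the returned state back in, starting from None. It assumes 16-bit little-endian mono input (S16_LE). resample_s16_mono is a hypothetical name. It uses plain linear interpolation rather than whatever audioop.ratecv does internally, and it has no anti-aliasing low-pass filter, so downsampling 44.1/48kHz to 16kHz may leave some aliasing.

```python
import array
import sys
from typing import Optional, Tuple

# State carried between calls: (last input sample, position of the next output
# sample measured from that sample, in units of 1/out_rate input samples).
State = Optional[Tuple[int, int]]


def resample_s16_mono(data: bytes, in_rate: int, out_rate: int,
                      state: State) -> Tuple[bytes, State]:
    """Linearly resample little-endian 16-bit mono PCM, one chunk at a time."""
    samples = array.array("h")
    samples.frombytes(data)  # raises ValueError on an odd byte count
    if sys.byteorder == "big":
        samples.byteswap()  # S16_LE input -> native byte order

    if state is None:
        buf, pos = list(samples), 0
    else:
        prev, pos = state
        buf = [prev] + list(samples)  # prepend last sample of previous chunk
    if not buf:
        return b"", state

    out = array.array("h")
    # Emit output samples while both neighbours buf[i] and buf[i + 1] exist.
    while pos // out_rate + 1 < len(buf):
        i, rem = divmod(pos, out_rate)
        a, b = buf[i], buf[i + 1]
        out.append(a + (b - a) * rem // out_rate)  # integer linear interpolation
        pos += in_rate  # advance by in_rate/out_rate input samples

    # Keep the last sample and re-base pos on it for the next chunk.
    new_state = (buf[-1], pos - (len(buf) - 1) * out_rate)
    if sys.byteorder == "big":
        out.byteswap()  # native byte order -> S16_LE output
    return out.tobytes(), new_state
```

In a capture loop this would be called as fragment, ratecv_state = resample_s16_mono(data, input_rate, output_rate, ratecv_state). Keeping the position as an integer count of 1/out_rate steps avoids floating-point drift, so feeding the audio in chunks gives the same output as one call over the whole recording.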

renekliment commented 7 years ago

@JasperHorn Nice, thanks!

I've gained a little more experience with the audio stuff. What device name do you use? ALSA devices whose names start with plughw: get automatic conversion from the ALSA plug plugin. See arecord -L for those; unfortunately, our device list doesn't include them :-(
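The renaming itself is mechanical: prefixing an ALSA hw: name with plug routes that device through the plug plugin, which handles rate, format and channel conversion. Below is a minimal sketch using a hypothetical helper. The example names follow the hw:CARD=...,DEV=... form that arecord -L prints.

```python
def plug_device_name(device: str) -> str:
    """Return the plug variant of an ALSA hw: device name (hypothetical helper).

    Names without the hw: prefix (e.g. "default", or a name that already
    starts with plughw:) are returned unchanged.
    """
    if device.startswith("hw:"):
        # "hw:1" -> "plughw:1", "hw:CARD=Device,DEV=0" -> "plughw:CARD=Device,DEV=0"
        return "plug" + device
    return device
```

The resulting name would then go wherever the input device name is set in the config.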

JasperHorn commented 7 years ago

I am using plughw:1. I'm not sure I follow what you're saying about that, though.

renekliment commented 7 years ago

Hmmm. Could you send your recording.wav from the current dev (without your resampling code) with this device name in the config?

JasperHorn commented 7 years ago

recording_16k.wav.txt recording_44.1k.wav.txt

Just remove the .txt; I added it to get around GitHub's upload restrictions. The number in each filename is the value that was passed to inp.setrate().

renekliment commented 7 years ago

@JasperHorn Thanks. Just played them and they are both 16 kHz and fine.
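The sample rate stored in a WAV header can also be checked without listening, using the standard library's wave module. The .txt suffix doesn't matter to wave.open. Below is a minimal sketch, and wav_info is a hypothetical helper. Keep in mind that the header only reports the labeled rate. In the failure mode described at the top of this issue, the header would still say 16000 while the content is slowed down, so listening is still the real test.

```python
import wave


def wav_info(path: str) -> dict:
    """Read the format fields from a WAV file's header (stdlib wave module)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()  # labeled samples per second, e.g. 16000
        return {
            "rate": rate,
            "channels": w.getnchannels(),
            "sample_width": w.getsampwidth(),  # bytes per sample; 2 for 16-bit
            "seconds": w.getnframes() / rate,  # duration implied by the header
        }
```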

renekliment commented 7 years ago

When you are on dev and trigger AlexaPi with the keyboard, a button, or something else, is your speech still not recognized by Amazon? Do you see anything at https://alexa.amazon.com (recognized entries, request history)?

aarmea commented 6 years ago

The patch file in this thread is out of date, but I was able to get AlexaPi working with my USB headset adapter that only supports 44.1 and 48 kHz by forcing ALSA to use the plug resampler globally:

/etc/asound.conf:

pcm.!default {
  type plug
  slave {
    pcm "hw:1,0"
  }
}
ctl.!default {
  # plug is a PCM plugin type; mixer controls point straight at card 1
  type hw
  card 1
}

While hotword detection doesn't work very well out of the box with this setup (maybe pocketsphinx is bothered by resampling artifacts or something), it improves drastically if you use Snowboy instead.