awslabs / aws-lex-browser-audio-capture

An example web application using the Lex JavaScript SDK to send and receive audio from the Lex PostContent API. Demonstrates how to capture an audio device, record audio, convert the audio into a format that Lex will recognize, and play the response, all from a web browser.
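For context, the call the example ultimately makes is roughly the following. This is a sketch only, assuming the AWS SDK for JavaScript v2 `LexRuntime` client; the bot name, alias, user ID, region, and `audioBuffer` variable are placeholders, not values from this repo:

```javascript
// Sketch only: names in <...> and `audioBuffer` are placeholders.
var AWS = require('aws-sdk');
var lexruntime = new AWS.LexRuntime({ region: 'us-east-1' });

lexruntime.postContent({
  botName: '<your-bot>',
  botAlias: '<your-alias>',
  userId: '<some-user-id>',
  contentType: 'audio/x-l16; sample-rate=16000; channel-count=1', // raw PCM the browser audio is converted to
  accept: 'audio/mpeg',       // ask Lex to return spoken audio for playback
  inputStream: audioBuffer    // the recorded, downsampled audio
}, function (err, data) {
  if (err) { console.error(err); return; }
  // data.audioStream holds Lex's spoken response;
  // data.inputTranscript holds the recognized text.
});
```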

Prevent sending empty, "silence"-only buffer #8

Closed: marjanSterjev closed this issue 6 years ago

marjanSterjev commented 6 years ago

Hi,

At the moment, the conversation logic advances to the Sending state after silence regardless of whether there is any content to send, so we end up with a “Sorry, can you repeat that?” response from Lex. Is it possible to skip the send action (PostContent) when the buffer is “silence” only and return to the Listening state instead?

Thanks

palafranchise commented 6 years ago

Hi marjanSterjev,

Yup, the silence detection algorithm is really simple. As you point out, it does not wait to hear "non-silence" before starting the silence detection timer. If you want more time before needing to speak, you can play around with the `time` value in the `silenceDetectionConfig`. Alternatively, it should be relatively easy to add "noise detection" so that the recorder's `analyse` function waits until it detects noise before starting the silence detection timer:

https://github.com/awslabs/aws-lex-browser-audio-capture/blob/master/lib/recorder.js#L81
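For the first option, the config is just an object with the two fields the recorder's `analyse` loop reads. A sketch with illustrative values, not the project defaults:

```javascript
// Illustrative values only. `time` is how long (ms) the signal must stay
// within ±`amplitude` before the silence callback fires; raising `time`
// gives the speaker more breathing room before the utterance is sent.
var silenceDetectionConfig = {
  time: 2500,
  amplitude: 0.2
};
```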

Andrew

marjanSterjev commented 6 years ago

Hi Andrew,

Thanks for your answer. Based on your instructions, I guess the following approach will work.

recorder.js

```javascript
var record = function (onSilence, visualizer) {
  silenceCallback = onSilence;
  visualizationCallback = visualizer;
  // start = Date.now();
  start = null; // no noise heard yet, so the silence timer is not running
  recording = true;
};

var analyse = function () {
  analyser.fftSize = 2048;
  var bufferLength = analyser.fftSize;
  var dataArray = new Uint8Array(bufferLength);
  var amplitude = silenceDetectionConfig.amplitude;
  var time = silenceDetectionConfig.time;

  analyser.getByteTimeDomainData(dataArray);

  if (typeof visualizationCallback === 'function') {
    visualizationCallback(dataArray, bufferLength);
  }

  for (var i = 0; i < bufferLength; i++) {
    // Normalize between -1 and 1.
    var curr_value_time = (dataArray[i] / 128) - 1.0;
    if (curr_value_time > amplitude || curr_value_time < (-1 * amplitude)) {
      // Noise detected: (re)start the silence timer from now.
      start = Date.now();
    }
  }

  if (start !== null) {
    var newtime = Date.now();
    var elapsedTime = newtime - start;
    if (elapsedTime > time) {
      silenceCallback();
    }
  } else {
    // Nothing but silence heard so far: discard the buffered samples
    // so an empty utterance is never sent.
    worker.postMessage({ command: 'clear' });
  }
};
```

Do you see any problems with this approach?

Thanks, Marjan

palafranchise commented 6 years ago

That looks like it should do it. Did you try it out?

palafranchise commented 6 years ago

You can make the above changes and build the distribution with `grunt dist` to get the bundled source.

Noise detection is a perfectly reasonable behavior to have, but I'm hesitant to add it, as this example code was created as a companion to the examples in this blog post; it's a teaching aid rather than a feature-rich implementation.

Happy to help out with any other questions you may have!