Conversation state listening while speaking

awslabs / aws-lex-browser-audio-capture

An example web application using the Lex JavaScript SDK to send and receive audio from the Lex PostContent API. Demonstrates how to capture an audio device, record audio, and convert the audio into a format that Lex will recognize, and play the response. All from a web browser.

MIT No Attribution


Conversation state listening while speaking #18

Closed rahulsinghthakur closed 5 years ago

rahulsinghthakur commented 5 years ago

Can we change the bot state to be listening as well while the bot is speaking?

rahulsinghthakur commented 5 years ago

Please help @palafranchise @jpeddicord

palafranchise commented 5 years ago

Hi rahulsinghthakur,

This example has a simple Conversation state machine that transitions from speaking to listening when onPlaybackComplete fires and the bot is in one of the elicit dialog states. You don't have to use the Conversation abstraction, though. You can use the audio control directly to get the behavior you're looking for.

Andrew

rahulsinghthakur commented 5 years ago

Can you please elaborate on how to use the audio control while the bot is speaking, and how to change the state as soon as the person says anything? @palafranchise

palafranchise commented 5 years ago

The audio control is a lower-level API. If you want to manage your own conversation state, you can start and stop recording at any time. You'll need a strategy for silence detection, or explicit customer input, to decide when to stop recording. Recording audio input while playing back audio may be challenging if your audio output and input aren't isolated. It could work with headphones and a separate mic. With a laptop's built-in mic and speakers, the recording would pick up the active playback along with the customer input.
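
To make the silence detection idea concrete, here is a minimal sketch of one common approach: an energy (RMS) threshold with a hold time. It is not taken from this repository. The names `rms` and `createSilenceDetector`, and the default `threshold` and `silenceMs` values, are assumptions chosen for illustration and would need tuning for a real microphone.

```javascript
// Root-mean-square level of one frame of PCM samples in the range [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical helper (not part of the example app): returns a function that
// reports true once the input has stayed below `threshold` for at least
// `silenceMs` milliseconds, but only after some speech has been heard.
function createSilenceDetector({ threshold = 0.02, silenceMs = 1500 } = {}) {
  let heardSpeech = false;
  let quietSince = null;

  return function shouldStop(samples, nowMs) {
    if (rms(samples) >= threshold) {
      // Loud frame: the user is talking, so reset the quiet timer.
      heardSpeech = true;
      quietSince = null;
      return false;
    }
    // Ignore silence that comes before the user has said anything.
    if (!heardSpeech) return false;
    if (quietSince === null) quietSince = nowMs;
    return nowMs - quietSince >= silenceMs;
  };
}
```

In a browser, the frames could come from a Web Audio `AnalyserNode` via `getFloatTimeDomainData`, polled on a timer, with `performance.now()` as the clock. When `shouldStop` returns true, you would stop recording and send the captured audio to Lex.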

rahulsinghthakur commented 5 years ago

I was thinking of writing code so that if a person says something, we just stop the bot playback and change the state to listening. But I am unable to work out the logic for detecting whether a person is speaking or not. @palafranchise

rahulsinghthakur commented 5 years ago

@palafranchise We could use wake words like "stop" or "okay" to interrupt the Lex bot playback, but even to do that I have to change the state to listening, and I am unable to write any condition that checks for the user's voice input.

palafranchise commented 5 years ago

Ah, I see. This example is not a wake word solution. It's an example of how to capture an audio device, record audio, convert the audio into a format that Lex will recognize, and play the response. It manages a conversation with a Lex bot to completion. If you want to support wake words, or interrupt the conversation with a wake word, you'll need to build some additional functionality. The audio control API would be useful when building your own wake word support, but you'd need to write your own wake word detection logic. One simple way to manage a conversation and handle wake words is to run two separate processes. Here's an example (although not browser-based): https://aws.amazon.com/blogs/machine-learning/build-a-voice-kit-with-amazon-lex-and-a-raspberry-pi/
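
For the simpler case raised earlier in the thread (stop playback as soon as the user says anything, without recognizing a specific word), a rough sketch of the control logic might look like the following. This is not wake word detection and not part of the example app. `createBargeInMonitor`, `frameLevel`, `threshold`, `minSpeechFrames`, and the `onBargeIn` callback are all hypothetical names. The sketch also assumes the mic does not pick up the bot's own playback (for example, headphones are in use), otherwise the playback itself would trigger it.

```javascript
// RMS level of one frame of PCM samples in the range [-1, 1].
function frameLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical monitor: while the bot is speaking, watch mic frames and call
// `onBargeIn` once `minSpeechFrames` consecutive frames exceed `threshold`.
function createBargeInMonitor({ threshold = 0.05, minSpeechFrames = 5, onBargeIn }) {
  let state = 'idle';
  let loudFrames = 0;

  return {
    get state() {
      return state;
    },
    // Call this when bot playback starts.
    startSpeaking() {
      state = 'speaking';
      loudFrames = 0;
    },
    // Call this for every mic frame captured during playback.
    processFrame(samples) {
      if (state !== 'speaking') return;
      // Require consecutive loud frames so a single click or pop is ignored.
      loudFrames = frameLevel(samples) >= threshold ? loudFrames + 1 : 0;
      if (loudFrames >= minSpeechFrames) {
        state = 'listening';
        loudFrames = 0;
        onBargeIn(); // e.g. stop playback and start recording
      }
    },
  };
}
```

Inside `onBargeIn` you would stop the current playback and start recording through whatever audio layer you are using. Requesting the mic with the `echoCancellation` constraint in `getUserMedia` may reduce false triggers from speaker output, but it is unlikely to remove them entirely.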