benrito opened 5 months ago
Possible logic:

- `InteractionManager` (needs something like `forced_injection` in the simulator)
- `Interaction` (needs a listener?)
- `PlantoidDialogue`
  - Listen
  - Think (listener; if an "interrupt" event is received, immediately stop generation and append to the message history)
  - Speak (if an "interrupt" event is received, immediately stop generation and emit an "interrupt" event)

In Speak:

- `if_human == false`:
  - `ShadowListener(self)`
  - shadow Deepgram transcription in a subprocess
  - microphone threshold
  - if the threshold is passed, reset the transcription
  - terminate Think | Speak
  - wait for `is_final`
  - force-injection step as Human
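The shadow-listener step above could be sketched roughly as follows. This is a minimal illustration, not code from the repo: `ShadowListener`, `feed`, and `make_frame` are hypothetical names, and audio frames are simulated instead of read from a microphone.

```python
# Minimal sketch of the Speak-side shadow listener described above.
# All names here are assumptions for illustration; frames are synthetic.
import math
import struct
import threading

class ShadowListener:
    """Watches mic frames while the Plantoid speaks; sets an interrupt
    event once the input level passes a loudness threshold."""

    def __init__(self, threshold_rms=2000):
        self.threshold_rms = threshold_rms
        self.interrupt_event = threading.Event()  # Think/Speak poll this

    @staticmethod
    def _rms(frame: bytes) -> float:
        # Frames assumed to be 16-bit signed little-endian PCM.
        samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
        return math.sqrt(sum(s * s for s in samples) / len(samples))

    def feed(self, frame: bytes) -> None:
        if self._rms(frame) >= self.threshold_rms:
            self.interrupt_event.set()  # signal: terminate Think | Speak

def make_frame(amplitude: int, n: int = 160) -> bytes:
    """Synthetic square-wave frame standing in for mic input."""
    return struct.pack("<%dh" % n, *([amplitude, -amplitude] * (n // 2)))

listener = ShadowListener(threshold_rms=2000)
listener.feed(make_frame(500))    # quiet: Plantoid keeps speaking
print(listener.interrupt_event.is_set())  # False
listener.feed(make_frame(8000))   # loud: human interrupts
print(listener.interrupt_event.is_set())  # True
```

In the real system, `feed` would be called from the audio callback that is already streaming to Deepgram, so the same capture path serves both transcription and interruption detection.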
Based on a lot of user testing, this is the #1 thing that will make the system more lifelike.
Expected behavior when "interruption" is enabled in config:
While a Plantoid is speaking, a human who speaks loudly can interrupt. The Plantoid stops speaking, a sound is played, and it's the human's turn.
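The issue doesn't show the config schema; as a sketch, assuming a YAML config, enabling the feature might look like this (all keys hypothetical):

```yaml
# Hypothetical keys -- the real config schema may differ.
interruption:
  enabled: true
  mic_threshold_rms: 2000     # loudness level that triggers an interrupt
  cue_sound: sounds/beep.wav  # played when the Plantoid is cut off
```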
WORKING BRANCH: origin/interrupt
This requires:

- An interruption function that emits an event when the microphone input passes a certain threshold (see the dummy implementation in `experiments/interruption.py`). Ideally we reuse the same modified Microphone and/or Deepgram code so we can begin listening as soon as possible (we might want to keep Deepgram running during speak for this purpose).
- Incorporating the interruption function as an async task or subprocess (in `speak.py`?).
- A listener (in `interaction.py`?) that listens for an "interruption" event.
- On receiving an "interruption" event: cleanly terminate the speak process, play a beep sound, and ensure that whatever transcription we have is appended to the message history. It's immediately the human's turn.
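The interaction-side handling in the list above could look something like this. It's a sketch under assumptions: `InterruptionHandler`, `stop_speak`, and `play_sound` are made-up names, and the real code would wire these to the actual speak subprocess and audio playback.

```python
# Sketch of the interaction-side "interruption" event handler.
# All names are assumptions for illustration, not real repo interfaces.

class InterruptionHandler:
    def __init__(self, stop_speak, play_sound, messages):
        self.stop_speak = stop_speak  # cleanly terminates the speak process
        self.play_sound = play_sound  # plays the beep / cue sound
        self.messages = messages      # shared message history

    def on_interruption(self, partial_transcript: str) -> None:
        self.stop_speak()
        self.play_sound("beep")
        # Whatever was transcribed so far is kept, so context isn't lost.
        if partial_transcript:
            self.messages.append(
                {"role": "human", "content": partial_transcript}
            )
        # ...and it is now immediately the human's turn.

# Tiny usage example with stand-in callbacks:
log = []
history = []
handler = InterruptionHandler(
    stop_speak=lambda: log.append("speak terminated"),
    play_sound=lambda name: log.append(f"played {name}"),
    messages=history,
)
handler.on_interruption("wait, actually--")
print(log)      # ['speak terminated', 'played beep']
print(history)  # partial human turn appended
```

Passing the callbacks in keeps the handler testable and avoids hard-coding how speak is run (async task vs. subprocess).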
Thoughts on implementation:

For maximum seamlessness, Deepgram will be running silently while a Plantoid is speaking; when the mic passes the sensitivity threshold, it's immediately the human's turn and we're already transcribing. If this isn't possible, we can cue the human to keep speaking.

Also in this feature: we've got `experiments/runtime_effects.py` to generate realistic cue sounds (e.g., "oh", "um", "hrm", "go on", "ugh", "what", "yeah?") that we can pre-generate and play for more immersion (instead of beeps). Can explain over voice.
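Swapping the beep for one of those pre-generated cues could be as simple as picking a random file at interrupt time. A minimal sketch; the file names are hypothetical placeholders for whatever `runtime_effects.py` generates:

```python
# Sketch: choose one pre-generated verbal cue instead of a beep.
# File names are hypothetical placeholders.
import random

CUE_SOUNDS = ["oh.wav", "um.wav", "hrm.wav", "go_on.wav", "yeah.wav"]

def pick_cue(rng=random):
    """Return the path of one pre-generated cue sound to play."""
    return rng.choice(CUE_SOUNDS)

print(pick_cue() in CUE_SOUNDS)  # True
```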