Plantoidz / mechanical-garden-FA

Multimodal mechanical plantoid garden repository.
3 stars 3 forks source link

Interrupt function / option #46

Open benrito opened 5 months ago

benrito commented 5 months ago

Based on a lot of user testing, this is the #1 thing that will make the system more lifelike.

Expected behavior when "interruption" is enabled in config:

While a Plantoid is speaking, a human who speaks loudly can interrupt. The Plantoid stops speaking, a sound is played, and it's the human's turn.

WORKING BRANCH: origin/interrupt

This requires

  1. The creation of an interruption function that emits an event when the microphone input passes a certain threshold (see dummy implementation in experiments/interruption.py). Ideally we are using the same Modified Microphone and/or Deepgram code so we can begin listening as soon as possible (might want to keep Deepgram running during speak for this purpose?)

  2. Incorporation of interruption function as an async or subprocess in (speak.py?)

  3. Incorporation of a listener in (interaction.py?) that listens for an "interruption" event

  4. When listener gets an "interruption event," clean termination of the speak process, playback of a beep sound, and ensure that whatever we have is appended to message history.

  5. It's immediately the human's turn.

Thoughts on implementation:

For maximum seamlessness, while a Plantoid is speaking, Deepgram will be running silently; when we pass the sensitivity threshold on mic, it's immediately human's turn and we're already transcribing. If this isn't possible, we can cue the human to keep speaking.

Also in this feature — we've got experiments/ runtime_effects.py to generate realistic cue sounds (e.g., oh, um, hrm, go on, ugh, what, yeah?) that we can pre-generate and play for more immersion (instead of beeps). Can explain over voice.

benrito commented 5 months ago

Possible logic:

InteractionManager (needs to have something like "forced_injection" in simulator) Interaction (needs to have a listener?) PlantoidDialogue Listen

Think (listener; if received “interrupt event” immediately stop generation and append to message history) | Speak (if received “interrupt event” immediately stop generation and emit “interrupt” event)

In Speak….

if_human == false;

ShadowListener(self)
    Shadow Deepgram transcription in subprocess;
    Microphone threshold 
        If threshold passed, reset transcription
        Terminate think | speak
        Wait for is_final
        force injection step as Human