Implement Detection and Termination of Unintended Ambient Transcription

In the current implementation of Say, Pi, if a user leaves a conversation open without explicitly ending it, the extension continues to transcribe ambient audio indefinitely. This can lead to unintended transcription of background noise or irrelevant audio, such as a TV playing while the user is asleep, which in turn causes unnecessary API usage and increased costs.

To address this issue, we need a mechanism to detect when a conversation is likely transcribing unintended ambient audio for an extended period and automatically terminate it to prevent further transcription.

Proposed Solution:

Set a fixed "Ambient Transcription Timeout" duration (e.g., 1 hour), after which a conversation is considered to be potentially transcribing unintended ambient audio and becomes a candidate for automatic termination.
Implement a timer in the Say, Pi extension to track the duration of continuous transcription:
- Start the timer when the first transcription is sent to Pi in a new conversation.
- Keep the timer running as long as transcriptions are being continuously sent.
- If there is a significant pause in transcription (e.g., no transcriptions for 5 minutes), treat the conversation as "paused" and stop the timer.
- If transcription resumes after a pause, restart the timer from zero.
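A minimal TypeScript sketch of the continuous-transcription timer described above. Everything here is hypothetical: the class name `AmbientTranscriptionMonitor`, its methods, and the assumption that the extension can call a hook each time a transcription is sent to Pi. The 1-hour timeout and 5-minute pause threshold are the example values from this issue. Rather than running a real timer, it records timestamps and computes elapsed time on demand, with an injectable clock so the logic can be tested.

```typescript
// Hypothetical sketch: class and method names are illustrative, not existing
// Say, Pi code. Durations are the example values from this issue.
const AMBIENT_TRANSCRIPTION_TIMEOUT_MS = 60 * 60 * 1000; // fixed 1-hour timeout
const PAUSE_THRESHOLD_MS = 5 * 60 * 1000; // 5 minutes without transcriptions = paused

class AmbientTranscriptionMonitor {
  private runStartedAt: number | null = null; // start of the current continuous run
  private lastSentAt: number | null = null; // time of the most recent transcription

  // The clock is injectable so the logic can be tested without real waiting.
  constructor(private readonly now: () => number = () => Date.now()) {}

  /** Call whenever a transcription is sent to Pi. */
  onTranscriptionSent(): void {
    const t = this.now();
    if (this.lastSentAt === null || t - this.lastSentAt >= PAUSE_THRESHOLD_MS) {
      // First transcription, or resuming after a pause: restart from zero.
      this.runStartedAt = t;
    }
    this.lastSentAt = t;
  }

  /** Length of the current continuous run in ms; 0 while paused or before starting. */
  continuousDurationMs(): number {
    if (this.runStartedAt === null || this.lastSentAt === null) return 0;
    const t = this.now();
    if (t - this.lastSentAt >= PAUSE_THRESHOLD_MS) return 0; // paused: timer stopped
    return t - this.runStartedAt;
  }

  /** True once continuous transcription has lasted the full timeout. */
  isTimedOut(): boolean {
    return this.continuousDurationMs() >= AMBIENT_TRANSCRIPTION_TIMEOUT_MS;
  }

  /** Assumption: start a fresh run after the user confirms they are still present. */
  reset(): void {
    this.runStartedAt = null;
    this.lastSentAt = null;
  }
}
```

Deriving elapsed time from timestamps avoids keeping a long-running timer alive: "stop the timer on pause" and "restart from zero on resume" both fall out of comparing the gap since the last transcription against the pause threshold.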
If the continuous transcription timer reaches the fixed "Ambient Transcription Timeout" duration:
- Display a humorous prompt to the user, such as:
```
Prompt: "Wow, this is a long conversation. Are you still there, or are we talking to ourselves?"
<timer ticking down to zero>
Button: "Yes, I'm still here!"
```
- Start a countdown (e.g., 30 seconds) for the user to respond to the prompt.
- If the user clicks the "Yes, I'm still here!" button before the countdown ends, treat the conversation as active and continue transcription.
- If the user doesn't interact with the prompt before the countdown reaches zero, terminate the conversation.
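The response countdown could be modelled separately from the UI so it is easy to test. This sketch assumes a 30-second window (the example value above); `PresencePrompt`, `runPresencePrompt`, and the `render`/`onExpired` callbacks are hypothetical names, and how the prompt is actually drawn is left to the extension.

```typescript
// Hypothetical sketch: names and the 30-second default are illustrative only.
type PromptOutcome = "pending" | "confirmed" | "expired";

class PresencePrompt {
  private outcome: PromptOutcome = "pending";
  private readonly deadline: number;

  // The clock is injectable so the countdown can be tested without waiting.
  constructor(
    timeoutMs: number = 30_000,
    private readonly now: () => number = () => Date.now(),
  ) {
    this.deadline = now() + timeoutMs;
  }

  /** Whole seconds left, for the "<timer ticking down to zero>" display. */
  secondsRemaining(): number {
    return Math.max(0, Math.ceil((this.deadline - this.now()) / 1000));
  }

  /** Current outcome; flips from "pending" to "expired" once the deadline passes. */
  poll(): PromptOutcome {
    if (this.outcome === "pending" && this.now() >= this.deadline) {
      this.outcome = "expired";
    }
    return this.outcome;
  }

  /** Handler for the "Yes, I'm still here!" button; returns false if too late. */
  confirm(): boolean {
    if (this.poll() === "pending") {
      this.outcome = "confirmed";
    }
    return this.outcome === "confirmed";
  }
}

// Example wiring: tick once per second, update the display, and hand off to
// termination if the user never responds.
function runPresencePrompt(
  render: (secondsLeft: number) => void,
  onExpired: () => void,
): PresencePrompt {
  const prompt = new PresencePrompt();
  const interval = setInterval(() => {
    render(prompt.secondsRemaining());
    const outcome = prompt.poll();
    if (outcome !== "pending") {
      clearInterval(interval);
      if (outcome === "expired") onExpired();
    }
  }, 1000);
  return prompt;
}
```

The button's click handler would call `confirm()`; because `confirm()` checks the deadline itself, a click that arrives after the countdown has expired cannot revive the conversation.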
Terminating a conversation flagged for potential unintended ambient transcription:
- Stop the speech recognition and transcription processes.
- Send a closing message to Pi, indicating that the conversation has been terminated due to suspected unintended ambient transcription.
- Close the active conversation in the Say, Pi interface.
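A sketch of the termination sequence, assuming the extension exposes operations to stop recognition, send a message to Pi, and close the conversation. The `TranscriptionSession` interface, `terminateAmbientConversation`, and the closing-message wording are hypothetical placeholders, not the extension's actual API.

```typescript
// Hypothetical interface: the real extension's speech-recognition and Pi
// messaging code may look quite different.
interface TranscriptionSession {
  stopRecognition(): void;
  sendToPi(message: string): Promise<void>;
  closeConversation(): void;
}

// Placeholder wording for the closing message sent to Pi.
const AMBIENT_CLOSING_MESSAGE =
  "[Say, Pi] This conversation was ended automatically because it appeared to be transcribing unintended ambient audio.";

async function terminateAmbientConversation(session: TranscriptionSession): Promise<void> {
  // Stop listening first so no further audio is transcribed (or billed).
  session.stopRecognition();
  try {
    // Let Pi know why the conversation is ending.
    await session.sendToPi(AMBIENT_CLOSING_MESSAGE);
  } finally {
    // Close the conversation even if the closing message fails to send;
    // any send error still propagates to the caller.
    session.closeConversation();
  }
}
```

Stopping recognition before anything else is deliberate: the point of the feature is to limit cost, so the audio pipeline should shut down even if sending the closing message to Pi fails.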

Benefits:

Helps detect and prevent unintended transcription of ambient audio, such as background noise or irrelevant sounds.
Provides a safeguard against accidental prolonged transcription and excessive API usage.
Reduces unnecessary costs associated with transcribing unintended ambient audio.
Engages users with a humorous prompt to confirm their presence and intention to continue the conversation.
Because the timeout is fixed, users cannot disable the feature or set excessively long timeout durations, removing a source of potential abuse or unintended costs.

Pedal-Intelligence / saypi-userscript

Implement Detection and Termination of Unintended Ambient Transcription #80