Pedal-Intelligence / saypi-userscript

An independent voice interface for Inflection AI's conversational assistant, Pi
https://www.saypi.ai/

Dynamic Submission Delay #63

Open rosscado opened 5 months ago

rosscado commented 5 months ago

To enhance the user experience of 'Say, Pi', we propose implementing a "Dynamic Submission Delay" feature. This feature aims to dynamically adjust the delay before automatic prompt submission, making conversations more fluid and responsive.

Background

The current implementation of 'Say, Pi' does not use a fixed submission delay. Instead, the function calculateDelay in the TimerModule computes the delay from four inputs: the time at which the user stopped speaking, the probability that the user has finished speaking (probabilityFinished), the tempo of the user's speech, and a maximum delay threshold (maxDelay). The delay therefore already adapts, to some degree, to the user's speech tempo and to the likelihood that they have finished speaking.
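
As a rough illustration of how these four inputs could combine, here is a minimal sketch. It is not the project's actual calculateDelay: the formula, the 0..1 ranges assumed for probabilityFinished and tempo, the explicit now parameter, and the name sketchCalculateDelay are all assumptions made for this example.

```typescript
// Hypothetical sketch only: the real calculateDelay in TimerModule may differ.
interface DelayInputs {
  timeUserStoppedSpeaking: number; // ms timestamp when speech ended
  probabilityFinished: number;     // assumed range 0..1
  tempo: number;                   // assumed range 0..1, where 1 is fastest
  maxDelay: number;                // upper bound on the delay, in ms
  now: number;                     // current ms timestamp (passed in for testability)
}

// Restrict a value to the 0..1 range.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

function sketchCalculateDelay(inputs: DelayInputs): number {
  const p = clamp01(inputs.probabilityFinished);
  const tempo = clamp01(inputs.tempo);
  // Time already spent waiting since the user stopped speaking.
  const elapsed = Math.max(0, inputs.now - inputs.timeUserStoppedSpeaking);
  // The less certain we are that the user finished, the longer we wait.
  const baseDelay = (1 - p) * inputs.maxDelay;
  // Faster speakers (higher tempo) get a shorter delay.
  const tempoAdjusted = baseDelay * (1 - tempo);
  // Subtract the time already elapsed; never return a negative delay.
  return Math.max(0, tempoAdjusted - elapsed);
}
```

Under these assumptions the result can never exceed maxDelay, which is the ceiling the proposal below wants to make adaptive.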

However, this method is still bounded by a maximum delay (maxDelay) and does not account for cases where the user resumes speaking after the prompt has been transmitted but before Pi responds. The proposed enhancement adds a floating parameter, or variable weight, to this function. The new parameter would adjust in real time based on user interaction, in particular on how often the user resumes speaking after transmission. The goal is to make 'Say, Pi' more responsive so that the conversation follows the user's own rhythm more closely.

With this enhancement, the submission delay would no longer depend only on the existing parameters but would also adapt to each user's observed behaviour.

Proposed Enhancement

The "Dynamic Submission Delay" feature will involve tracking two key durations and analysing how the user interacts after submission:

  1. Transcription Duration: Time taken for speech-to-text transcription, including API processing and network latency.
  2. Listening Delay: Time between receiving the transcription and submitting the prompt to Pi. This delay will be dynamically adjusted based on user interactions.
  3. User Interaction Analysis: The system will observe whether the user resumes speaking after the prompt has been transmitted but before Pi's response. If this occurs, it indicates that the current listening delay was too short, so the delay will be increased with a strong effect. Conversely, if the user does not resume speaking, the listening delay will be reduced, albeit with a weaker effect.
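
The asymmetric adjustment described in item 3 could be sketched as follows. This is a minimal illustration, not a proposed implementation: the multiplicative update rule, the ListeningDelayConfig shape, the function name adjustListeningDelay, and the example factor values are all assumptions.

```typescript
// Hypothetical sketch: names, update rule, and factor values are illustrative assumptions.
interface ListeningDelayConfig {
  minDelayMs: number;     // lower bound for the listening delay
  maxDelayMs: number;     // upper bound for the listening delay
  increaseFactor: number; // strong effect, e.g. 1.5 (applied when the user resumed speaking)
  decreaseFactor: number; // weak effect, e.g. 0.9 (applied when the user stayed silent)
}

function adjustListeningDelay(
  currentDelayMs: number,
  userResumedSpeaking: boolean,
  config: ListeningDelayConfig
): number {
  // Resuming speech after transmission means we submitted too early: lengthen strongly.
  // Silence means the delay may have been longer than needed: shorten gently.
  const factor = userResumedSpeaking ? config.increaseFactor : config.decreaseFactor;
  const next = currentDelayMs * factor;
  // Keep the result within the configured bounds.
  return Math.min(config.maxDelayMs, Math.max(config.minDelayMs, next));
}

// Example: one premature submission followed by one uninterrupted turn.
const exampleConfig: ListeningDelayConfig = {
  minDelayMs: 200,
  maxDelayMs: 7000,
  increaseFactor: 1.5,
  decreaseFactor: 0.9,
};
let exampleDelay = 1000;
exampleDelay = adjustListeningDelay(exampleDelay, true, exampleConfig);  // lengthened
exampleDelay = adjustListeningDelay(exampleDelay, false, exampleConfig); // shortened slightly
```

Making the increase stronger than the decrease means a single interruption outweighs several uninterrupted turns, which biases the system toward not cutting the user off.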

Goals

Implementation Considerations

Out of Scope

rosscado commented 3 days ago

Experimentally lowering max submission delay from 8s in v1.6.0 to 7s in v1.6.1.