o-stahl closed this pull request 5 months ago
Really excellent work on this!
I've done some testing and I think this is easily solid enough to go ahead and merge into the main branch.
I made one commit to tweak a few little things:
- ignore `no-speech` while in interact mode. Otherwise it shows as an error after a bit of silence with no speech.
- switched to the `tts-1-hd` model as it seems to work fine
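For reference, the `no-speech` case can be filtered in the recognition error handler before anything is shown to the user. The helper and variable names below are illustrative, not from this PR; only the Web Speech API `onerror` event and its `error` field are standard:

```javascript
// Decide whether a SpeechRecognition error should be surfaced to the user.
// In interact mode, 'no-speech' just means a stretch of silence, so it is
// swallowed instead of being shown as an error.
function shouldSurfaceRecognitionError(errorType, interactMode) {
  if (interactMode && errorType === 'no-speech') return false;
  return true;
}

// Wiring it into the Web Speech API (browser only):
// recognition.onerror = (event) => {
//   if (shouldSurfaceRecognitionError(event.error, interactModeActive)) {
//     showError(event.error);
//   }
// };
```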
I know the mobile support for interact mode has some wonkiness, at least on my phone; I'll be creating an issue for that problem. I also have a rough idea for a dynamic noise floor calculation, so our speech detection floor can vary with microphone sensitivity.
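One way to make the detection floor adapt to microphone sensitivity is to track a slow moving average of the ambient loudness and treat anything a fixed margin above it as speech. This is only a sketch of that idea under those assumptions, not code from the PR; the class name and parameter defaults are made up:

```javascript
// Adaptive noise floor: track ambient loudness with an exponential moving
// average of per-frame RMS, and flag frames that exceed it by a margin.
class NoiseFloor {
  constructor(alpha = 0.05, margin = 2.0) {
    this.alpha = alpha;    // EMA smoothing factor (0..1); smaller = slower
    this.margin = margin;  // multiple of the floor that counts as speech
    this.floor = null;     // current ambient RMS estimate
  }

  // samples: Float32Array of PCM samples in [-1, 1], e.g. the buffer
  // filled by an AnalyserNode's getFloatTimeDomainData().
  update(samples) {
    let sum = 0;
    for (const s of samples) sum += s * s;
    const rms = Math.sqrt(sum / samples.length);
    if (this.floor === null) this.floor = rms; // seed from the first frame
    const isSpeech = rms > this.floor * this.margin;
    // Only adapt the floor during (presumed) silence, so loud speech
    // doesn't drag the threshold up.
    if (!isSpeech) this.floor += this.alpha * (rms - this.floor);
    return isSpeech;
  }
}
```

Seeding the floor from the first frame means a sensitive microphone starts with a higher baseline automatically, which is the point of making the floor dynamic.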
> switched to the `tts-1-hd` model as it seems to work fine
OpenAI's regular `tts-1` model is faster and 2x cheaper, while according to user feedback the quality difference is (or at least was) barely noticeable even with audiophile gear. However, as you mentioned, model selection will take care of different preferences.
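For context, the model is just a string in the request body of OpenAI's `/v1/audio/speech` endpoint, so exposing it as a user setting is cheap. A hedged sketch of such a call; the `buildTTSRequest` helper, default voice, and error handling are illustrative, not taken from this PR:

```javascript
// Build the JSON body for OpenAI's text-to-speech endpoint.
// 'tts-1' is faster and cheaper; 'tts-1-hd' is higher quality.
function buildTTSRequest(text, model = 'tts-1-hd', voice = 'alloy') {
  return { model, input: text, voice };
}

// Usage (browser or Node 18+); returns the raw audio bytes.
async function fetchTTSAudio(text, apiKey, model) {
  const res = await fetch('https://api.openai.com/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildTTSRequest(text, model)),
  });
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  return await res.arrayBuffer(); // mp3 audio by default
}
```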
Summary
This pull request introduces a new InteractMode component and integrates text-to-speech (TTS) and speech-to-text (STT) functionality (the latter is not yet fully implemented in InteractMode). By default, the enhancement leverages the Web Speech API, with OpenAI's Whisper API available for improved speech transcription.
Key Changes
- InteractMode component
- fetchTTSResponse function
- fetchSTTResponse function
- Toggle for enhanced accuracy
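The enhanced-accuracy path sends recorded audio to OpenAI's `/v1/audio/transcriptions` endpoint with the `whisper-1` model. A minimal sketch of what such a call could look like; the function names, filename, and recording format are assumptions, not the PR's actual implementation:

```javascript
// Build the multipart form for OpenAI's Whisper transcription endpoint.
function buildSTTForm(audioBlob, model = 'whisper-1') {
  const form = new FormData();
  form.append('file', audioBlob, 'speech.webm');
  form.append('model', model);
  return form;
}

// Send a recorded audio blob for transcription and return the text.
// Works in browsers and Node 18+ (both provide fetch/FormData/Blob).
async function fetchSTTResponse(audioBlob, apiKey) {
  const res = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${apiKey}` },
    body: buildSTTForm(audioBlob), // fetch sets the multipart boundary
  });
  if (!res.ok) throw new Error(`STT request failed: ${res.status}`);
  const data = await res.json();
  return data.text; // the transcribed speech
}
```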
Benefits
Notes & future plans
This is the first revision and only implements user speech-to-message transcription, but it should be perfectly usable in its current state.
Auto Generated Notes (Do Not Change)