fingerthief / minimal-chat

MinimalChat is a lightweight, open-source chat application that allows you to interact with various large language models.
https://minimalchat.app
MIT License
143 stars 19 forks

Add InteractMode Component with TTS and STT Functions #134

Closed o-stahl closed 5 months ago

o-stahl commented 5 months ago

Summary


This pull request introduces a new InteractMode component and adds text-to-speech (TTS) and speech-to-text (STT) functions (STT is not yet fully implemented in InteractMode). By default, transcription uses the Web Speech API; OpenAI's Whisper API can be enabled for more accurate transcription.

Screenshot 2024-05-30 020714

Key Changes


  1. InteractMode Component:

    • Implemented the InteractMode component to handle speech interactions within the chat application.
    • Added functionality to monitor and visualize audio input in real-time.
  2. fetchTTSResponse Function:

    • Added fetchTTSResponse function to convert text to speech using the OpenAI API.
    • Ensures high-quality audio playback of transcribed text.
  3. fetchSTTResponse Function:

    • Added fetchSTTResponse function to transcribe audio to text using the OpenAI Whisper API.
    • Utilizes the Web Speech API for initial speech detection and transcription.
    • Switches to Whisper API for more accurate transcription when enabled.
  4. Toggle for Enhanced Accuracy:

    • Introduced a toggle to switch between Web Speech API and Whisper API for transcription.
    • Ensures only relevant speech is transcribed, reducing noise and improving accuracy.
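
As a rough illustration of items 2 to 4, here is a minimal sketch of how a TTS request and the transcription toggle could be put together. It is not the code from this pull request: `buildTtsRequest`, `selectSttEngine`, and the type names are hypothetical, and falling back to the Web Speech API when no API key is set is an assumption. The endpoint URL and the `model`, `input`, and `voice` fields follow OpenAI's public audio API.

```typescript
// Hypothetical helper names; a sketch, not the implementation in this PR.

type TtsModel = "tts-1" | "tts-1-hd";
type SttEngine = "web-speech" | "whisper";

interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a request for OpenAI's text-to-speech endpoint.
// The default model and voice are assumptions chosen for illustration.
function buildTtsRequest(
  text: string,
  apiKey: string,
  model: TtsModel = "tts-1",
  voice: string = "alloy"
): TtsRequest {
  return {
    url: "https://api.openai.com/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, input: text, voice }),
  };
}

// Decide which transcription engine to use from the toggle state.
// Assumption: without an API key the Whisper path is unavailable,
// so the browser's Web Speech API is used instead.
function selectSttEngine(useWhisper: boolean, hasApiKey: boolean): SttEngine {
  return useWhisper && hasApiKey ? "whisper" : "web-speech";
}
```

In a browser, the returned object could be passed to `fetch(req.url, req)` and the binary audio response played back. The Whisper path would instead post the recorded audio as multipart form data to `https://api.openai.com/v1/audio/transcriptions` with `model` set to `whisper-1`.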

Benefits


Notes & future plans

This is the first revision and only implements transcription of user speech into messages, but it should be perfectly usable in its current state.


fingerthief commented 5 months ago

Really excellent work on this!

I've done some testing and I think this is easily solid enough to go ahead and merge into the main branch.

I made one commit to tweak a few little things:

o-stahl commented 5 months ago
  • switched to tts-1-hd model as it seems to work fine

OpenAI's regular "tts-1" model is faster and half the price, and according to user feedback the quality difference is (or at least was) barely noticeable even with audiophile gear. However, as you mentioned as well, model selection will take care of different preferences.