braden-w / whispering

https://whispering.bradenwong.com/
MIT License

Feature Request: Epic: Personalized Voice-to-Text Conversion with Real-Time Adjustments #145

Open Wstnn opened 3 days ago

Wstnn commented 3 days ago

Epic: Enhanced Whispering Functionality with Custom GPT Integration

Description: Extend the existing Whispering app to integrate a custom GPT model for personalized transcriptions. This feature will allow users to configure writing styles, fine-tune the model dynamically, and manage custom GPT settings directly within the app.

User Stories:

User Story 1: Configure Writing Style Examples

As a user, I want to input my writing style examples in the settings, so that my transcriptions reflect my unique style.

Tasks:

  1. Add a text box in the settings page for writing style examples.
  2. Include default instructional text in the text box.
  3. Store the input data locally.
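
A minimal sketch of how these three tasks might fit together, assuming examples are separated by blank lines and persisted as a JSON array. The storage key, the default text, and the `KeyValueStore` interface are all hypothetical and not taken from the Whispering codebase; the interface is just the subset of the Web Storage API that the browser's `localStorage` already satisfies.

```typescript
// Hypothetical key and placeholder text (assumptions, not from the Whispering codebase).
const STYLE_EXAMPLES_KEY = "whispering.writingStyleExamples";
const DEFAULT_INSTRUCTIONS =
  "Paste a few samples of your own writing here, separated by blank lines.";

// Subset of the Web Storage API, so window.localStorage can be passed in directly.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Split the raw text box contents into individual examples on blank lines,
// dropping empty entries and the untouched default instructional text.
function parseExamples(raw: string): string[] {
  return raw
    .split(/\n\s*\n/)
    .map((example) => example.trim())
    .filter((example) => example.length > 0 && example !== DEFAULT_INSTRUCTIONS);
}

function saveExamples(store: KeyValueStore, raw: string): void {
  store.setItem(STYLE_EXAMPLES_KEY, JSON.stringify(parseExamples(raw)));
}

function loadExamples(store: KeyValueStore): string[] {
  const saved = store.getItem(STYLE_EXAMPLES_KEY);
  return saved === null ? [] : (JSON.parse(saved) as string[]);
}

// In-memory stand-in for localStorage, useful outside the browser.
function createMemoryStore(): KeyValueStore {
  const data: Record<string, string> = {};
  return {
    getItem: (key) =>
      Object.prototype.hasOwnProperty.call(data, key) ? (data[key] as string) : null,
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}
```

In the app itself, `saveExamples(window.localStorage, textBoxValue)` would be the likely call site; the in-memory store only exists so the logic can be exercised without a browser.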

User Story 2: Fine-Tune Model with Provided Data

As a user, I want the app to fine-tune a model using my provided examples, so that my transcriptions match my writing style.

Tasks:

  1. Implement functionality to send the provided examples to the API for fine-tuning.
  2. Store the fine-tuned model ID within the app.
  3. Allow users to add more examples and update the model accordingly.
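
As a rough sketch of the first task, the examples could be converted into the chat-format JSONL that OpenAI's fine-tuning endpoint expects (one JSON object with a `messages` array per line). Fine-tuning needs input/output pairs, so this assumes each example pairs a raw transcription with the user's preferred rewrite, which goes beyond the plain style samples described above. `StylePair`, `SYSTEM_PROMPT`, and the function names are hypothetical.

```typescript
// Hypothetical shape: one raw transcription plus the user's preferred rewrite of it.
interface StylePair {
  transcript: string;
  rewritten: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assumed instruction; it should match whatever is sent at inference time.
const SYSTEM_PROMPT = "Rewrite the transcription in the user's writing style.";

// One training record: system instruction, raw input, desired output.
function toTrainingRecord(pair: StylePair): { messages: ChatMessage[] } {
  return {
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: pair.transcript },
      { role: "assistant", content: pair.rewritten },
    ],
  };
}

// JSONL: one JSON object per line. JSON.stringify escapes newlines inside
// the content, so each record stays on a single line.
function toJsonl(pairs: StylePair[]): string {
  return pairs.map((pair) => JSON.stringify(toTrainingRecord(pair))).join("\n");
}
```

The resulting file would be uploaded with purpose `fine-tune`, a job created from it, and the job's `fine_tuned_model` field stored as the model ID once the job succeeds. OpenAI's fine-tuning guide has required at least 10 examples, so a handful of samples from a settings text box may not be enough on its own.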

User Story 3: Personalize Transcription with Fine-Tuned Model

As a user, I want the app to personalize transcriptions using the fine-tuned model, so that the responses match my configured writing style.

Tasks:

  1. Implement an API call to send transcriptions to the fine-tuned model.
  2. Display the personalized response in the app.
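
A sketch of what the request and response handling could look like, assuming the OpenAI Chat Completions API is the target. Only the payload is built here, with no network call; the function names and the system prompt are hypothetical, and the fallback to the raw transcription is a design assumption rather than anything decided in this issue.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Only the part of the Chat Completions response that this sketch reads.
interface ChatResponse {
  choices: { message: { content: string | null } }[];
}

// Build the JSON body for a chat completion using the stored fine-tuned model ID.
function buildPersonalizeRequest(modelId: string, transcription: string): ChatRequest {
  return {
    model: modelId,
    messages: [
      { role: "system", content: "Rewrite the transcription in the user's writing style." },
      { role: "user", content: transcription },
    ],
  };
}

// Return the model's text, falling back to the raw transcription if the
// response is empty so the user never loses their dictation.
function extractPersonalizedText(response: ChatResponse, fallback: string): string {
  const first = response.choices[0];
  const content = first ? first.message.content : null;
  return content !== null && content.trim().length > 0 ? content : fallback;
}
```

The body would be sent as a POST to `https://api.openai.com/v1/chat/completions` with an `Authorization: Bearer` header carrying the user's API key.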

User Story 4: Real-Time Adjustment of Responses

As a user, I want to provide hints or keywords through my speech to adjust the response in real-time, so that I can tailor the output as needed.

Tasks:

  1. Implement logic to detect hints or keywords in the spoken transcription.
  2. Implement an API call that adjusts the response based on the detected hints.
  3. Ensure the API call can pick up and apply suggested changes smoothly.
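
One simple way to approach the first task is a fixed table of spoken phrases that are stripped from the transcription and turned into instructions for the model. This is only a sketch under that assumption: the phrases, the instructions, and the function names are hypothetical, and a real implementation might need a dedicated trigger word to avoid false positives.

```typescript
// Hypothetical phrase table (assumption): spoken hint mapped to a model instruction.
const HINTS: { phrase: string; instruction: string }[] = [
  { phrase: "make it formal", instruction: "Use a formal tone." },
  { phrase: "keep it short", instruction: "Keep the result brief." },
  { phrase: "translate to german", instruction: "Translate the result into German." },
];

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove recognized hint phrases (case-insensitive, with one optional trailing
// punctuation mark) from the transcript and collect their instructions.
function extractHints(transcript: string): { text: string; instructions: string[] } {
  let text = transcript;
  const instructions: string[] = [];
  for (const hint of HINTS) {
    const pattern = new RegExp("\\b" + escapeRegExp(hint.phrase) + "\\b[.,!?]?", "gi");
    const stripped = text.replace(pattern, " ");
    if (stripped !== text) {
      instructions.push(hint.instruction);
      text = stripped;
    }
  }
  // Collapse the whitespace left behind by the removed phrases.
  return { text: text.replace(/\s+/g, " ").trim(), instructions };
}
```

The collected instructions could then be appended to the system prompt of the follow-up API call, while the cleaned text is sent as the user message.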

Existing Functionality:

With these additions, Whispering could provide personalized, contextually appropriate transcriptions using custom GPT models.

Wstnn commented 3 days ago

This would also programmatically fix https://github.com/braden-w/whispering/issues/111, as the output language could be prompted for. @michaelbeijer

braden-w commented 1 day ago

Hey @Wstnn, thanks for the issue!

Just to clarify, is this custom GPT model running on your own machine, or would this be through an API endpoint, such as the OpenAI GPT-4o model?

I just need a bit more detail on how you plan to feed the text into your model before I can build an integration.

Wstnn commented 1 day ago

Hey @braden-w

Thanks for your response. I was considering using custom GPTs created in the ChatGPT UI for Whispering to rephrase answers and control responses with specific prompts. However, it seems these GPTs are not exposed via API and can't be used directly. Can you confirm this?

If true, we could allow users to configure their writing style within Whispering's settings. They could update this configuration or input multiple cases to tailor responses for different use cases, like translation or rephrasing, by entering them into a designated field.
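
To make the idea concrete, here is a rough sketch of how such per-use-case configurations might be turned into a system prompt without any fine-tuning. The `StyleProfile` shape, the field names, and the prompt wording are all hypothetical.

```typescript
// Hypothetical shape of one user-configured case (assumption).
interface StyleProfile {
  name: string; // e.g. "translation" or "rephrasing"
  instruction: string; // what the model should do with the transcription
  samples: string[]; // optional writing samples to imitate
}

// Combine the instruction and any writing samples into one system prompt.
function buildSystemPrompt(profile: StyleProfile): string {
  const lines: string[] = [profile.instruction];
  if (profile.samples.length > 0) {
    lines.push("Match the style of these samples:");
    profile.samples.forEach((sample, index) => {
      lines.push(`Sample ${index + 1}: ${sample}`);
    });
  }
  return lines.join("\n");
}

// Look up a profile by name; returns undefined when none matches.
function findProfile(profiles: StyleProfile[], name: string): StyleProfile | undefined {
  return profiles.filter((profile) => profile.name === name)[0];
}
```

The resulting string would be sent as the system message, with the transcription as the user message, to a general-purpose model via the API.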

I've updated my initial feature request to reflect this refined approach. Does this seem doable to you?

Best regards

doxgt commented 7 hours ago

Things seem to be in constant flux at OpenAI. You could ask your question directly at: https://community.openai.com/.

In the meantime, you may wish to "glance" at: https://community.openai.com/t/how-to-make-an-api-call-to-a-custom-gpt-model/491835/67

What you could do, potentially, is give specific post-processing instructions to something like GPT-4o, through the API, to tweak the output you get from Whispering. Perhaps that is a feature @braden-w could implement down the road. Users would need to know that post-processing in this way costs extra tokens.

> ..., we could allow users to configure their writing style within Whispering's settings. They could update this configuration or input multiple cases to tailor responses for different use cases, like translation or rephrasing, by entering them into a designated field.

I don't think the Whisper engine itself, as hosted on OpenAI today, does anything fancy beyond "digesting" 200-some-odd tokens for spelling/vocabulary purposes (loosely speaking). Who knows, maybe the next iteration of the Whisper model will be more intelligent in and of itself.
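
For reference, the hosted transcription endpoint accepts an optional `prompt` field, and OpenAI's speech-to-text guide notes that only the final 224 tokens of it are considered. A rough sketch of building a vocabulary prompt within such a limit might look like the following. The word budget is a crude stand-in for real token counting, and `MAX_PROMPT_WORDS`, `buildVocabularyPrompt`, and `buildTranscriptionFields` are hypothetical names.

```typescript
// Text fields for the transcription request; the audio file itself would be
// sent alongside them as multipart form data.
interface TranscriptionFields {
  model: string;
  prompt?: string;
  language?: string; // ISO-639-1 code of the spoken language
}

// Assumption: a word budget used as a rough proxy for the token limit.
const MAX_PROMPT_WORDS = 150;

// Keep the most recently added terms that fit in the budget, walking from the
// end of the list because only the tail of the prompt is said to be considered.
function buildVocabularyPrompt(terms: string[], maxWords: number = MAX_PROMPT_WORDS): string {
  const kept: string[] = [];
  let words = 0;
  for (const raw of terms.slice().reverse()) {
    const term = raw.trim();
    if (term.length === 0) continue;
    const count = term.split(/\s+/).length;
    if (words + count > maxWords) break;
    kept.unshift(term); // restore the original order
    words += count;
  }
  return kept.join(", ");
}

function buildTranscriptionFields(terms: string[], language?: string): TranscriptionFields {
  const fields: TranscriptionFields = { model: "whisper-1" };
  const prompt = buildVocabularyPrompt(terms);
  if (prompt.length > 0) fields.prompt = prompt;
  if (language !== undefined) fields.language = language;
  return fields;
}
```

This only nudges spelling and vocabulary; any rewriting or translation would still need the separate post-processing step discussed above.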