Zhen-Bo opened this issue 1 year ago
Sorry for the delayed response. For the incorrect language identification issue, you should be able to fix that by setting the --language flag to the language spoken in the stream. The model only tries to identify the language if you leave the flag at the default ("auto"). The point of the repo is that you can run OpenAI's Whisper model locally, so I don't wanna replace it with wit.ai.
Regarding adding an additional API call for translation into non-English languages: I like the idea, maybe I will add that when I get some free time. Note that OpenAI's APIs are not free to use; only the web version of GPT-3.5 Turbo is free.
I have set the --language flag to specify the language, but there are still cases where it is not recognized correctly. As for using an additional API for translation, I suggest letting users supply their own API key (if they are using OpenAI's or DeepL's API).
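The "bring your own key" idea above could be handled with environment variables, so the repo never ships or hard-codes credentials. A minimal sketch, assuming the variable names `OPENAI_API_KEY` and `DEEPL_API_KEY` (the function name is made up for illustration):

```python
import os

def load_translation_keys():
    """Return (backend_name, api_key) for whichever translation backend
    the user has configured via environment variables."""
    keys = {
        "openai": os.environ.get("OPENAI_API_KEY"),
        "deepl": os.environ.get("DEEPL_API_KEY"),
    }
    # Pick the first backend the user actually configured.
    backend = next((name for name, key in keys.items() if key), None)
    if backend is None:
        raise RuntimeError("set OPENAI_API_KEY or DEEPL_API_KEY to enable translation")
    return backend, keys[backend]
```

This keeps the choice of translation provider entirely on the user's side, which matches the request in the comment above.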
Feature Request
Description of the feature you'd like:
Want to use the user's own wit.ai and deepl API key for real-time speech-to-text translation.
Feature Background:
After using it for a while, I found that there is often a translation delay (interval=3~5) when using the medium model, and it also frequently produces blank output.
I don't know whether it is the delay in speech recognition or incorrect language identification that causes the translation failures.
Also, English is not my native language, so after receiving the English output I need extra time to convert it into my own language. I hope more target languages can be supported.
Proposed Solution
- speech-to-text: use wit.ai to convert audio files into text (see the wit.ai docs)
- translate: use DeepL or ChatGPT to translate into the user's target language
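The two-step pipeline proposed above could be sketched roughly like this, using only the standard library. This assumes the public wit.ai `/speech` HTTP endpoint (raw audio with a Bearer token) and DeepL's free-tier `/v2/translate` endpoint (form-encoded fields); the function names are illustrative, not from the repo, and only the HTTP requests are built here:

```python
import json
import urllib.parse
import urllib.request

def build_wit_request(wav_bytes: bytes, wit_token: str) -> urllib.request.Request:
    # wit.ai's /speech endpoint accepts raw audio with a Bearer token.
    return urllib.request.Request(
        "https://api.wit.ai/speech",
        data=wav_bytes,
        headers={
            "Authorization": f"Bearer {wit_token}",
            "Content-Type": "audio/wav",
        },
    )

def build_deepl_request(text: str, target_lang: str, deepl_key: str) -> urllib.request.Request:
    # DeepL's free-tier /v2/translate endpoint takes form-encoded fields.
    body = urllib.parse.urlencode({
        "auth_key": deepl_key,
        "text": text,
        "target_lang": target_lang,  # e.g. "ZH" for Chinese
    }).encode()
    return urllib.request.Request(
        "https://api-free.deepl.com/v2/translate", data=body)

def transcribe_and_translate(wav_bytes, wit_token, deepl_key, target_lang):
    # Step 1: speech-to-text via wit.ai.
    with urllib.request.urlopen(build_wit_request(wav_bytes, wit_token)) as r:
        text = json.load(r).get("text", "")
    # Step 2: translate the recognized text via DeepL.
    with urllib.request.urlopen(
            build_deepl_request(text, target_lang, deepl_key)) as r:
        return json.load(r)["translations"][0]["text"]
```

In a real-time setting the audio would be chunked before each `transcribe_and_translate` call, which is where the interval/delay trade-off mentioned earlier comes in.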