Zhen-Bo opened this issue 1 year ago
Sorry for the delayed response. For the incorrect language identification issue, you should be able to fix that by setting the --language flag to the language spoken in the stream. The model only tries to identify the language if you leave the flag at the default ("auto"). The point of the repo is that you can run OpenAI's Whisper model locally, so I don't wanna replace it with wit.ai.
Regarding adding an additional API call for translation into non-English languages: I like the idea, maybe I will add that when I get some free time. Note that OpenAI's APIs are not free to use; only the web version of GPT-3.5 Turbo is free.
I have set the --language flag to specify the language, but there are still cases where it is not recognized correctly. As for using an additional API for translation, I suggest letting users supply their own API key (if they are using OpenAI's or DeepL's API).
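The "bring your own key" idea above could be handled with environment variables, so the repo never ships or hard-codes credentials. A minimal sketch, assuming the variable names `OPENAI_API_KEY` and `DEEPL_API_KEY` (the function name is made up for illustration):

```python
import os

def load_translation_keys():
    """Return (backend_name, api_key) for whichever translation backend
    the user has configured via environment variables."""
    keys = {
        "openai": os.environ.get("OPENAI_API_KEY"),
        "deepl": os.environ.get("DEEPL_API_KEY"),
    }
    # Pick the first backend the user actually configured.
    backend = next((name for name, key in keys.items() if key), None)
    if backend is None:
        raise RuntimeError("set OPENAI_API_KEY or DEEPL_API_KEY to enable translation")
    return backend, keys[backend]
```

This keeps the choice of translation provider entirely on the user's side, which matches the request in the comment above.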
Feature Request
Description of the feature you'd like:
Want to use the user's own wit.ai and deepl API key for real-time speech-to-text translation.
Feature Background:
After using it for a while, I found that there is often a translation delay (interval=3~5) when using the medium model, and it also frequently produces blank output.
I don't know whether it is the delay in speech recognition or incorrect language identification that causes the translation failures.
Also, English is not my native language, so after receiving the English output I need extra time to convert it into my own language. I hope more target languages can be supported.
Proposed Solution
- speech-to-text: use wit.ai to convert audio files into text (see the wit.ai docs)
- translate: use DeepL or ChatGPT to translate into the user's target language
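The two-step pipeline proposed above could be sketched roughly like this, using only the standard library. This assumes the public wit.ai `/speech` HTTP endpoint (raw audio with a Bearer token) and DeepL's free-tier `/v2/translate` endpoint (form-encoded fields); the function names are illustrative, not from the repo, and only the HTTP requests are built here:

```python
import json
import urllib.parse
import urllib.request

def build_wit_request(wav_bytes: bytes, wit_token: str) -> urllib.request.Request:
    # wit.ai's /speech endpoint accepts raw audio with a Bearer token.
    return urllib.request.Request(
        "https://api.wit.ai/speech",
        data=wav_bytes,
        headers={
            "Authorization": f"Bearer {wit_token}",
            "Content-Type": "audio/wav",
        },
    )

def build_deepl_request(text: str, target_lang: str, deepl_key: str) -> urllib.request.Request:
    # DeepL's free-tier /v2/translate endpoint takes form-encoded fields.
    body = urllib.parse.urlencode({
        "auth_key": deepl_key,
        "text": text,
        "target_lang": target_lang,  # e.g. "ZH" for Chinese
    }).encode()
    return urllib.request.Request(
        "https://api-free.deepl.com/v2/translate", data=body)

def transcribe_and_translate(wav_bytes, wit_token, deepl_key, target_lang):
    # Step 1: speech-to-text via wit.ai.
    with urllib.request.urlopen(build_wit_request(wav_bytes, wit_token)) as r:
        text = json.load(r).get("text", "")
    # Step 2: translate the recognized text via DeepL.
    with urllib.request.urlopen(
            build_deepl_request(text, target_lang, deepl_key)) as r:
        return json.load(r)["translations"][0]["text"]
```

In a real-time setting the audio would be chunked before each `transcribe_and_translate` call, which is where the interval/delay trade-off mentioned earlier comes in.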