fortypercnt / stream-translator

MIT License

Feature request: use wit.ai speech-to-text and DeepL/OpenAI to translate it #11

Open Zhen-Bo opened 10 months ago

Zhen-Bo commented 10 months ago

Feature Request

Description of the feature you'd like:

I would like the tool to use the user's own wit.ai and DeepL API keys for real-time speech-to-text translation.

Feature Background:

After using it for a while, I found that translations are often delayed (interval=3~5) when using the medium model, and the output is frequently blank.

I don't know whether the failed translations are caused by slow speech recognition or by incorrect language identification.

Also, English is not my native language. After receiving the English output, I need some time to convert it into my native language, so I would like more target languages to be supported.

Proposed Solution

fortypercnt commented 8 months ago

Sorry for the delayed response. For the incorrect language identification issue, you should be able to fix that by setting the --language flag to the language spoken in the stream. The model only tries to identify the language if you leave the flag at the default ("auto"). The point of the repo was that you can use OpenAI's whisper model locally, so I don't wanna replace it with wit.ai.
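To illustrate the "auto" behavior described above, here is a minimal sketch and not the repo's actual code. The helper name `resolve_language` is hypothetical. It assumes the flag value is eventually handed to Whisper's `language` argument, where `None` means "detect the language from the audio".

```python
from typing import Optional


def resolve_language(flag: str = "auto") -> Optional[str]:
    """Map a --language style value to a Whisper `language` argument.

    Hypothetical helper: "auto" becomes None, so the model detects the
    language itself; any other value is passed through as a fixed language.
    """
    value = flag.strip().lower()
    return None if value == "auto" else value
```

With the openai-whisper package, the result could be passed along as `model.transcribe(audio, language=resolve_language("japanese"))`, which should skip the detection step entirely.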

Regarding adding an additional API call for translation into non-English languages: I like the idea, and maybe I will add it when I get some free time. Note that OpenAI's APIs are not free to use; only the web version of GPT-3.5 Turbo is free.

Zhen-Bo commented 8 months ago

I have used the --language setting to specify the language, but there are still cases where the speech is not recognized correctly. As for using an additional API for translation, I suggest letting users fill in their own API key (if they are using OpenAI's or DeepL's API).
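A rough sketch of what "bring your own key" could look like for DeepL, using only the Python standard library. The function name `build_deepl_request` and the `DEEPL_API_KEY` environment variable are assumptions for illustration and not part of stream-translator. The endpoint shown is DeepL's free-tier one, so paid plans would need a different host.

```python
import os
import urllib.parse
import urllib.request

# Free-tier endpoint; paid plans use api.deepl.com instead.
DEEPL_URL = "https://api-free.deepl.com/v2/translate"


def build_deepl_request(text, target_lang, api_key=None):
    """Build (but do not send) a DeepL translate request with the user's key."""
    # DEEPL_API_KEY is an assumed variable name, not an existing setting.
    key = api_key or os.environ.get("DEEPL_API_KEY")
    if not key:
        raise RuntimeError("No DeepL API key given; set DEEPL_API_KEY or pass api_key")
    body = urllib.parse.urlencode({"text": text, "target_lang": target_lang})
    return urllib.request.Request(
        DEEPL_URL,
        data=body.encode("utf-8"),
        headers={"Authorization": f"DeepL-Auth-Key {key}"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` should return JSON whose `translations[0]["text"]` field holds the translated string. Keeping the key in an environment variable or a command-line option means it never has to be stored in the repo.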