collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.
MIT License
2.05k stars 281 forks source link

The concept of transcribed bilingualism. #184

Open Ye83 opened 8 months ago

Ye83 commented 8 months ago

When recording in real time, for example, I specified the output voice to be English, and the audio played was a mixture of Japanese and English, sometimes translating Japanese into English and sometimes not. Can this be set to fixed not translate or translate?

makaveli10 commented 8 months ago

Hello @Ye83 Can you share the commands used to run whisper server and client?

Ye83 commented 8 months ago

@makaveli10 Thank you for your answer, This is the server code: image

This is the client code: The client connects directly with the webSocket and sends this json string when the connection is successful: { "uid": "7b6d5a7c-c878-42fa-8493-43025b1e34ee", "language": "en", "task": "transcribe", "model": "small", "use_vad": true } And Can I add your personal contact information? I have a lot of questions to ask you.

makaveli10 commented 8 months ago

Thanks for sharing the details. One thing you could try is set the language as japanese and the task as translate.

Ye83 commented 8 months ago

1710731383133 This is the result of watching youtube bilingual videos using the Google plugin. The task is translate and the language is en, but the output is still in Chinese. You can try the Google plugin and watch this video for a few minutes, and the same should happen. youtube video link:https://www.youtube.com/watch?v=J9M-Xgt5qzw Thank you for your reply. I need your help

makaveli10 commented 7 months ago

@Ye83 after testing this with large-v3 it seems like an issue when there are two languages, it works well in translating chinese to english when there is only chinese present in the audio.

And Can I add your personal contact information? I have a lot of questions to ask you.

The contact details are in the readme. Happy to answer all questions. Thanks!

Ye83 commented 7 months ago

Thank you for conducting the tests and providing the answers. Is there a possibility to optimize this issue? It would be greatly appreciated. @makaveli10

jsichi commented 7 months ago

I added a start on an implementation in #200 for a semi-related issue, where I wanted the transcription to be able to preserve multiple input languages, and also wanted to restrict the set of languages to listen for (since sometimes I was seeing Chinese where it should have been Russian, for example).