SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Abnormal results when transcribing ambient sound #127

Closed ILG2021 closed 1 year ago

ILG2021 commented 1 year ago

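When I give some ambient sound as input to faster-whisper with the language set to Chinese, it gives abnormal results like the ones below: 请不吝点赞 订阅 转发 打赏支持明镜与点点栏目 谢谢观看 下集再见 谢谢观看 欢迎订阅我的频道 字幕由Amara.org社区提供 (roughly: "Please like, subscribe, share, and tip to support the Mingjing and Diandian programs. Thanks for watching, see you next episode. Thanks for watching, welcome to subscribe to my channel. Subtitles provided by the Amara.org community.")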

ILG2021 commented 1 year ago

I have seen someone discuss this on the OpenAI Whisper project: https://github.com/openai/whisper/discussions/928. It is a problem with the original model, but I don't know how to deal with it.

guillaumekln commented 1 year ago

With faster-whisper you can try enabling the VAD filter with vad_filter=True.
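For readers who want to try this, here is a minimal sketch of a transcription call with the VAD filter enabled. The helper name `transcribe_with_vad`, the `min_silence_ms` default of 500, and the file name in the usage note are illustrative assumptions, not something prescribed in this thread. `vad_filter` and `vad_parameters` are arguments of `WhisperModel.transcribe` in faster-whisper.

```python
def transcribe_with_vad(model, audio_path, language="zh", min_silence_ms=500):
    """Transcribe audio_path with the VAD filter enabled.

    `model` is expected to be a faster_whisper.WhisperModel (or any object
    with a compatible `transcribe` method). `min_silence_ms` is an assumed
    starting value and will likely need tuning for your audio.
    """
    segments, _info = model.transcribe(
        audio_path,
        language=language,
        vad_filter=True,  # skip regions the VAD classifies as non-speech
        vad_parameters=dict(min_silence_duration_ms=min_silence_ms),
    )
    # `segments` is a generator; transcription actually runs while iterating.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]
```

A possible way to call it, assuming faster-whisper is installed and `ambient.wav` is a hypothetical local file: create the model with `WhisperModel("small", device="auto", compute_type="default")` and pass it as `transcribe_with_vad(model, "ambient.wav")`.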

ILG2021 commented 1 year ago

With faster-whisper you can try enabling the VAD filter with vad_filter=True.

It works, thank you very much. I am currently using open source models to build a speech-to-speech translator. Because I only have a 1070 Ti, I have to use CTranslate2 models. I use faster-whisper (really amazing) for ASR, nllb-200-3.3b-ct2 as the text translator, and gTTS for TTS. I found nllb-200 is not very precise, so I switched to the DeepL API. For TTS, I have tried Coqui TTS, but their models are scattered and not easy to use. Can anyone give me some suggestions for this stack? Thank you very much.

ILG2021 commented 1 year ago

Even with the VAD filter added, it can still appear sometimes. Hello, can this be removed by Whisper itself? I use Whisper in a speech translator, and this output is really unpleasant for users.

guillaumekln commented 1 year ago

This is an issue with the Whisper model, as discussed in https://github.com/openai/whisper/discussions/928. You can continue the discussion there.
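Since the model itself is the source of these phrases, one possible workaround is to post-filter the returned segments. The sketch below is an assumption on my part and not a faster-whisper feature: the function names, the phrase list (taken from the output reported above), and the thresholds are all illustrative. It relies only on the `text`, `no_speech_prob`, and `avg_logprob` attributes that faster-whisper segments expose.

```python
# Phrases copied from the abnormal output reported in this issue.
# Hypothetical blocklist: extend or trim it for your own data.
KNOWN_HALLUCINATIONS = (
    "请不吝点赞",
    "谢谢观看",
    "欢迎订阅我的频道",
    "字幕由Amara.org社区提供",
)


def is_probable_hallucination(segment, no_speech_threshold=0.6,
                              logprob_threshold=-1.0):
    """Heuristic check; the thresholds are assumed values that need tuning."""
    text = segment.text.strip()
    # Rule 1: the segment contains a phrase known to appear on non-speech audio.
    if any(phrase in text for phrase in KNOWN_HALLUCINATIONS):
        return True
    # Rule 2: the model thinks it is probably silence AND is unsure of the text.
    return (segment.no_speech_prob > no_speech_threshold
            and segment.avg_logprob < logprob_threshold)


def drop_hallucinations(segments, **thresholds):
    """Return only the segments that pass the heuristic above."""
    return [s for s in segments
            if not is_probable_hallucination(s, **thresholds)]
```

Note the trade-off: the phrase blocklist will also drop genuine speech that happens to contain one of these phrases, so it fits best when the input is not expected to include video sign-offs.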