Abnormal result when transcribing ambient sound

SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

MIT License


Abnormal result when transcribing ambient sound #127

Closed ILG2021 closed 1 year ago

ILG2021 commented 1 year ago

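When I give ambient sound as input to faster-whisper with the language set to Chinese, it returns abnormal results like the following: 请不吝点赞订阅转发打赏支持明镜与点点栏目谢谢观看下集再见谢谢观看欢迎订阅我的频道字幕由Amara.org社区提供 (English: "Please like, subscribe, share, and donate to support the Mingjing and Diandian programs. Thanks for watching, see you in the next episode. Thanks for watching. Welcome to subscribe to my channel. Subtitles provided by the Amara.org community.")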

ILG2021 commented 1 year ago

I have seen this discussed in the OpenAI Whisper project: https://github.com/openai/whisper/discussions/928. It is a problem with the original model, but I don't know how to deal with it.

guillaumekln commented 1 year ago

With faster-whisper you can try enabling the VAD filter with vad_filter=True.
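For readers who want to try this, here is a minimal sketch of what enabling the filter might look like. `WhisperModel` and the `vad_filter` argument of `transcribe` come from faster-whisper; the helper names `load_model` and `transcribe_with_vad`, the model size, and the default language are assumptions made for illustration.

```python
from typing import Any


def load_model(size: str = "small", device: str = "auto", compute_type: str = "default") -> Any:
    """Load a faster-whisper model. Size, device, and compute type are assumptions."""
    # Imported lazily so the helper below can be used without loading a model.
    from faster_whisper import WhisperModel

    return WhisperModel(size, device=device, compute_type=compute_type)


def transcribe_with_vad(model: Any, audio_path: str, language: str = "zh") -> str:
    """Transcribe audio_path with the VAD filter enabled and return the joined text."""
    # vad_filter=True asks faster-whisper to remove non-speech regions before decoding.
    segments, _info = model.transcribe(audio_path, language=language, vad_filter=True)
    # segments is a lazy generator; iterating over it runs the transcription.
    return "".join(segment.text for segment in segments).strip()
```

A call such as `transcribe_with_vad(load_model(), "ambient.wav")` would then transcribe the file, where `ambient.wav` is a placeholder path.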

ILG2021 commented 1 year ago

guillaumekln wrote: "With faster-whisper you can try enabling the VAD filter with vad_filter=True."

It works, thank you very much. I am currently using open-source models to build a speech-to-speech translator. Because I only have a 1070 Ti, I have to use CTranslate2 models. I use faster-whisper (really amazing) for ASR, nllb-200-3.3b-ct2 as the text translator, and gTTS for TTS. I found that nllb-200 is not very precise, so I switched to the DeepL API. For TTS, I have tried Coqui TTS, but their models are scattered and not easy to use. Can anyone give me suggestions for this stack? Thank you very much.
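The three-stage stack described here (ASR, then text translation, then TTS) could be wired together roughly as in the sketch below. The `SpeechTranslator` class and its field names are hypothetical, and each stage is passed in as a plain callable, so nothing here depends on a particular library's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Hypothetical three-stage pipeline: ASR -> text translation -> TTS."""

    transcribe: Callable[[str], str]         # audio path -> source-language text
    translate: Callable[[str], str]          # source text -> target-language text
    synthesize: Callable[[str, str], None]   # (text, output path) -> writes audio

    def run(self, audio_in: str, audio_out: str) -> str:
        text = self.transcribe(audio_in)
        if not text.strip():
            # Nothing was recognized, so skip translation and synthesis.
            return ""
        translated = self.translate(text)
        self.synthesize(translated, audio_out)
        return translated
```

With this shape, replacing nllb-200 with the DeepL API, or gTTS with another TTS engine, only means passing a different callable for `translate` or `synthesize`.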

ILG2021 commented 1 year ago

Even with the VAD filter added, the abnormal output still appears sometimes. Can it be removed by Whisper itself? I use Whisper in a speech translation application, and this output is really unpleasant for users.

guillaumekln commented 1 year ago

This is an issue with the Whisper model, as discussed in https://github.com/openai/whisper/discussions/928. You can continue the discussion there.
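Until the model itself improves, one possible workaround, not proposed in this thread, is to post-filter the returned segments. The sketch below assumes each segment exposes `text` and `no_speech_prob`, as faster-whisper segments do. The function name `drop_hallucinated`, the 0.6 threshold, and the phrase list (taken from the output reported above) are illustrative assumptions.

```python
from typing import Any, Iterable, Iterator, Sequence

# Phrases copied from the abnormal output reported in this issue; extend as needed.
KNOWN_HALLUCINATIONS = (
    "请不吝点赞",
    "谢谢观看",
    "欢迎订阅我的频道",
    "字幕由Amara.org社区提供",
)


def drop_hallucinated(
    segments: Iterable[Any],
    max_no_speech_prob: float = 0.6,  # assumed threshold, tune on your own audio
    phrases: Sequence[str] = KNOWN_HALLUCINATIONS,
) -> Iterator[Any]:
    """Yield only segments that look like real speech."""
    for segment in segments:
        # Skip segments the model itself considers likely to be non-speech.
        if segment.no_speech_prob > max_no_speech_prob:
            continue
        # Skip segments containing a known hallucinated phrase.
        if any(phrase in segment.text for phrase in phrases):
            continue
        yield segment
```

The phrase filter is a blunt instrument: it would also drop a speaker who genuinely says "谢谢观看", so the list should stay short and specific to the application.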