Closed ILG2021 closed 1 year ago
I have seen someone discuss this in the OpenAI Whisper project: https://github.com/openai/whisper/discussions/928. This is a problem with the original model, but I don't know how to deal with it.
With faster-whisper you can try enabling the VAD filter with vad_filter=True.
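For anyone landing here, a minimal sketch of what that suggestion might look like in practice. The helper name `transcribe_with_vad`, the `min_silence_ms` default, the model size and the file name are all placeholders of mine, not something from this thread; only `vad_filter=True` is the setting being suggested.

```python
def transcribe_with_vad(model, audio_path, language="zh", min_silence_ms=500):
    """Transcribe audio_path with the VAD filter on.

    `model` is expected to be a faster_whisper.WhisperModel instance.
    Returns a list of (start, end, text) tuples.
    """
    segments, _info = model.transcribe(
        audio_path,
        language=language,
        vad_filter=True,  # drop non-speech regions before decoding
        # Assumed tuning value; adjust or omit to use the library defaults.
        vad_parameters={"min_silence_duration_ms": min_silence_ms},
    )
    # `segments` is a generator; transcription actually runs while iterating.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]


# Hypothetical usage (model size, device and file name are assumptions):
#
#   from faster_whisper import WhisperModel
#   model = WhisperModel("large-v2", device="cuda", compute_type="int8")
#   for start, end, text in transcribe_with_vad(model, "ambient.wav"):
#       print(f"[{start:.2f} -> {end:.2f}] {text}")
```

The VAD filter only reduces how much silence or noise reaches the model, so it may lower the rate of these outputs without removing them entirely.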
It works, thank you very much. I am currently using open-source models to build a speech-to-speech translator. Because I only have a 1070 Ti, I have to use CTranslate2 models. I use faster-whisper (really amazing) for ASR, nllb-200-3.3b-ct2 as the text translator, and gTTS for TTS. I found nllb-200 is not very precise, so I switched to the DeepL API. For TTS, I also tried Coqui TTS, but their models are scattered and not easy to use. Can anyone give me some suggestions for this stack? Thank you very much.
Even with the VAD filter enabled, it can still appear sometimes. Hello, can this be removed by Whisper itself? I use Whisper in a speech translation app, and it makes for a really unpleasant user experience.
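One possible workaround, not a fix in the model itself, is to post-filter the segments before passing them to the translator. The sketch below is only an illustration under my own assumptions: the phrase list is taken from the example output reported in this issue, the 0.6 threshold is arbitrary, and the helper names are hypothetical. It relies on each segment having `text` and `no_speech_prob` attributes, as faster-whisper segments do.

```python
# Phrases copied from the abnormal output reported in this issue (assumed list).
KNOWN_HALLUCINATIONS = (
    "请不吝点赞",
    "谢谢观看",
    "下集再见",
    "欢迎订阅我的频道",
    "字幕由Amara.org社区提供",
)


def is_probable_hallucination(text, no_speech_prob,
                              phrases=KNOWN_HALLUCINATIONS,
                              no_speech_threshold=0.6):
    """Heuristic check: empty text, high no-speech probability, or a known phrase."""
    stripped = text.strip()
    if not stripped:
        return True
    if no_speech_prob > no_speech_threshold:
        return True
    return any(phrase in stripped for phrase in phrases)


def filter_segments(segments, **kwargs):
    """Keep only the segments that do not look like hallucinated output."""
    return [
        seg for seg in segments
        if not is_probable_hallucination(seg.text, seg.no_speech_prob, **kwargs)
    ]
```

Note that the substring match will also drop real speech that happens to contain one of these phrases, so the list should stay short and specific to what you actually observe.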
This is an issue with the Whisper model, as discussed in https://github.com/openai/whisper/discussions/928. You can continue the discussion there.
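When I give some ambient sound as input to faster-whisper with the language set to Chinese, it gives abnormal results like the following: 请不吝点赞 订阅 转发 打赏支持明镜与点点栏目 谢谢观看 下集再见 谢谢观看 欢迎订阅我的频道 字幕由Amara.org社区提供 (roughly: "Please like, subscribe, share, and donate to support the Mingjing and Diandian programs", "Thanks for watching, see you next episode", "Thanks for watching", "Welcome to subscribe to my channel", "Subtitles provided by the Amara.org community")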