fedirz / faster-whisper-server

https://hub.docker.com/r/fedirz/faster-whisper-server
MIT License
673 stars, 96 forks

Hallucination for fully silent audio input #133

Open yijiegao2006 opened 2 days ago

yijiegao2006 commented 2 days ago

If the audio input is completely silent, the server still returns text such as "MBC 뉴스 이재경입니다." (Korean for "This is Lee Jae-kyung, MBC News.").

It seems VAD is already applied in SYSTRAN/faster-whisper, since a sample with a long silence followed by speech at the end is transcribed correctly.

thiswillbeyourgithub commented 1 day ago

This is inherent to all Whisper-based models.

It is also related to #108: cleaning up the audio yields a very short result when the input is silence, so it would be ignored for falling under the duration threshold.
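
To illustrate the duration-threshold idea, here is a rough sketch of a pre-check that skips transcription when almost no audio survives silence removal. It is not code from this repo or from #108: the function names, the simple RMS energy gate, and all thresholds (`rms_threshold`, `frame_ms`, `min_voiced_s`) are assumptions for illustration, and a real setup would more likely rely on a proper VAD.

```python
import math
from typing import Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def voiced_duration_s(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,            # hypothetical frame size
    rms_threshold: float = 0.01,   # hypothetical energy gate
) -> float:
    """Seconds of audio whose frame energy reaches the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    voiced_samples = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) >= rms_threshold:
            voiced_samples += len(frame)
    return voiced_samples / sample_rate


def should_transcribe(
    samples: Sequence[float],
    sample_rate: int = 16000,
    min_voiced_s: float = 0.3,     # hypothetical duration threshold
) -> bool:
    """Return False when too little non-silent audio remains to transcribe."""
    return voiced_duration_s(samples, sample_rate) >= min_voiced_s


if __name__ == "__main__":
    silence = [0.0] * 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    print(should_transcribe(silence))         # fully silent input is skipped
    print(should_transcribe(silence + tone))  # silence followed by sound is kept
```

With a gate like this, a fully silent request could return an empty transcript instead of being passed to the model, while a clip with long silence and speech at the end would still be transcribed.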