Hallucination for fully silent audio input

fedirz / faster-whisper-server

MIT License


Open yijiegao2006 opened 2 days ago

yijiegao2006 commented 2 days ago

If the audio input is fully silent, the server outputs/returns text such as "MBC 뉴스 이재경입니다." ("This is Lee Jae-kyung, MBC News.").

It seems VAD is already applied in SYSTRAN/faster-whisper, since for a sample with a long silence followed by speech at the end, it returns the correct text.
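To illustrate why a VAD step helps here: if no speech region is detected, nothing needs to be sent to the model, so it has no chance to hallucinate. The sketch below is a minimal, hypothetical example. It uses a plain per-frame RMS energy threshold, not the Silero-based VAD that faster-whisper's `vad_filter` option uses. The function names, the 30 ms frame size, and the -40 dBFS threshold are assumptions chosen for illustration, and a sine tone stands in for speech.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms_dbfs(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS level in dBFS of each non-overlapping frame.

    Samples are assumed to be floats in [-1.0, 1.0]; a trailing partial
    frame is ignored.
    """
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Clamp so digital silence maps to a finite (very low) level.
        levels.append(20 * math.log10(max(rms, 1e-10)))
    return levels


def speech_regions(samples: Sequence[float], sample_rate: int,
                   frame_ms: int = 30,
                   threshold_dbfs: float = -40.0) -> List[Tuple[float, float]]:
    """(start_s, end_s) spans whose frames are louder than threshold_dbfs."""
    frame_len = int(sample_rate * frame_ms / 1000)
    levels = frame_rms_dbfs(samples, frame_len)
    regions: List[Tuple[float, float]] = []
    start_s = None
    for i, level in enumerate(levels):
        t = i * frame_len / sample_rate
        if level > threshold_dbfs and start_s is None:
            start_s = t  # a loud frame opens a region
        elif level <= threshold_dbfs and start_s is not None:
            regions.append((start_s, t))  # a quiet frame closes it
            start_s = None
    if start_s is not None:
        # Audio ended while still loud: close the region after the last frame.
        regions.append((start_s, len(levels) * frame_len / sample_rate))
    return regions


sr = 16000
silence = [0.0] * (2 * sr)  # 2 s of digital silence
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s stand-in for speech
print(speech_regions(silence, sr))         # → []
print(speech_regions(silence + tone, sr))  # → [(1.98, 3.0)]
```

A fully silent clip yields no regions, so a caller could skip the model call entirely, while silence followed by "speech" yields one region at the end. The region starts at 1.98 s rather than 2.0 s because the 30 ms frame straddling the boundary already contains part of the tone.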

thiswillbeyourgithub commented 1 day ago

This is inherent to all Whisper-based models.

This is also related to #108: cleaning up the audio would return a very short result if the input is silence, so it would be ignored as falling under the duration threshold.
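The interaction described above, where cleanup shrinks a silent clip until it falls below a minimum-duration threshold, can be sketched as follows. This is a hypothetical illustration, not the implementation from #108: `trim_silence`, `should_transcribe`, the amplitude floor, and the 0.5 s threshold are all assumed names and values, and sample-level trimming is far cruder than a real cleanup step.

```python
import math
from typing import Sequence

# Hypothetical values; the actual cleanup step and threshold in #108 may differ.
AMPLITUDE_FLOOR = 0.01  # samples with |x| below this count as silence
MIN_DURATION_S = 0.5    # clips shorter than this after trimming are skipped


def trim_silence(samples: Sequence[float],
                 floor: float = AMPLITUDE_FLOOR) -> Sequence[float]:
    """Strip leading and trailing samples whose magnitude is below `floor`."""
    start = 0
    while start < len(samples) and abs(samples[start]) < floor:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < floor:
        end -= 1
    return samples[start:end]


def should_transcribe(samples: Sequence[float], sample_rate: int,
                      min_duration_s: float = MIN_DURATION_S) -> bool:
    """False when the cleaned-up clip is shorter than the duration threshold."""
    trimmed = trim_silence(samples)
    return len(trimmed) / sample_rate >= min_duration_s


sr = 16000
silence = [0.0] * (3 * sr)  # 3 s of digital silence
tone = [0.3 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]  # 1 s stand-in for speech
print(should_transcribe(silence, sr))         # → False (trims to 0 samples)
print(should_transcribe(silence + tone, sr))  # → True
```

Under these assumptions, a fully silent input never reaches the model, which is why fixing the cleanup path would also avoid the hallucination in this issue.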