Open yijiegao2006 opened 2 days ago
If the audio input is completely silent, the model still outputs/returns text such as "MBC 뉴스 이재경입니다." ("This is Lee Jae-kyung, MBC News.").
It seems VAD (voice activity detection) is applied in SYSTRAN/faster-whisper, since for a sample with a long silence followed by speech at the end, it returns the correct text.
This is inherent to all Whisper-based models.
Also related to #108: cleaning up the audio will return a very short result if the input is silence, so it would be ignored as being under the duration threshold.
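One way to combine both ideas is to gate transcription on how much audio is actually voiced, so fully silent input never reaches the model. The sketch below is a hypothetical illustration, not this project's code or faster-whisper's: faster-whisper's optional `vad_filter` uses a Silero VAD model, whereas this swaps in a much cruder RMS-energy check. The names `frame_rms`, `voiced_duration`, `should_transcribe` and the defaults (16 kHz float samples in [-1, 1], 30 ms frames, 0.01 RMS threshold, 0.5 s minimum) are all assumptions.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive, non-overlapping frames (a trailing partial frame is dropped)."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def voiced_duration(samples: Sequence[float], sample_rate: int = 16000,
                    frame_ms: int = 30, threshold: float = 0.01) -> float:
    """Seconds of audio whose frame energy is at or above the (assumed) threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    voiced_frames = sum(1 for e in frame_rms(samples, frame_len) if e >= threshold)
    return voiced_frames * frame_len / sample_rate


def should_transcribe(samples: Sequence[float], sample_rate: int = 16000,
                      min_voiced_s: float = 0.5, **kwargs) -> bool:
    """Hypothetical gate: skip the model entirely when too little voiced audio remains."""
    return voiced_duration(samples, sample_rate, **kwargs) >= min_voiced_s


if __name__ == "__main__":
    sr = 16000
    silence = [0.0] * (sr * 3)  # 3 s of digital silence
    tone = [0.1 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s stand-in for speech
    print(should_transcribe(silence, sr))         # silent input: never sent to the model
    print(should_transcribe(silence + tone, sr))  # long silence with "speech" at the end
```

Because the gate measures voiced time rather than total length, the long-silence-then-speech case still passes, while fully silent input is rejected before the model can hallucinate text for it.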