Open pranavbhat12 opened 11 months ago
Try extracting the vocals with spleeter. For more preprocessing ideas (such as silence/noise removal), see: https://github.com/EtienneAb3d/WhisperHallu
Have you tried playing with the VAD parameters that are built into faster-whisper? As EtienneAb3d mentioned, extracting vocals with spleeter/demucs can help, but it may also hurt your transcription, since both the spleeter and demucs models use a sample rate of 44.1 kHz while Whisper is trained on audio sampled at 16 kHz.
I have tried that. After extracting the vocals into a separate audio file, the transcription was not as good as when using the original audio directly. Configuring VAD greatly reduced the misrecognition caused by non-human sounds. Here is my config:
vad_filter=True,
vad_parameters=dict(min_silence_duration_ms=500)
For more options, see https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/vad.py
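For anyone who wants to try the config above in context, here is a rough sketch of how those two arguments can be passed to faster-whisper's `WhisperModel.transcribe`. The helper names `build_vad_options` and `transcribe_with_vad`, the `"small"` model size, and the CPU/int8 settings are my own assumptions for illustration, not something from this thread, so adjust them to your setup.

```python
from typing import Any, Dict, List, Tuple


def build_vad_options(min_silence_duration_ms: int = 500) -> Dict[str, Any]:
    """Build the keyword arguments shown in the config above.

    Hypothetical helper: it only packages the two arguments so they can be
    reused across calls.
    """
    return {
        "vad_filter": True,
        "vad_parameters": {"min_silence_duration_ms": min_silence_duration_ms},
    }


def transcribe_with_vad(
    audio_path: str,
    model_size: str = "small",  # assumption: pick whatever model you already use
    min_silence_duration_ms: int = 500,
) -> List[Tuple[float, float, str]]:
    """Transcribe a file with the VAD filter enabled.

    Returns a list of (start_seconds, end_seconds, text) tuples.
    """
    # Imported here so the helper above can be used without the package installed.
    from faster_whisper import WhisperModel

    # Assumption: CPU with int8 quantization; change for GPU setups.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(
        audio_path, **build_vad_options(min_silence_duration_ms)
    )
    # `segments` is a generator; iterating it is what runs the transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Calling `transcribe_with_vad("audio.mp3")` (with your own file path) should give you the segments with the non-speech parts filtered out before transcription. If speech is being cut off, raising or lowering `min_silence_duration_ms` is probably the first thing to experiment with.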
I am facing 2 issues while transcribing the audio files:
I really appreciate the help. Thank you.