Feature request: Using VAD for subtitle syncing instead of waveform

SubtitleEdit / subtitleedit

the subtitle editor :)

http://www.nikse.dk/SubtitleEdit/Help

GNU General Public License v3.0

Feature request: Using VAD for subtitle syncing instead of waveform #5796

Closed Dnkhatri closed 1 year ago

Dnkhatri commented 2 years ago

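Currently the waveform shows background noise as well as speech, so maybe a voice activity detector (VAD) could be used to detect the actual speech. The VAD output could then either be used to filter the waveform, or the detected speech segments could be used directly for syncing.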

https://thegradient.pub/one-voice-detector-to-rule-them-all/

https://github.com/snakers4/silero-vad
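
To illustrate what "using the VAD results directly" could mean, here is a minimal sketch in Python. It is not Silero VAD and not anything Subtitle Edit currently does: it is a plain energy-threshold detector standing in for a real model, and the function name `detect_speech` and the values for frame size, threshold and minimum duration are assumptions chosen only for the example. It turns a list of samples into (start, end) times that a subtitle tool could snap to.

```python
import math
from typing import List, Sequence, Tuple


def detect_speech(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,         # assumed frame size
    threshold: float = 0.02,    # assumed RMS threshold (samples in [-1, 1])
    min_speech_ms: int = 90,    # assumed minimum run length to keep
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) for runs of frames whose RMS >= threshold.

    A crude energy-based stand-in for a real VAD model.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    min_frames = max(1, min_speech_ms // frame_ms)
    n_frames = len(samples) // frame_len

    segments: List[Tuple[float, float]] = []
    run_start = None
    # One extra iteration acts as a sentinel that closes a trailing run.
    for i in range(n_frames + 1):
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            rms = math.sqrt(sum(s * s for s in frame) / frame_len)
            active = rms >= threshold
        else:
            active = False

        if active and run_start is None:
            run_start = i
        elif not active and run_start is not None:
            # Drop runs that are too short to be speech.
            if i - run_start >= min_frames:
                segments.append(
                    (run_start * frame_len / sample_rate,
                     i * frame_len / sample_rate)
                )
            run_start = None
    return segments


if __name__ == "__main__":
    rate = 16000
    quiet = [0.0] * rate
    tone = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]
    for start, end in detect_speech(quiet + tone + quiet, rate):
        print(f"speech from {start:.2f}s to {end:.2f}s")
```

A real model such as Silero VAD would replace the RMS test with a learned speech probability per frame; the grouping of active frames into segments would look much the same.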

niksedk commented 2 years ago

It is possible to drag a .wav file onto the waveform, so you can do any pre-processing you want.
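
As a rough idea of what such pre-processing might look like, the sketch below silences every quiet frame of a WAV file and writes the result to a new file, which could then be dragged onto the waveform. It uses only the Python standard library and assumes 16-bit mono PCM input; the function name `mute_quiet_frames` and the frame size and threshold are made up for the example, and a simple energy threshold is used in place of a real VAD.

```python
import array
import math
import sys
import wave


def mute_quiet_frames(in_path: str, out_path: str,
                      frame_ms: int = 30,        # assumed frame size
                      threshold: float = 0.02    # assumed RMS threshold
                      ) -> int:
    """Zero out frames whose normalized RMS is below threshold.

    Returns the number of frames that were muted.
    Only handles 16-bit mono PCM (an assumption of this sketch).
    """
    with wave.open(in_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        params = src.getparams()
        rate = src.getframerate()
        pcm = array.array("h")
        pcm.frombytes(src.readframes(src.getnframes()))

    # WAV sample data is little-endian; swap on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()

    frame_len = max(1, int(rate * frame_ms / 1000))
    muted = 0
    for start in range(0, len(pcm), frame_len):
        frame = pcm[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame)) / 32768.0
        if rms < threshold:
            # Replace the quiet frame with silence of the same length.
            pcm[start:start + len(frame)] = array.array("h", [0] * len(frame))
            muted += 1

    if sys.byteorder == "big":
        pcm.byteswap()

    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(pcm.tobytes())
    return muted
```

Called as `mute_quiet_frames("audio.wav", "audio_speech_only.wav")` (both file names are placeholders), this leaves loud passages untouched and flattens everything else, so the waveform shows gaps where the detector found no speech. A fixed threshold will still let loud noise or music through.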

n-99 commented 1 year ago

Seems like people have integrated the Silero VAD into their Whisper setup: https://github.com/openai/whisper/discussions/397 and https://huggingface.co/spaces/aadnk/whisper-webui. Having it built in would certainly be more convenient than doing whatever preprocessing you would otherwise have to do on the WAV file (which didn't help much in my case).