Closed by Dnkhatri 1 year ago
It is possible to drag a .wav file onto the waveform, so you can do any pre-processing you want before loading it.
It seems people have integrated the Silero VAD into their Whisper setups: https://github.com/openai/whisper/discussions/397 and https://huggingface.co/spaces/aadnk/whisper-webui. That would certainly be more convenient than doing whatever preprocessing you would otherwise have to do on the WAV file (which didn't help much in my case).
Currently the waveform shows noise, so maybe a VAD could be used to detect the actual speech. We could then either use it to filter the waveform or just use the VAD results directly.
https://thegradient.pub/one-voice-detector-to-rule-them-all/
https://github.com/snakers4/silero-vad
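To make the two options concrete, here is a rough sketch of what "filter the waveform" versus "use the VAD results directly" could look like. Note that this is not Silero: it swaps in a naive frame-energy threshold as a stand-in detector so the example runs with no dependencies. The function names, `frame_len`, and `threshold` are all made up for illustration. As far as I know, Silero's `get_speech_timestamps` helper returns start/end sample offsets in a similar spirit, so its output could probably be plugged in where `speech_segments` is used.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def speech_segments(samples, frame_len=160, threshold=0.01):
    """Stand-in VAD (energy threshold, NOT Silero).

    Returns a list of (start_sample, end_sample) pairs covering runs of
    frames whose energy is at or above the threshold.
    """
    segments = []
    seg_start = None
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        if energy >= threshold and seg_start is None:
            seg_start = i * frame_len          # a speech run begins
        elif energy < threshold and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # the run ends
            seg_start = None
    if seg_start is not None:
        # The run lasted until the final full frame.
        n_frames = len(samples) // frame_len
        segments.append((seg_start, n_frames * frame_len))
    return segments


def mask_non_speech(samples, segments):
    """Option 1: filter the waveform by zeroing everything outside the segments."""
    out = [0.0] * len(samples)
    for start, end in segments:
        out[start:end] = samples[start:end]
    return out


# Toy signal: quiet noise, a loud "speech" burst, then quiet noise again.
signal = [0.001] * 320 + [0.5] * 320 + [0.001] * 320
segments = speech_segments(signal)          # Option 2: use these directly
cleaned = mask_non_speech(signal, segments)
```

With option 2 the segment list itself would drive things (for example, only drawing or transcribing those ranges), and the masking step would not be needed at all.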