ZFTurbo / Music-Source-Separation-Training

Repository for training models for music source separation.
MIT License
477 stars, 66 forks

What is the best model for OpenWhisper? #95

Open montvid opened 4 hours ago

montvid commented 4 hours ago

I came to know that to transcribe accurately with OpenWhisper, one needs to separate the voice from the rest of the audio and remove silence with a pre-trained, enterprise-grade Voice Activity Detector (VAD).

Which model gives the voice separation that OpenWhisper understands best? That could be a nice test case in the main table. Maybe someone could share their favorite?

I see https://mvsep.com uses OpenWhisper. Which models does the site use to clean the audio? Without cleaning, Whisper spits out mostly garbage. Edit: https://mvsep.com uses MDX23C, but how does one clean the audio enough without a VAD so that Whisper does not hallucinate?
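
For illustration, the silence-removal step described in this comment could be roughly sketched with a plain energy threshold. This is only a crude stand-in for a trained VAD, and nothing here reflects what mvsep.com actually does; the function name `drop_silent_frames`, the frame length, and the threshold value are all hypothetical choices made for the example.

```python
import math


def drop_silent_frames(samples, frame_len=400, threshold=0.01):
    """Keep only frames whose RMS energy is at or above `threshold`.

    samples: sequence of floats in [-1.0, 1.0] (mono audio).
    frame_len: samples per frame (400 samples = 25 ms at 16 kHz).
    threshold: assumed RMS cutoff; a real VAD would use a trained model instead.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept


# Toy input: 25 ms of silence, 25 ms of a 440 Hz tone, 25 ms of silence.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(400)]
audio = [0.0] * 400 + tone + [0.0] * 400
voiced = drop_silent_frames(audio)
print(len(audio), len(voiced))
```

An energy gate like this only removes digital silence or very quiet passages. It cannot tell speech from leftover music or noise, which is why a separation model followed by a trained VAD is the usual suggestion.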

ZFTurbo commented 3 hours ago

We use our own BS Roformer model, similar to the BS Roformer (viperx) vocal model in this list: https://github.com/ZFTurbo/Music-Source-Separation-Training?tab=readme-ov-file#vocal-models