What is the best model for OpenWhisper?

I have learned that, for OpenWhisper to transcribe accurately, the voice needs to be separated from the rest of the audio, and silence needs to be removed with a pre-trained, enterprise-grade Voice Activity Detector (VAD).
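
To make the silence-removal step concrete, here is a minimal sketch in plain Python. It is only a crude energy threshold standing in for a pre-trained VAD, not a replacement for one, and it assumes the vocals have already been separated into mono float samples in the range -1..1. The function names, the 30 ms frame size, and the 0.02 threshold are all hypothetical choices for illustration.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each full, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def speech_segments(samples, sample_rate, frame_ms=30, threshold=0.02,
                    min_gap_frames=3):
    """Return (start_sample, end_sample) spans whose frame RMS exceeds
    `threshold`. A segment is closed only after `min_gap_frames`
    consecutive quiet frames, so short pauses stay inside a segment.
    All defaults are illustrative assumptions, not tuned values."""
    frame_len = int(sample_rate * frame_ms / 1000)
    active = [e > threshold for e in frame_rms(samples, frame_len)]

    segments = []
    start = None      # index of the first frame of the open segment
    quiet_run = 0     # consecutive quiet frames seen inside the segment
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap_frames:
                end = i - quiet_run + 1  # first quiet frame (exclusive end)
                segments.append((start * frame_len, end * frame_len))
                start = None
                quiet_run = 0
    if start is not None:
        # Close a segment still open at the end, dropping trailing quiet frames.
        end = len(active) - quiet_run
        segments.append((start * frame_len, end * frame_len))
    return segments


def drop_silence(samples, segments):
    """Concatenate only the samples that fall inside the given segments."""
    kept = []
    for start, end in segments:
        kept.extend(samples[start:end])
    return kept
```

The trimmed samples (or the individual segments) would then be what gets passed to the transcriber. A fixed energy threshold cannot tell speech from leftover music or noise, which is exactly the gap a trained VAD is meant to close.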

Which model's voice separation produces the output that OpenWhisper understands best? That could be a nice test case in the main table. Maybe someone could share their favorite?

I see that https://mvsep.com uses OpenWhisper. Which models does the site use to clean the audio? Without cleaning, Whisper spits out mostly garbage. Edit: https://mvsep.com uses MDX23C, but how does one clean the audio well enough, without a VAD, for Whisper not to hallucinate?

ZFTurbo / Music-Source-Separation-Training

What is the best model for OpenWhisper? #95