Mufidiwiwhi (Multi-file diarisation with Whisper) is a tiny, quick-and-dirty program using Whisper.
It transcribes audio files with reliable speaker diarisation by using one file per speaker: Mufidiwiwhi requires each speaker to be recorded in a separate file.
To do so, you can use Mumble to record a podcast with guests, or use the Ardour DAW (or Zrythm) to record a podcast with several remote guests.
Since each file contains only one speaker, every transcribed segment is attributed to the right person, which makes diarisation 100% accurate.
Of course, you must run Mufidiwiwhi on the separate tracks, before merging them into a single audio file.
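The reason one file per speaker is enough can be shown with a small Python sketch. This is not Mufidiwiwhi's actual code. It assumes that every track was recorded in sync, so all tracks share the same timeline, and that each speaker's file has already been transcribed into Whisper-style segments (dicts with "start", "end" and "text" keys). You could get these segments, for example, from `whisper.load_model("large").transcribe("interview_lucy.wav", language="fr")["segments"]`. The speaker label comes from the file rather than from a model, and sorting by start time interleaves the speakers. The names `Line` and `merge_speaker_segments` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Line:
    """One transcribed utterance attributed to a speaker (hypothetical type)."""
    start: float  # seconds from the beginning of the (shared) timeline
    end: float
    speaker: str
    text: str


def merge_speaker_segments(per_speaker: dict[str, list[dict]]) -> list[Line]:
    """Merge per-speaker Whisper-style segments into one chronological transcript.

    `per_speaker` maps a speaker name to the segments transcribed from that
    speaker's own file. Assumption: all files start at the same instant.
    """
    lines = [
        # The speaker is known from the file itself: no diarisation model needed.
        Line(seg["start"], seg["end"], speaker, seg["text"].strip())
        for speaker, segments in per_speaker.items()
        for seg in segments
    ]
    # Shared timeline: sorting by start time restores the order of the conversation.
    lines.sort(key=lambda line: (line.start, line.end))
    return lines


if __name__ == "__main__":
    transcript = merge_speaker_segments({
        "Lucy": [{"start": 0.0, "end": 2.5, "text": " Hello Samir."},
                 {"start": 6.0, "end": 8.0, "text": " Great, let's start."}],
        "Samir": [{"start": 2.7, "end": 5.5, "text": " Hi Lucy, thanks for having me."}],
    })
    for line in transcript:
        print(f"[{line.start:6.2f}] {line.speaker}: {line.text}")
```

If a track was started late, its timestamps would need shifting by that offset before sorting. This is why the tracks must come from a single synchronised recording session.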
More information: Transcribe your Podcast with accurate speaker diarisation, for free, with Whisper
Make sure that you choose a podcast hosting platform that supports transcripts (such as Castopod!).
Install Whisper and Pydub, then Mufidiwiwhi itself (Whisper also needs the ffmpeg command-line tool on your system):
pip install git+https://github.com/openai/whisper.git
pip install pydub
pip install git+https://github.com/ad-aures/mufidiwiwhi.git
To get help, type
mufidiwiwhi --help
Example:
mufidiwiwhi Lucy interview_lucy.wav Samir interview_samir.wav Rachel interview_rachel.wav --model large --language French
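In the example, arguments alternate between a speaker name and that speaker's audio file. The Python sketch below shows one way to parse that layout with `argparse`. It is a hypothetical illustration, not Mufidiwiwhi's implementation. The names `parse_speakers` and `build_parser` and the option defaults are assumptions, so check `mufidiwiwhi --help` for the real options.

```python
from __future__ import annotations

import argparse


def parse_speakers(pairs: list[str]) -> list[tuple[str, str]]:
    """Turn ["Lucy", "a.wav", "Samir", "b.wav"] into [("Lucy", "a.wav"), ("Samir", "b.wav")]."""
    if len(pairs) % 2 != 0:
        raise ValueError("expected alternating SPEAKER FILE arguments")
    # Even positions are speaker names, odd positions are their files.
    return list(zip(pairs[0::2], pairs[1::2]))


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the example command line above."""
    parser = argparse.ArgumentParser(prog="mufidiwiwhi-sketch")
    # All positional arguments are collected, then split into pairs.
    parser.add_argument("pairs", nargs="+", metavar="SPEAKER FILE")
    parser.add_argument("--model", default="small")  # arbitrary default for the sketch
    parser.add_argument("--language", default=None)  # None: let the model detect it
    return parser
```

For instance, `build_parser().parse_args(["Lucy", "interview_lucy.wav", "Samir", "interview_samir.wav", "--model", "large"])` yields `pairs` that `parse_speakers` turns into `[("Lucy", "interview_lucy.wav"), ("Samir", "interview_samir.wav")]`. Rejecting an odd number of arguments catches a forgotten speaker name or file early.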
See Whisper's tokenizer.py for the list of all available languages.
Whisper's code and model weights are released under the MIT License. See Whisper's LICENSE for further details.
Mufidiwiwhi's code is also released under the MIT License. See LICENSE for further details.