
Multi-File Diarization with Whisper
MIT License

Mufidiwiwhi

Mufidiwiwhi (Multi-file diarisation with Whisper) is a tiny, quick-and-dirty program using Whisper.

It transcribes audio files with reliable speaker diarisation by using one file per speaker: Mufidiwiwhi requires that you record each speaker in a separate file.
To do that, you can use Mumble to record a podcast with guests, or use the Ardour DAW to record a podcast with several remote guests (you can also use Zrythm).
Since each file holds a single, known speaker, every transcribed segment is attributed by construction rather than inferred, which makes the diarisation 100% accurate.
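The idea can be illustrated with a minimal Python sketch. It is not Mufidiwiwhi's actual code: the function name `merge_by_start_time` and the sample data are hypothetical, and it assumes all per-speaker recordings share the same time origin. Each segment carries a start time, an end time and a text, which are the fields Whisper reports for every transcribed segment.

```python
from typing import Dict, List, Tuple

# (start_seconds, end_seconds, text): the three fields Whisper reports per segment.
Segment = Tuple[float, float, str]


def merge_by_start_time(
    per_speaker: Dict[str, List[Segment]],
) -> List[Tuple[float, float, str, str]]:
    """Interleave per-speaker segments into one timeline, ordered by start time.

    Hypothetical helper: assumes every recording starts at the same instant.
    """
    timeline = [
        (start, end, speaker, text.strip())
        for speaker, segments in per_speaker.items()
        for start, end, text in segments
    ]
    # Sort by start time, then end time, so overlapping speech keeps a stable order.
    timeline.sort(key=lambda item: (item[0], item[1]))
    return timeline


# Hypothetical segments, standing in for the transcription of each speaker's file.
per_speaker = {
    "Lucy": [(0.0, 2.5, " Welcome to the show."), (6.0, 8.0, " Great, let's start.")],
    "Samir": [(2.8, 5.5, " Thanks for having me.")],
}

for start, end, speaker, text in merge_by_start_time(per_speaker):
    print(f"[{start:6.2f} - {end:6.2f}] {speaker}: {text}")
```

The speaker label comes from the file a segment was transcribed from, so no speaker-recognition model is involved at any point.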

Of course, you should run Mufidiwiwhi before merging all audio files together.

More information: Transcribe your Podcast with accurate speaker diarisation, for free, with Whisper

Make sure that you choose a podcast hosting platform that supports transcripts (such as Castopod!).

Setup

You need Whisper and Pydub installed.

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
pip install pydub
pip install git+https://github.com/ad-aures/mufidiwiwhi.git

Command-line usage

To get help, type

mufidiwiwhi --help

Example:

mufidiwiwhi Lucy interview_lucy.wav Samir interview_samir.wav Rachel interview_rachel.wav --model large --language French

See Whisper's tokenizer.py for the list of all available languages.
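In the example command, the positional arguments alternate between a speaker name and that speaker's audio file. A rough sketch of how such a list could be paired up is shown here; `pair_speakers` is a hypothetical helper written for illustration and is not taken from Mufidiwiwhi's source.

```python
from typing import List, Tuple


def pair_speakers(positional: List[str]) -> List[Tuple[str, str]]:
    """Group alternating [name, file, name, file, ...] arguments into (name, file) pairs."""
    if len(positional) % 2 != 0:
        raise ValueError("expected alternating speaker names and audio files")
    names = positional[0::2]  # items at even positions are speaker names
    files = positional[1::2]  # items at odd positions are audio files
    return list(zip(names, files))


# The positional arguments from the example command above.
pairs = pair_speakers([
    "Lucy", "interview_lucy.wav",
    "Samir", "interview_samir.wav",
    "Rachel", "interview_rachel.wav",
])

for name, path in pairs:
    print(f"{name}: {path}")
```

Rejecting an odd number of arguments catches the most likely mistake early, namely a speaker name given without a file or the reverse.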

License

Whisper's code and model weights are released under the MIT License. See LICENSE for further details.

Mufidiwiwhi's code is released under the MIT License. See LICENSE for further details.