Transcripts4All / tools4all

A curated collection of tools to aid transcriptionists and subtitlers.
https://transcripts4all.github.io

Feature request: --initial_prompt command line argument for Whisper AI #3

Open BlohoJo opened 2 months ago

BlohoJo commented 2 months ago

There is a problem with Whisper AI where, especially in long transcripts, it reverts to a mode of transcribing that drops punctuation and capitalization for significant stretches of text. Sometimes this affects roughly the entire first half, sometimes the entire second half, and sometimes it switches on and off throughout.

The only way I have found that avoids this (most of the time) is to do two things:

1) The "medium" model must be used, and
2) The --initial_prompt argument must be used, containing a sentence with capitalization and punctuation in order to "prime" Whisper AI into using punctuation, e.g., --initial_prompt "Hello, my name is Anthony Thomas Morgan, and welcome to Political Legacies, my latest podcast. Today we will be talking about Barack Hussein Obama."
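For anyone who wants to script the two steps above, here is a minimal sketch that assembles the corresponding command line for the stock `whisper` CLI. The helper name `build_whisper_command` and the file name `episode.mp3` are made up for illustration; only `--model` and `--initial_prompt` come from the description above.

```python
import shlex

# Priming sentence taken from the example above; any well-punctuated,
# properly capitalized sentence should serve the same purpose.
PROMPT = (
    "Hello, my name is Anthony Thomas Morgan, and welcome to Political "
    "Legacies, my latest podcast. Today we will be talking about "
    "Barack Hussein Obama."
)


def build_whisper_command(audio_path, initial_prompt, model="medium"):
    """Return the argument list for a whisper CLI call (hypothetical helper)."""
    return [
        "whisper",
        audio_path,
        "--model", model,                    # step 1: use the medium model
        "--initial_prompt", initial_prompt,  # step 2: prime with punctuation
    ]


# "episode.mp3" is a placeholder file name.
cmd = build_whisper_command("episode.mp3", PROMPT)

# Show the shell-quoted command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems with the commas and periods in the prompt.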

Whisper-diarization does allow the model to be switched to medium, but it doesn't accept the --initial_prompt command-line argument.

Is there any chance it could be added in order to prevent Whisper from switching to a "no punctuation and capitalization" mode?

ScriptTiger commented 2 months ago

Whisper-diarization is based on WhisperX, not on Whisper, so the arguments are a bit different. However, WhisperX does have an ASR option with that functionality, so you should be able to use that method now. I also just commented on a previously raised, currently closed issue about this over on the Whisper-diarization repo, to probe whether anyone is still considering adding an argument for it.
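As a rough sketch of what using the ASR option might look like from Python: this assumes WhisperX's `load_model` accepts an `asr_options` dictionary with an `initial_prompt` key, which is my understanding of its API but may differ between versions. The function names `build_asr_options` and `transcribe_with_prompt` are hypothetical, and this is not how Whisper-diarization itself wires things up.

```python
def build_asr_options(initial_prompt):
    """Build the ASR options dict carrying the priming prompt (hypothetical helper)."""
    return {"initial_prompt": initial_prompt}


def transcribe_with_prompt(audio_path, initial_prompt, device="cpu"):
    """Transcribe one file with WhisperX using a priming prompt.

    Assumption: whisperx.load_model takes an `asr_options` dict; check
    this against the WhisperX version you have installed.
    """
    import whisperx  # third-party; imported lazily so the helper above works without it

    model = whisperx.load_model(
        "medium",
        device,
        compute_type="int8",  # a CPU-friendly choice; adjust for GPU use
        asr_options=build_asr_options(initial_prompt),
    )
    audio = whisperx.load_audio(audio_path)
    return model.transcribe(audio)


options = build_asr_options("Hello, and welcome to my latest podcast.")
print(options)
```

Only `build_asr_options` runs at import time here; `transcribe_with_prompt` needs WhisperX installed and an audio file to point at.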