Description of proposed feature
Currently, manim-voiceover only supports Whisper as a transcription service, and it is hard-coded for all SpeechService backends. I propose modifying the manim_voiceover.services module so that the transcription backend is pluggable.
How can the new feature be used?
Not only will this give users a choice in which transcription service to use, but it will also make it much easier for users to add transcription services that are not yet covered.
Additional comments
Whisper is no longer the clear best transcription option: it has been surpassed by services such as AssemblyAI, which also supports word-level timestamps among other features.