fave-asr currently uses WhisperX, an extension of openai-whisper, for transcription. The problems and current solution are covered in #10:
> - Disfluencies and non-speech sounds are handled poorly....
> - Phrase-level and word-level time stamps are not sufficiently accurate....
>
> ...By providing a workflow for fine-tuning the model, the problems should hopefully be mitigated.
Another option is to try a different transcription system. The linto-ai/whisper-timestamped package claims to address these issues while also making the program more memory-efficient and multilingual.
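For a sense of what the migration would involve, here is a minimal sketch of word-level transcription with whisper-timestamped, following the usage shown in its README (the audio file name and model size here are placeholders, not anything from fave-asr):

```python
import json

import whisper_timestamped as whisper

# Load audio and a Whisper checkpoint; whisper-timestamped accepts the
# same model names as openai-whisper.
audio = whisper.load_audio("audio.wav")
model = whisper.load_model("tiny", device="cpu")

# transcribe() returns the familiar openai-whisper result dict, but each
# segment additionally carries per-word "start"/"end" timestamps, which is
# the word-level timing discussed above.
result = whisper.transcribe(model, audio, language="en")
print(json.dumps(result, indent=2))
```

This sketch assumes a stock checkpoint; whether a model fine-tuned under #10 would load the same way is worth verifying before committing to a migration.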
My current belief is that completing #10 is still the best short-term option, largely because no matter what system we use, the ability to fine-tune it on your own transcribed data will be important. If fine-tuning works well enough, it may push back the need for this migration. Long-term, however, I think whisper-timestamped is the better system.