STT_MV: Speech-to-text for Tibetan movie/TV program transcriptions.
CER: Character Error Rate, a metric for transcription quality.
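As an illustration of the CER definition above, the sketch below computes it as the character-level edit distance between a reference and a hypothesis, divided by the reference length. The function names `edit_distance` and `cer` are hypothetical and not part of this RFC. It also assumes that "character" means a Unicode code point, which for Tibetan is not the same as a stacked syllable component as a reader sees it.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `cer("abcd", "abxd")` is 0.25: one substitution over a four-character reference. CER can exceed 1.0 when the hypothesis is much longer than the reference.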
Summary
Develop a script that filters out poor-quality transcriptions of Tibetan movie and TV program segments. Training the STT model only on high-quality STT data should improve its performance.
Dependencies
botok
Infrastructures
Design Illustrations
Justification
This approach uses botok to analyse the quality of the transcription text, which is more efficient than manually checking the spelling of each transcription.
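A minimal sketch of how such a filter might be structured follows. It is an assumption rather than the RFC's confirmed design: the quality signal used here is the share of tokens that are not recognised as valid words, and the 0.2 threshold is a placeholder. `tokenize` and `is_known_word` are hypothetical callables that would wrap botok in a real script. They are passed in as parameters so that the sketch does not depend on any particular botok API.

```python
from typing import Callable, Iterable, List


def non_word_ratio(
    text: str,
    tokenize: Callable[[str], Iterable[str]],
    is_known_word: Callable[[str], bool],
) -> float:
    """Share of tokens in `text` that are not recognised as valid words."""
    tokens = [t for t in tokenize(text) if t.strip()]
    if not tokens:
        # Assumption: an empty transcription is treated as the worst case.
        return 1.0
    unknown = sum(1 for t in tokens if not is_known_word(t))
    return unknown / len(tokens)


def filter_transcripts(
    transcripts: Iterable[str],
    tokenize: Callable[[str], Iterable[str]],
    is_known_word: Callable[[str], bool],
    max_ratio: float = 0.2,  # placeholder threshold, to be tuned on real data
) -> List[str]:
    """Keep only transcriptions whose non-word ratio is at most `max_ratio`."""
    return [
        t for t in transcripts
        if non_word_ratio(t, tokenize, is_known_word) <= max_ratio
    ]
```

Injecting the tokenizer also keeps the filtering logic unit-testable with a small stand-in vocabulary, independent of botok's dictionary.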
Testing
Unit tests for each script component.
Evaluation using a subset of the dataset to assess filtering accuracy.
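The evaluation step could be sketched as below, assuming the subset has been labelled by hand as good or bad and that "filtering accuracy" is reported as accuracy, precision and recall for the "bad" class. The function name `filtering_scores` and the choice of metrics are assumptions, not something this RFC specifies.

```python
from typing import Dict, Sequence


def filtering_scores(
    predicted_bad: Sequence[bool],
    labelled_bad: Sequence[bool],
) -> Dict[str, float]:
    """Compare the filter's decisions with hand labels on an evaluation subset."""
    if len(predicted_bad) != len(labelled_bad):
        raise ValueError("both sequences must have the same length")
    pairs = list(zip(predicted_bad, labelled_bad))
    tp = sum(1 for p, l in pairs if p and l)          # bad, correctly filtered
    fp = sum(1 for p, l in pairs if p and not l)      # good, wrongly filtered
    fn = sum(1 for p, l in pairs if not p and l)      # bad, wrongly kept
    tn = sum(1 for p, l in pairs if not p and not l)  # good, correctly kept
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Precision indicates how much good data the filter discards by mistake, while recall indicates how much bad data still reaches training.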
Implementation Steps
List all the steps involved during implementation.
[ ] OpenPecha/filter_bad_stt_mv_transcript#1
Estimated time: 0.5 hour
Actual time:
[ ] OpenPecha/filter_bad_stt_mv_transcript#2
Estimated time: 0.5 hour
Actual time:
[ ] OpenPecha/filter_bad_stt_mv_transcript#3
Estimated time: 0.5 hour
Actual time:
[ ] OpenPecha/filter_bad_stt_mv_transcript#4
Estimated time: 0.5 hour
Actual time:
[ ] OpenPecha/filter_bad_stt_mv_transcript#5
Estimated time: 0.5 hour
Actual time:
[ ] OpenPecha/filter_bad_stt_mv_transcript#6
Estimated time: 0.5 hour
Actual time:
[ ] OpenPecha/filter_bad_stt_mv_transcript#7
Estimated time: 0.5 hour
Actual time:
RFC0116: Filter out bad transcriptions for STT_MV
Reviewed By