Create a pipeline that downloads the STT data from the release page of the Tibetan news audio GitHub repo and converts the audio into the format required for training data. Then split the audio as usual and upload the transcript CSV to the stt.pecha.tools database.
Completion Criteria
The stt_nw data is shown in the stt.pecha.tools stats.
Implementation Plan
Subtasks
[x] Write a script to download the STT news data from GitHub.
[x] Convert the audio to the proper training format.
[ ] Split the audio, run inference, create the CSV, and upload it to the database.
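The subtasks above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual script: the audio extensions, the 16 kHz mono WAV target, and the CSV column names `file_name` and `inference_transcript` are all hypothetical, and the helper names are invented for this example. The input to `release_asset_urls` is assumed to be the JSON returned by the GitHub REST API for a release, which lists each asset with a `name` and a `browser_download_url`.

```python
import csv
import io
from pathlib import Path

# Assumption: which release assets count as audio.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a", ".flac"}


def release_asset_urls(release):
    """Return download URLs of audio assets from a GitHub release JSON dict."""
    return [
        asset["browser_download_url"]
        for asset in release.get("assets", [])
        if Path(asset["name"]).suffix.lower() in AUDIO_SUFFIXES
    ]


def ffmpeg_convert_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that resamples to mono at `sample_rate` Hz.

    Assumption: the training format is 16 kHz mono; adjust if it differs.
    """
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]


def transcripts_to_csv(rows):
    """Serialize (segment file, transcript) rows to CSV text.

    Assumption: the column names below are placeholders for whatever
    the database upload actually expects.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["file_name", "inference_transcript"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The command list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed. Splitting, inference, and the database upload are left out because their details are not specified here.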