Open ChakshuGautam opened 6 months ago
@ChakshuGautam will we have English audio-transcript pairs as well as Hindi audio-transcript pairs for forced alignment?
Hey man @xorsuyash , can you link your discord? Wanted to collaborate with you on some topics : )
Hi, I'm Harsha. I'd be happy to contribute to this project. I went through the complete description above and understand the tasks and the task flow. I see that a few of the tasks here are assigned to other contributors, so which task can I start working on? Should I start on the tasks in issue 2? Also, we should be looking for datasets of both types, right, that is, mixed-language transcription and monolingual transcription?
Description
The transcript will be `en` for English words and `hi` for Hindi words. Using Whisper/Fairseq. Also, an alternate model that gives transcript in 100% `hi` or 100% `en`.
Why
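As a rough illustration of the transcript format in the description above, here is a minimal sketch that labels each word of a transcript as `en` or `hi`. It assumes Hindi words are written in Devanagari and English words in Latin script, which the issue does not state explicitly; romanized Hindi would be labelled `en` under this assumption. The function names `word_language`, `tag_transcript`, and `language_share` are hypothetical helpers for this example and are not part of Whisper or Fairseq.

```python
# Sketch only: script-based language tagging for a code-mixed transcript.
# Assumption: Hindi is written in Devanagari (U+0900..U+097F), English in Latin.

def word_language(word: str) -> str:
    """Return 'hi' for a word containing Devanagari, 'en' for ASCII letters, else 'other'."""
    if any("\u0900" <= ch <= "\u097F" for ch in word):
        return "hi"
    if any(ch.isascii() and ch.isalpha() for ch in word):
        return "en"
    return "other"  # digits, punctuation, other scripts


def tag_transcript(text: str) -> list[tuple[str, str]]:
    """Split on whitespace and pair every word with its language label."""
    return [(word, word_language(word)) for word in text.split()]


def language_share(text: str) -> dict[str, float]:
    """Fraction of labelled words that are 'en' and 'hi' (ignores 'other')."""
    labels = [lang for _, lang in tag_transcript(text) if lang != "other"]
    if not labels:
        return {"en": 0.0, "hi": 0.0}
    return {lang: labels.count(lang) / len(labels) for lang in ("en", "hi")}


if __name__ == "__main__":
    mixed = "मुझे weather report चाहिए"
    print(tag_transcript(mixed))
    print(language_share(mixed))
```

A check like `language_share` could also help sort candidate datasets into mixed-language and monolingual (100% `hi` or 100% `en`) transcripts, again under the script assumption stated above.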
Building datasets
`en` transcript.
Training