ChakshuGautam / whisper-hinglish


Research on existing datasets and Techniques #2

Open rayaanoidPrime opened 1 month ago

rayaanoidPrime commented 1 month ago

There are some existing datasets that we can leverage directly, such as:

Synthetic generation of a code-switching dataset from monolingual sources
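To make the idea concrete, here is a rough sketch of one simple way to produce code-switched text from a monolingual sentence: lexicon-based word substitution. This is only an illustration, not a method already chosen for this project. The function name `code_switch`, the `switch_prob` parameter, and the toy romanized Hindi-to-English lexicon are all made up for this example; a real pipeline would need a proper bilingual lexicon or alignment data.

```python
import random


def code_switch(tokens, lexicon, switch_prob=0.3, seed=None):
    """Replace some tokens with their lexicon translation.

    tokens      -- list of words from a monolingual sentence
    lexicon     -- dict mapping a source word to a target-language word
    switch_prob -- chance of switching each word that has a lexicon entry
    seed        -- optional seed so the output is reproducible
    """
    rng = random.Random(seed)
    mixed = []
    for tok in tokens:
        translation = lexicon.get(tok)
        # Only words present in the lexicon are candidates for switching.
        if translation is not None and rng.random() < switch_prob:
            mixed.append(translation)
        else:
            mixed.append(tok)
    return mixed


# Hypothetical toy lexicon (romanized Hindi -> English), for illustration only.
TOY_LEXICON = {"daftar": "office", "baithak": "meeting", "samay": "time"}

sentence = "aaj daftar mein baithak hai".split()
mixed = code_switch(sentence, TOY_LEXICON, switch_prob=1.0)
print(" ".join(mixed))  # aaj office mein meeting hai
```

With `switch_prob=1.0` every word that has a lexicon entry is switched, which makes the output deterministic; lower values give a random mix, and the token count always matches the input, so word-level labels would stay aligned.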

Tasks:

harshaharod21 commented 4 weeks ago

Existing datasets:
1) https://github.com/google-research-datasets/hinglish-top-dataset (sourced from the AI4Bharat catalog: https://github.com/AI4Bharat/indicnlp_catalog)
2) https://github.com/goru001/nlp-for-hinglish (dataset link: https://www.dropbox.com/sh/as5fg8jsrljt6k7/AADnSLlSNJPeAndFycJGurOUa?e=1&dl=0)

Since this project is for public use cases, could we request free access to the following dataset from IITG? https://www.iitg.ac.in/eee/emstlab/HingCoS_Database/HingCoS.html

Existing ASR model for Hinglish: https://github.com/Open-Speech-EkStep/vakyansh-models?tab=readme-ov-file#interspeech-2021-asr-models