Open rayaanoidPrime opened 6 months ago
Existing datasets:
1) https://github.com/google-research-datasets/hinglish-top-dataset
   Sourced from: AI4Bharat https://github.com/AI4Bharat/indicnlp_catalog
2) https://github.com/goru001/nlp-for-hinglish
   Dataset link: https://www.dropbox.com/sh/as5fg8jsrljt6k7/AADnSLlSNJPeAndFycJGurOUa?e=1&dl=0
Since this project is for public use cases, could we request free access to the following dataset from IITG? https://www.iitg.ac.in/eee/emstlab/HingCoS_Database/HingCoS.html
Existing ASR model for Hinglish: https://github.com/Open-Speech-EkStep/vakyansh-models?tab=readme-ov-file#interspeech-2021-asr-models
There are some existing datasets that we can leverage directly, such as -
Synthetic generation of a code-switching dataset from monolingual sources
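One simple way to approach the synthetic generation idea is lexical substitution: take a monolingual (romanized Hindi) sentence and swap some words for their English equivalents. The sketch below is only an illustration under assumptions. The `TOY_LEXICON` mapping, the `code_switch` function, and the `switch_prob` parameter are hypothetical names not taken from any of the linked repos, and a real pipeline would likely need a proper bilingual dictionary or word alignments, plus some linguistic constraints on where switches are allowed.

```python
import random

# Hypothetical toy lexicon (romanized Hindi -> English), for illustration only.
# A real lexicon would come from a bilingual dictionary or word alignments.
TOY_LEXICON = {
    "kal": "tomorrow",
    "baithak": "meeting",
    "samay": "time",
    "daftar": "office",
}


def code_switch(sentence, lexicon, switch_prob=0.5, rng=None):
    """Replace each token found in the lexicon with probability switch_prob."""
    rng = rng or random.Random()
    out = []
    for token in sentence.split():
        replacement = lexicon.get(token.lower())
        # Only tokens with a known translation are candidates for switching.
        if replacement is not None and rng.random() < switch_prob:
            out.append(replacement)
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    # Fixed seed so repeated runs produce the same synthetic sentence.
    rng = random.Random(0)
    print(code_switch("kal daftar mein baithak hai", TOY_LEXICON, 0.5, rng))
```

Setting `switch_prob` closer to 1.0 gives more English-heavy sentences, and closer to 0.0 leaves the source sentence mostly unchanged, so it could serve as a rough knob for the degree of code-switching in the generated data.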
Tasks: