IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Create dataset loader for IndoYTASRNews #372

Open SamuelCahyawijaya opened 7 months ago

SamuelCahyawijaya commented 7 months ago

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?indo_ytasr_news

Dataset: indo_ytasr_news
Description: IndoYTASRNews is an ASR dataset built from a variety of news videos available on YouTube. The videos were collected from several reliable channels aimed at Indonesian audiences, and the dataset consists of Indonesian-language videos that were filtered with "langdetect" to ensure the language is correct.
License: Unknown
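The "langdetect" filtering step mentioned in the description could look roughly like the following. This is a minimal sketch and not the dataset authors' actual pipeline. The record layout (dicts with a hypothetical `text` field holding each video's transcript or title), the function name `filter_by_language`, and the target code `"id"` are assumptions. The sketch uses only langdetect's `detect` function and `DetectorFactory.seed`. It also accepts an injectable detector, so you can try it without langdetect installed.

```python
from typing import Callable, Iterable, Iterator, Optional


def _default_detector() -> Callable[[str], str]:
    # Imported lazily so the sketch can run with a stub detector
    # even when langdetect is not installed.
    from langdetect import DetectorFactory, detect

    # langdetect is non-deterministic unless the factory is seeded.
    DetectorFactory.seed = 0
    return detect


def filter_by_language(
    records: Iterable[dict],
    target_lang: str = "id",  # assumed ISO 639-1 code for Indonesian
    detect_fn: Optional[Callable[[str], str]] = None,
    text_key: str = "text",  # hypothetical field name, not from the dataset
) -> Iterator[dict]:
    """Yield only the records whose text is detected as target_lang."""
    detect = detect_fn or _default_detector()
    for record in records:
        text = record.get(text_key, "")
        if not text.strip():
            # Skip empty transcripts: there is nothing to detect.
            continue
        try:
            lang = detect(text)
        except Exception:
            # langdetect raises LangDetectException on text with no
            # detectable features; treat such records as unverified and drop them.
            continue
        if lang == target_lang:
            yield record
```

Seeding `DetectorFactory` matters because langdetect's output can vary between runs on short or ambiguous text. Records on which detection fails are dropped rather than kept, which errs on the side of language correctness.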