SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for TSynC2 Corpus #585

Open SamuelCahyawijaya opened 3 months ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: tsync2/tsync2.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?tsync2

Dataset: tsync2
Description: TSynC2 is a Thai text-to-speech (TTS) dataset from NECTEC. An earlier, smaller version called TSynC1 is also available. The dataset can also be downloaded from the AI for Thai platform.
Subsets: -
Languages: tha
Tasks: Text-To-Speech Synthesis
License: Creative Commons Attribution Non Commercial Share Alike 3.0 (cc-by-nc-sa-3.0)
Homepage: https://github.com/korakot/corpus/releases
HF URL: -
Paper URL: https://lexitron.nectec.or.th/KM_HL5001/file_HL5001/Document/krrn_14518.pdf

richardy-lobo-sapan commented 2 months ago

self-assign