SamuelCahyawijaya commented 7 months ago

Dataloader name: tha_lotus/tha_lotus.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?tha_lotus

Dataset	tha_lotus
Description	The Large vOcabulary Thai continUous Speech recognition (LOTUS) corpus was designed for developing large vocabulary continuous speech recognition (LVCSR) systems, spoken dialogue systems, speech dictation, and broadcast news transcribers. It contains two datasets: one for training an acoustic model and another for training a language model.
Subsets	-
Languages	tha
Tasks	Automatic Speech Recognition
License	Creative Commons Attribution Non Commercial Share Alike 3.0 (cc-by-nc-sa-3.0)
Homepage	https://github.com/korakot/corpus/releases/download/v1.0/AIFORTHAI-LotusCorpus.zip
HF URL	-
Paper URL	https://doi.org/10.1109/ICSDA.2009.5278377

djanibekov commented 7 months ago

self-assign

bp-high commented 7 months ago

self-assign

sabilmakbar commented 6 months ago

self-assign

sabilmakbar commented 6 months ago

Hi @holylovenia, can this be used as the homepage?

https://github.com/korakot/corpus/tree/main/LOTUS

Or shall we add the homepage of AI for Thai instead?

holylovenia commented 6 months ago

https://github.com/korakot/corpus/tree/main/LOTUS

@sabilmakbar I think this should be fine as our focus is on the dataset rather than the organization.

sabilmakbar commented 5 months ago

Hi, sorry, I'm still implementing this and need a bit more time. Progress on this dataset is at ~70%, and I also found some additional complexity in the dataloader.

SEACrowd / seacrowd-datahub

Create dataset loader for AIFORTHAI - LotusCorpus #449
