SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Implement dataloader for `roots_vi_ted` #329

Closed: chenxwh closed this 8 months ago

chenxwh commented 9 months ago

Closes #182.


ryanignatius commented 8 months ago

Hi @chenxwh, thank you for the PR.

About the data: since it is publicly accessible only after the user has accepted the dataset's acknowledgement, could we append "Before using this dataloader, please accept the acknowledgement at https://huggingface.co/datasets/bigscience-data/roots_vi_ted_talks_iwslt and use huggingface-cli login for authentication." to `_DESCRIPTION`?

Also, please remove the `train-00000-of-00001.parquet` file from the data directory and change `_LOCAL` to `False`.

chenxwh commented 8 months ago

Thank you! Updated accordingly.

holylovenia commented 8 months ago

Replacing reviewer @gentaiscool with @MJonibek due to no response.

chenxwh commented 8 months ago

Thanks @MJonibek, just updated.