IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Closes #246 | Implement dataloader for Librivox-Indonesia #267

Closed: jensan-1 closed this pull request 1 year ago

jensan-1 commented 1 year ago

Closes #246

jensan-1 commented 1 year ago

/test dataset=librivox

github-actions[bot] commented 1 year ago

Run result

Check test log here: https://github.com/IndoNLP/nusa-crowd/actions/runs/3068327121

cahya-wirawan commented 1 year ago

It looks good. The docstrings and the leftover TODO strings need some cleanup. It would also be nice to add streaming functionality and to keep the dataset name "Librivox-Indonesia" as it is in the source dataset. Thanks.
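
To illustrate the streaming suggestion: the idea is that the loader yields examples one at a time instead of materializing the whole dataset first. The following is only a minimal sketch of that pattern, not the code from this PR. The function name `generate_examples`, the constant `DATASET_NAME`, and the metadata columns `path` and `sentence` are hypothetical and chosen for illustration.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Tuple

# Dataset name kept exactly as in the source dataset, as requested in the review.
DATASET_NAME = "Librivox-Indonesia"


def generate_examples(metadata: TextIO) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Lazily yield (key, example) pairs from a metadata CSV.

    Assumption: the CSV has `path` and `sentence` columns (hypothetical layout).
    Rows are read only as the caller iterates, so nothing is loaded up front.
    """
    reader = csv.DictReader(metadata)
    for key, row in enumerate(reader):
        yield key, {"path": row["path"], "text": row["sentence"]}


# Usage with an in-memory file standing in for the real metadata.
sample = io.StringIO("path,sentence\na.mp3,halo dunia\nb.mp3,selamat pagi\n")
for key, example in generate_examples(sample):
    print(key, example["path"], example["text"])
```

Because the function is a generator, a consumer can stop after the first few examples without the remaining rows ever being parsed, which is the property a streaming mode relies on.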

jensan-1 commented 1 year ago

Hello @cahya-wirawan, thank you for the review and comments. I have updated the implementation accordingly.