IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Closes #278 | Create dataset loader for INDspeech_DIGIT_CDSR #296

Closed IvanHalimP closed 1 year ago

IvanHalimP commented 1 year ago

Closes #278


I noticed two things while working on this loader:

  1. The test data is not unique: there are 23 duplicate entries across all 4 test sets.
  2. The start token '|S|' and end token '|E|' are not removed from the text. Let me know if they should be removed.
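
To make both points easier to check, here is a minimal sketch of how the duplicates could be counted and how the boundary tokens could be stripped. It assumes each entry is an (id, transcript) pair of plain strings; the function names and sample ids are hypothetical and are not part of the loader in this PR.

```python
from collections import Counter

START_TOKEN = "|S|"
END_TOKEN = "|E|"


def strip_boundary_tokens(text: str) -> str:
    """Remove one leading |S| and one trailing |E| from a transcript, if present."""
    text = text.strip()
    if text.startswith(START_TOKEN):
        text = text[len(START_TOKEN):]
    if text.endswith(END_TOKEN):
        text = text[:-len(END_TOKEN)]
    return text.strip()


def count_duplicates(ids):
    """Return {id: occurrences} for every id that appears more than once."""
    return {key: n for key, n in Counter(ids).items() if n > 1}


# Hypothetical entries; the real ids and transcripts come from the dataset files.
sample = [
    ("utt_001", "|S| satu dua tiga |E|"),
    ("utt_002", "|S| empat |E|"),
    ("utt_001", "|S| satu dua tiga |E|"),
]

cleaned = [(uid, strip_boundary_tokens(text)) for uid, text in sample]
duplicates = count_duplicates(uid for uid, _ in sample)
```

Whether the tokens should actually be stripped in the loader is still the open question above; the sketch only shows that doing so would be a small change.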

That's all.

IvanHalimP commented 1 year ago

/test dataset=indspeech_digit_cdsr

github-actions[bot] commented 1 year ago

Run result

Check test log here: https://github.com/IndoNLP/nusa-crowd/actions/runs/3137018143

IvanHalimP commented 1 year ago

/test dataset=indspeech_digit_cdsr

github-actions[bot] commented 1 year ago

Run result

Check test log here: https://github.com/IndoNLP/nusa-crowd/actions/runs/3163684591