SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0
65 stars 57 forks

Create dataset loader for Saltik #313

Closed: SamuelCahyawijaya closed this issue 6 months ago

SamuelCahyawijaya commented 9 months ago

Dataloader name: saltik/saltik.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?saltik

Dataset: saltik
Description: Saltik is a dataset for benchmarking the accuracy of non-word error correction methods on Indonesian words. It consists of 58,532 non-word errors generated from the 3,000 most popular Indonesian words.
Subsets: -
Languages: ind
Tasks: Error Spelling Correction
License: GNU Affero General Public License v3.0 (agpl-3.0)
Homepage: https://github.com/ir-nlp-csui/saltik
HF URL: -
Paper URL: https://github.com/ir-nlp-csui/saltik/blob/main/README.md

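
Benchmarking "non-word error correction accuracy" comes down to feeding each misspelled non-word to a corrector and counting how often it returns the intended word. The sketch below shows that loop with a simple baseline corrector that picks the vocabulary word with the smallest Levenshtein distance. This issue does not describe Saltik's file format or how its errors were generated. The vocabulary, the (non-word, target) pairs, and the names `levenshtein`, `make_corrector`, and `accuracy` are therefore all hypothetical. The baseline is a generic nearest-neighbour method, not the Saltik authors' approach.

```python
from typing import Callable, Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def make_corrector(vocabulary: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical baseline: map a non-word to its nearest vocabulary word."""
    # Sorted so that distance ties are broken alphabetically and deterministically.
    vocab = sorted(set(vocabulary))

    def correct(word: str) -> str:
        return min(vocab, key=lambda v: levenshtein(word, v))

    return correct


def accuracy(pairs: Iterable[Tuple[str, str]], correct: Callable[[str], str]) -> float:
    """Fraction of (non_word, target) pairs whose correction equals the target."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(correct(non_word) == target for non_word, target in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy data for illustration only; this is not taken from Saltik.
    vocab = ["makan", "minum", "tidur", "rumah"]
    pairs = [("mkan", "makan"), ("minm", "minum"), ("tidru", "tidur"), ("rumha", "rumah")]
    print(accuracy(pairs, make_corrector(vocab)))  # → 1.0
```

A real loader would replace the toy `pairs` with the (error, correct word) entries read from the Saltik repository. Note that plain Levenshtein distance counts a transposition such as "tidru" → "tidur" as two edits. A Damerau-style distance would count it as one.
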
TysonYu commented 9 months ago

self-assign

github-actions[bot] commented 8 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.