SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Filipino Slang Spelling Normalization #15

Closed by SamuelCahyawijaya 1 year ago

SamuelCahyawijaya commented 1 year ago

Dataloader name: filipino_slang_norm/filipino_slang_norm.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?filipino_slang_norm

Dataset: filipino_slang_norm
Description: This dataset contains 398 abbreviated and/or contracted Filipino words used in Facebook comments made on weather advisories from a Philippine weather bureau. Each word has three "correct" versions provided by three undergraduate volunteers.
Subsets: -
Languages: fil
Tasks: Lexical Normalization
License: Unknown (unknown)
Homepage: https://github.com/ljyflores/efficient-spelling-normalization-filipino
HF URL: -
Paper URL: https://aclanthology.org/2022.sustainlp-1.5/
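
For reference, here is a minimal sketch of how the records described above (one slang word, three volunteer normalizations) could be represented and flattened into source/target pairs before being wrapped in a loader. The names `SlangEntry`, `source`, `targets`, and `to_pairs` are assumptions made for illustration; they are not the column names of the source repository or the SEACrowd schema, and the example strings are placeholders rather than dataset content.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class SlangEntry:
    """One slang word with its normalizations (hypothetical schema)."""
    source: str          # the abbreviated/contracted word
    targets: List[str]   # per the description, three "correct" versions


def to_pairs(entries: List[SlangEntry]) -> Iterator[Tuple[str, str]]:
    """Flatten each entry into (slang, normalized) pairs, one per annotation."""
    for entry in entries:
        for target in entry.targets:
            yield entry.source, target


# Placeholder strings only; these are not taken from the dataset.
example = SlangEntry(source="<slang>", targets=["<norm_1>", "<norm_2>", "<norm_3>"])
pairs = list(to_pairs([example]))
```

Flattening to pairs is only one option. A loader could equally keep all three normalizations on a single record, depending on what the target task schema expects.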

ljvmiranda921 commented 1 year ago

Also interested in this!

ljvmiranda921 commented 1 year ago

self-assign