SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for CebuaNER #23

Closed by SamuelCahyawijaya 9 months ago

SamuelCahyawijaya commented 10 months ago

Dataloader name: cebuaner/cebuaner.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?cebuaner

Dataset cebuaner
Description The CebuaNER dataset contains 4000+ news articles that have been tagged by native speakers of Cebuano using the BIO encoding schema for the named entity recognition (NER) task.
Subsets -
Languages ceb
Tasks Named Entity Recognition
License Creative Commons Attribution Non Commercial Share Alike 4.0 (cc-by-nc-sa-4.0)
Homepage https://github.com/mebzmoren/CebuaNER
HF URL -
Paper URL https://arxiv.org/abs/2310.00679v1
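
For anyone picking this up, here is a minimal sketch of what the BIO encoding mentioned in the description implies for a loader: each token carries a `B-<label>`, `I-<label>`, or `O` tag, and consecutive `B-`/`I-` tags of the same label form one entity span. The function name `bio_to_spans` and the `PER`/`LOC` labels below are illustrative assumptions, not taken from the CebuaNER repository; check the dataset files for the actual label set.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (label, start, end) spans; end is exclusive.

    Hypothetical helper for illustration, not part of seacrowd-datahub.
    """
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span when the current tag does not continue it.
        if label is not None and tag != f"I-{label}":
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # Stray I- tag with no preceding B-: leniently start a new span.
            start, label = i, tag[2:]
    # Flush a span that runs to the end of the sequence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Assumed example tags; real labels come from the dataset itself.
example_tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(example_tags))
```

A helper like this is mainly useful for sanity-checking that the tag sequences read by the loader are well formed before they are exposed through the sequence-labeling schema.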

ljvmiranda921 commented 10 months ago

I can help with this :) Personally motivated to implement Filipino datasets :+1:

ljvmiranda921 commented 10 months ago

self-assign