SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for PFSA-ID #523

Closed. SamuelCahyawijaya closed this issue 4 months ago

SamuelCahyawijaya commented 5 months ago

Dataloader name: pfsa_id/pfsa_id.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?pfsa_id

Dataset: pfsa_id
Description: PFSA-ID is an annotated corpus for Public Figure Statement Attribution in the Indonesian language. It is annotated as a multi-class named entity recognition task with 11 labels (PERSON, ROLE, AFFILIATION, PERSONCOREF, CUE, CUECOREF, STATEMENT, ISSUE, EVENT, DATETIME, and LOCATION), and token labels are represented using the BILOU scheme.
Subsets: pfsa_id, pfsa_id_med, pfsa_id_test
Languages: ind
Tasks: Named Entity Recognition, Statement Tagging
License: Creative Commons Attribution Non Commercial Share Alike 4.0 (cc-by-nc-sa-4.0)
Homepage: https://github.com/sigit-purnomo/pfsa-id-dl
HF URL: -
Paper URL: https://doi.org/10.1108/GKMC-04-2022-0091, https://doi.org/10.1016/j.knosys.2024.111558
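
For anyone picking this up, here is a rough sketch of what the BILOU representation over the 11 entity types implies for the label set. It is only an illustration: the `B-`/`I-`/`L-`/`U-` tag string format, the helper names `build_bilou_labels` and `spans_to_bilou`, and the example sentence are assumptions, not taken from the dataset files, so check the actual tag strings in the source repository before relying on them.

```python
# Entity types as listed in the dataset description above.
ENTITY_TYPES = [
    "PERSON", "ROLE", "AFFILIATION", "PERSONCOREF", "CUE", "CUECOREF",
    "STATEMENT", "ISSUE", "EVENT", "DATETIME", "LOCATION",
]


def build_bilou_labels(entity_types):
    """Return 'O' plus B-/I-/L-/U- tags for every entity type.

    The 'PREFIX-TYPE' string format is an assumption for illustration.
    """
    labels = ["O"]
    for entity in entity_types:
        for prefix in ("B", "I", "L", "U"):
            labels.append(f"{prefix}-{entity}")
    return labels


def spans_to_bilou(num_tokens, spans):
    """Convert (start, end_exclusive, entity_type) token spans to BILOU tags.

    Single-token spans get U- (Unit); longer spans get B- (Begin),
    I- (Inside) for any middle tokens, and L- (Last).
    """
    tags = ["O"] * num_tokens
    for start, end, entity in spans:
        if end - start == 1:
            tags[start] = f"U-{entity}"
        else:
            tags[start] = f"B-{entity}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{entity}"
            tags[end - 1] = f"L-{entity}"
    return tags


LABELS = build_bilou_labels(ENTITY_TYPES)

# Hypothetical example sentence, not drawn from the corpus.
tokens = ["Menteri", "Budi", "Santoso", "mengatakan", "harga", "beras", "stabil"]
spans = [(0, 1, "ROLE"), (1, 3, "PERSON"), (3, 4, "CUE"), (4, 7, "STATEMENT")]
example_tags = spans_to_bilou(len(tokens), spans)
```

With 11 entity types and four positional prefixes plus `O`, a label list built this way has 45 entries, which is the size a `seq_label` style feature would need if the loader enumerates the tags explicitly.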

patrickamadeus commented 5 months ago

self-assign