SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for BKD-Prosub #589

Open SamuelCahyawijaya opened 6 months ago

Dataloader name: bkd_prosub/bkd_prosub.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?bkd_prosub

Dataset: bkd_prosub
Description: BKD-Prosub is a Thai corpus annotated for pronoun substitutes (Prosub) and address terms, from the BKD (Bangkok Data) corpus collection. The sentences are extracted from 2 TV drama scripts (Nakii, Dare to love) and 3 novels (Teepaankoon, Namsaycaycin, Phaathoog). The character position of each target word in a sentence is labeled with its Prosub or address-term type.
Subsets: -
Languages: tha
Tasks: Coreference Resolution
License: Apache License 2.0 (apache-2.0)
Homepage: https://babyai-hub.vizdata.tech/?p=313
HF URL: -
Paper URL: https://www.virach.com/_files/ugd/cdb1d4_b0f1d95aa023454082c081062052cf51.pdf
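
Since the description says each target word is labeled by its character position in the sentence, a loader will probably need some span record along the following lines. This is only a rough sketch: `AnnotatedSpan`, `make_span`, the field names, and the `"prosub"` label are all hypothetical, the example sentence is made up for illustration and is not taken from the corpus, and the real file layout and tag set have to be checked against the source data and the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedSpan:
    """One labeled target word, addressed by character offsets (hypothetical schema)."""

    sentence: str
    start: int  # inclusive character offset of the target word
    end: int    # exclusive character offset of the target word
    label: str  # placeholder label; the real tag set is defined by the corpus

    @property
    def text(self) -> str:
        # Slice the target word back out of the sentence.
        return self.sentence[self.start:self.end]


def make_span(sentence: str, target: str, label: str) -> AnnotatedSpan:
    """Locate the first occurrence of `target` and wrap it as an AnnotatedSpan."""
    start = sentence.find(target)
    if start < 0:
        raise ValueError(f"{target!r} not found in sentence")
    return AnnotatedSpan(sentence, start, start + len(target), label)


# Made-up sentence for illustration only; it is not taken from BKD-Prosub.
example = make_span("พี่ไปก่อนนะ", "พี่", "prosub")
print(example.start, example.end, example.text)
```

Note that Python `str` offsets count Unicode code points, so Thai vowel and tone marks each count as one position. If the corpus counts offsets differently (for example in bytes), the loader would need to convert them before slicing.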