SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for IndoNER-Tourism #348

Status: Closed (SamuelCahyawijaya closed this issue 7 months ago)

SamuelCahyawijaya commented 8 months ago

Dataloader name: indoner_tourism/indoner_tourism.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?indoner_tourism

Dataset indoner_tourism
Description This dataset is designed for named entity recognition (NER) tasks in the Bahasa Indonesia tourism domain. It contains labeled sequences of named entities, including locations, facilities, and tourism-related entities. The dataset is annotated with the following entity types:
- B-WIS: Beginning of a tourism-related entity.
- I-WIS: Continuation of a tourism-related entity.
- B-LOC: Beginning of a location entity.
- I-LOC: Continuation of a location entity.
- B-FAS: Beginning of a facility entity.
- I-FAS: Continuation of a facility entity.
- O: Non-entity or other words not falling into the specified categories.
Subsets -
Languages ind
Tasks Named Entity Recognition
License Academic Free License v3.0 (afl-3.0)
Homepage https://github.com/fathanick/IndoNER-Tourism/tree/main
HF URL -
Paper URL https://www.inacl.id/journal/index.php/jlk/article/download/89/63
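
For reference, the B-/I-/O tag scheme in the description can be decoded into entity spans roughly as follows. This is only a sketch: the helper name `bio_to_spans` and the example sentence are made up for illustration and are not taken from the dataset or the SEACrowd codebase, and it assumes each token comes with exactly one tag.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Hypothetical helper for illustration; not part of any dataloader.
    """
    spans: List[Tuple[str, str]] = []
    current_type = None  # entity type of the span being built, e.g. "WIS"
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the span collected so far, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open span (dropped here).
            flush()
            current_type = None
            current_tokens = []

    flush()
    return spans


# Made-up example sentence, not drawn from the dataset.
example_tokens = ["Candi", "Borobudur", "berada", "di", "Magelang"]
example_tags = ["B-WIS", "I-WIS", "O", "O", "B-LOC"]
example_spans = bio_to_spans(example_tokens, example_tags)
print(example_spans)
```

Stray I- tags without a preceding B- of the same type are discarded in this sketch; a real loader might prefer to keep the raw tags and leave span decoding to the evaluation code.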

luckysusanto commented 8 months ago

self-assign