SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Indonesia BioNER #428

Closed SamuelCahyawijaya closed 6 months ago

SamuelCahyawijaya commented 8 months ago

Dataloader name: bioner_id/bioner_id.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?bioner_id

Dataset: bioner_id
Description: This dataset is taken from the online health consultation platform Alodokter.com and has been annotated by two medical doctors. The data were annotated with IOB tags in CoNLL format. The dataset contains 2600 medical answers written by doctors between 2017 and 2020. Two medical experts annotated the data with two entity types: DISORDERS and ANATOMY. The answers cover four topics: diarrhea, HIV-AIDS, nephrolithiasis, and TBC (tuberculosis), which are marked as high-risk by WHO.
Subsets: -
Languages: ind
Tasks: Named Entity Recognition
License: BSD 3-clause Clear license (bsd-3-clause-clear)
Homepage: https://huggingface.co/datasets/abid/indonesia-bioner-dataset
HF URL: https://huggingface.co/datasets/abid/indonesia-bioner-dataset
Paper URL: https://jtiik.ub.ac.id/index.php/jtiik/article/view/6337/pdf
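
Since the description says the data use IOB tags in CoNLL format, the core of a loader would likely be a small parser along the following lines. This is only a minimal sketch under stated assumptions, not the actual bioner_id.py dataloader: it assumes one token per line with whitespace-separated columns, the tag in the last column, and blank lines between sentences. The function names, the tag strings (`B-DISORDERS`, `I-ANATOMY`, ...), and the sample text are all hypothetical and should be checked against the real files.

```python
from typing import List, Tuple

# Invented sample for illustration only; it is NOT taken from the dataset.
# The exact tag strings and column layout are assumptions.
SAMPLE = """Diare B-DISORDERS
menyerang O
usus B-ANATOMY
halus I-ANATOMY

Batu B-DISORDERS
ginjal I-DISORDERS
"""


def parse_conll_iob(text: str) -> List[Tuple[List[str], List[str]]]:
    """Split CoNLL-style text into (tokens, tags) pairs, one per sentence."""
    sentences = []
    tokens: List[str] = []
    tags: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])   # first column: the token
        tags.append(parts[-1])    # last column: the IOB tag
    if tokens:
        # Flush the final sentence when the text has no trailing blank line.
        sentences.append((tokens, tags))
    return sentences


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_type, entity_text) spans."""
    spans = []
    current_type = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open span, then start a new one.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:
            # An "O" tag (or anything unrecognized) ends the open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


if __name__ == "__main__":
    for sent_tokens, sent_tags in parse_conll_iob(SAMPLE):
        print(iob_to_spans(sent_tokens, sent_tags))
```

In a real loader, the (tokens, tags) pairs would be yielded as examples, and the span helper is only there for sanity-checking the annotations.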

fhudi commented 8 months ago

self-assign

github-actions[bot] commented 7 months ago

Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.