Dataset Description: IndQNER is a named entity recognition dataset created by manually annotating 8 chapters of the Indonesian translation of the Quran. The annotation was performed using the BIO (Beginning-Inside-Outside) tagging format.
Domain: Religion (Indonesian translation of the Quran)
Dataset Size: The dataset contains 3118 sentences and 2476 named entities from 18 categories. It is split into train (2494 sentences), dev (312 sentences), and test (312 sentences) sets.
Is Synthetic: No.
License: Public
Motivation: The dataset was created with particular care because its source is a holy book. The text is the most recent version of the Indonesian translation of the Quran published by the Indonesian Ministry of Religious Affairs. The named entity categories were defined with input from trustworthy sources, namely an existing ontology of Quran concepts and Quran tafseer experts. Eight annotators contributed to the annotation.
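
The BIO tagging format mentioned in the dataset description can be illustrated with a minimal Python sketch that groups token-level tags back into entity spans. The function name `bio_to_spans`, the example sentence, and the labels `Person` and `Group` are assumptions made for illustration only. They are not taken from the dataset or its label set.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, label) pairs.

    Hypothetical helper for illustration; not part of any IndQNER tooling.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity; close any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag continues the open entity of the same label.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity,
            # is treated as outside any entity in this sketch.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


# Illustrative sentence and tags (assumed, not drawn from the dataset).
tokens = ["Musa", "berkata", "kepada", "Bani", "Israil"]
tags = ["B-Person", "O", "O", "B-Group", "I-Group"]
spans = bio_to_spans(tokens, tags)
```

Here `B-` marks the first token of an entity, `I-` marks a token that continues it, and `O` marks a token outside any entity, so a multi-token mention such as "Bani Israil" is recovered as a single span.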