Create dataset loader for ICON Indonesian Constituency Treebank

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?icon

| Dataset | icon |
| --- | --- |
| Description | In this work, we publish ICON (Indonesian CONstituency treebank), a manually annotated benchmark Indonesian constituency treebank with a size of 10,000 sentences and approximately 124,000 constituents and 182,000 tokens, which can support the training of state-of-the-art transformer-based models. We use 15 phrase-level tags and 24 POS tags. The sentences were taken from Wikipedia (3,000) and news articles (7,000). |
| License | CC-BY-SA 4.0 |
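
As a starting point for the loader, here is a minimal sketch of the parsing step. It assumes the treebank files store one Penn-style bracketed tree per sentence, which this issue does not confirm and should be checked against the released data. The function names `parse_bracketed` and `leaves_with_pos`, the example sentence, and its tags are hypothetical and are not part of the nusa-crowd codebase or the ICON tag set.

```python
from typing import List, Tuple


def parse_bracketed(text: str) -> tuple:
    """Parse one bracketed tree, e.g. "(S (NP (NN buku)))", into nested tuples.

    Each node is (label, children); a child is either another node or a
    plain token string. ASSUMPTION: the data uses Penn-style brackets.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node() -> tuple:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # consume the closing ')'
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing content after the tree")
    return tree


def leaves_with_pos(tree: tuple) -> List[Tuple[str, str]]:
    """Collect (token, POS tag) pairs from the preterminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs: List[Tuple[str, str]] = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(leaves_with_pos(child))
    return pairs


# Hypothetical example; the tags are illustrative, not taken from ICON.
example = "(S (NP (NNP Budi)) (VP (VB membaca) (NP (NN buku))))"
tree = parse_bracketed(example)
pairs = leaves_with_pos(tree)
```

A loader built on this could yield one record per sentence containing the raw tree string, the token list, and the POS tag list. How those fields map onto the nusa-crowd schemas is left open here.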

IndoNLP / nusa-crowd

Create dataset loader for ICON Indonesian Constituency Treebank #368