New labeling regimes for ACTER datasets.

AylaRT / ACTER

ACTER is a manually annotated dataset for term extraction, covering 3 languages (English, French, and Dutch), and 4 domains (corruption, dressage, heart failure, and wind energy).


Hi @AylaRT, thanks for contributing the ACTER corpora, which are very valuable for term extraction.

While working on the datasets, we discovered that current token classifiers trained with the BIO annotation regime do not perform well on nested terms. We would therefore like to propose a new annotation regime in which single-word nested terms are also annotated.
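To make the nested-term problem concrete, here is a minimal sketch of flat BIO labeling. The sentence, the term list, and the helper names `bio_tags` and `decode` are made up for illustration and are not taken from ACTER or from the proposed annotation. It only shows the limitation: with one tag per token, a single-word term nested inside a longer term cannot be recovered from the labels.

```python
def bio_tags(tokens, terms):
    """Label tokens with B/I/O, letting longer terms win over nested ones."""
    tags = ["O"] * len(tokens)
    # Process longest terms first so a nested term never overwrites its parent.
    for term in sorted(terms, key=len, reverse=True):
        n = len(term)
        for i in range(len(tokens) - n + 1):
            free = all(t == "O" for t in tags[i:i + n])
            if tokens[i:i + n] == term and free:
                tags[i] = "B"
                tags[i + 1:i + n] = ["I"] * (n - 1)
    return tags


def decode(tokens, tags):
    """Rebuild term spans from a flat B/I/O tag sequence."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(current)
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans


# Hypothetical example: "heart" is nested inside "heart failure".
tokens = ["patients", "with", "heart", "failure", "improved"]
terms = [["heart", "failure"], ["heart"]]

tags = bio_tags(tokens, terms)
recovered = decode(tokens, tags)
print(tags)
print(recovered)
```

In this sketch only the multi-word term survives decoding; the nested single-word term is lost, which is the gap the proposed regime is meant to address.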

Please take a look at the new annotation, which can be seen via this link: https://github.com/honghanhh/nobi_annotation_regime

It would be great if our proposal could be integrated into the next version of the corpora. Please let us know if you need any further information.

Thanks a lot. Kind regards, Hanh

New labeling regimes for ACTER datasets. #3