allenai / allennlp-gallery

A gallery of projects built with AllenNLP
https://gallery.allennlp.org
Apache License 2.0

New project: DaN+ #30

Closed: robvanderg closed this issue 3 years ago

robvanderg commented 3 years ago

Project metadata:

{
  "title": "DaN+: Danish Nested Named Entities and Lexical Normalization",
  "authors": [
    {
      "name": "Barbara Plank",
      "email": "bapl@itu.dk",
      "affiliation": "IT University of Copenhagen"
    }, {
      "name": "Kristian Nørgaard Jensen",
      "email": "krnj@itu.dk",
      "affiliation": "IT University of Copenhagen"
    }, {
      "name": "Rob van der Goot",
      "email": "robv@itu.dk",
      "affiliation": "IT University of Copenhagen"
    }
  ],
  "submission_date": "01-04-2021",
  "github_link": "https://github.com/bplank/DaNplus",
  "paper_link": "https://www.aclweb.org/anthology/2020.coling-main.583.pdf",
  "allennlp_version": "1.1",
  "datasets": [
    {
      "name": "DaN+",
      "link": "https://github.com/bplank/DaNplus"
    }
  ],
  "tags": ["named entity recognition", "named entity detection", "lexical normalization", "domain adaptation", "Danish"]
}

Description:

This paper introduces DaN+, a new multi-domain corpus and annotation guidelines for Danish nested named entities (NEs) and lexical normalization, supporting research on cross-lingual, cross-domain learning for a less-resourced language. We empirically assess three strategies for modeling the two-layer Named Entity Recognition (NER) task. We compare transfer from German annotations with in-language annotation from scratch, examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER. Our results show that 1) the most robust strategy is multi-task learning, rivaled by multi-label decoding, 2) BERT-based NER models are sensitive to domain shifts, and 3) in-language BERT and lexical normalization are most beneficial on the least canonical data. Our results also show that the out-of-domain setup remains challenging, while performance on news plateaus quickly. This highlights the importance of cross-domain evaluation of cross-lingual transfer.
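
As a rough illustration of the two-layer setup and the multi-task strategy described above, here is a minimal sketch. It is not the authors' code: the Danish example sentence, the tag inventory, and the small BiLSTM tagger are all invented for illustration, and it uses plain PyTorch rather than the paper's BERT-based AllenNLP 1.1 models. What it shows is the core idea: each token carries one BIO tag per annotation layer, and a shared encoder feeds two tagging heads whose losses are summed.

import torch
import torch.nn as nn

# Two-layer nested NE annotation: one BIO tag per token, per layer.
# Sentence and tags are invented; DaN+ has its own data and guidelines.
tokens = ["Danmarks", "Radio", "sender", "fra", "København"]
outer  = ["B-ORG", "I-ORG", "O", "O", "B-LOC"]  # outer entity layer
nested = ["B-LOC", "O",     "O", "O", "O"]      # nested entity layer

TAGS = ["O", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
tag2id = {t: i for i, t in enumerate(TAGS)}

class TwoLayerTagger(nn.Module):
    """Shared encoder with two classification heads, one per layer."""
    def __init__(self, vocab_size, hidden=64, n_tags=len(TAGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True,
                               bidirectional=True)
        self.outer_head = nn.Linear(2 * hidden, n_tags)
        self.nested_head = nn.Linear(2 * hidden, n_tags)

    def forward(self, ids):
        enc, _ = self.encoder(self.embed(ids))
        return self.outer_head(enc), self.nested_head(enc)

vocab = {w: i for i, w in enumerate(tokens)}
ids = torch.tensor([[vocab[w] for w in tokens]])
gold_outer = torch.tensor([[tag2id[t] for t in outer]])
gold_nested = torch.tensor([[tag2id[t] for t in nested]])

model = TwoLayerTagger(len(vocab))
logits_outer, logits_nested = model(ids)
loss_fn = nn.CrossEntropyLoss()
# Multi-task objective: sum of the per-layer tagging losses.
loss = (loss_fn(logits_outer.view(-1, len(TAGS)), gold_outer.view(-1)) +
        loss_fn(logits_nested.view(-1, len(TAGS)), gold_nested.view(-1)))
loss.backward()

One way to read the multi-label alternative mentioned in the abstract is to predict a single merged tag per token (e.g. "B-ORG|B-LOC") from one head; see the paper for the exact formulation.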